51

Meta-Learning as a Markov Decision Process

Sun-Hosoya, Lisheng 19 December 2019 (has links)
Machine Learning (ML) has enjoyed huge successes in recent years, and an ever-growing number of real-world applications rely on it. However, designing promising algorithms for a specific problem still requires huge human effort. Automated Machine Learning (AutoML) aims at taking the human out of the loop and developing machines that generate or recommend good algorithms for a given ML task. AutoML is usually treated as an algorithm/hyper-parameter selection problem; existing approaches include Bayesian optimization, evolutionary algorithms, and reinforcement learning. Among them, auto-sklearn, which incorporates meta-learning techniques in its search initialization, ranks consistently well in AutoML challenges. This observation oriented my research towards the meta-learning domain and led me to develop a novel framework based on Markov Decision Processes (MDP) and reinforcement learning (RL).
After a general introduction (Chapter 1), my thesis work starts with an in-depth analysis of the results of the AutoML challenge (Chapter 2). This analysis oriented my work towards meta-learning, leading me first to propose a formulation of AutoML as a recommendation problem, and ultimately to formulate a novel conceptualization of the problem as an MDP (Chapter 3). In the MDP setting, the problem is reduced to filling up, as quickly and efficiently as possible, a meta-learning matrix S, in which rows correspond to ML tasks and columns to ML algorithms. A matrix element S(i, j) is the performance of algorithm j applied to task i. Searching efficiently for the best values in S allows us to quickly identify the algorithms best suited to given tasks. In Chapter 4, the classical hyper-parameter optimization framework (HyperOpt) is first reviewed. In Chapter 5, a first meta-learning approach is introduced, along the lines of our paper ActivMetaL, that combines active learning and collaborative filtering techniques to predict the missing values in S. Our latest research applies RL to the MDP problem we defined, to learn an efficient policy for exploring S. We call this approach REVEAL and propose an analogy with a series of toy games to help visualize agents' strategies to reveal information progressively, e.g. masked areas of images to be classified, or ship positions in a battleship game. This line of research is developed in Chapter 6.
The main results of my PhD project are: 1) HP/model selection: I explored the Freeze-Thaw method and optimized the algorithm to enter the first AutoML challenge, achieving 3rd place in the final round (Chapter 3). 2) ActivMetaL: I designed a new algorithm for active meta-learning (ActivMetaL) and compared it with other baseline methods on real-world and artificial data. This study demonstrated that ActivMetaL is generally able to discover the best algorithm faster than baseline methods. 3) REVEAL: I developed a new conceptualization of meta-learning as a Markov Decision Process and put it into the more general framework of REVEAL games. With a master's student intern, I developed agents that learn (with reinforcement learning) to predict the next best algorithm to try. To develop these agents, we used surrogate toy tasks in the form of REVEAL games, then applied our methods to AutoML problems.
The work presented in my thesis is empirical in nature. Several real-world meta-datasets were used in this research, as well as artificial and semi-artificial meta-datasets. The results indicate that RL is a viable approach to this problem, although much work remains to be done to optimize algorithms and make them scale to larger meta-learning problems.
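
As a concrete illustration of the matrix-filling view above, the following minimal sketch (not code from the thesis; the synthetic matrix and all names are illustrative assumptions) shows how an ordering meta-learned from previously solved tasks can reveal a near-best algorithm for a new task in fewer evaluations than a random ordering:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy meta-learning matrix S: rows = tasks, columns = algorithms.
# S[i, j] is the (synthetic) performance of algorithm j on task i.
n_tasks, n_algos = 20, 10
algo_skill = rng.uniform(0.3, 0.9, size=n_algos)   # some algorithms are broadly better
S = np.clip(algo_skill + rng.normal(0, 0.1, size=(n_tasks, n_algos)), 0, 1)

def reveals_until_best(order, row, top_k=1):
    """Count how many entries of `row` must be revealed, following `order`,
    before hitting one of the row's top_k algorithms."""
    best = set(np.argsort(row)[-top_k:])
    for t, j in enumerate(order, start=1):
        if j in best:
            return t
    return len(order)

test_task = S[-1]      # held-out task (last row)
meta_train = S[:-1]    # previously solved tasks

# Meta-learned ordering: try algorithms in decreasing mean past performance.
meta_order = np.argsort(meta_train.mean(axis=0))[::-1]
random_order = rng.permutation(n_algos)

print("reveals needed, meta-learned order:", reveals_until_best(meta_order, test_task))
print("reveals needed, random order:     ", reveals_until_best(random_order, test_task))
```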
52

Selectively decentralized reinforcement learning

Nguyen, Thanh Minh 05 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / The main contributions of this thesis are the selectively decentralized method for solving multi-agent reinforcement learning problems and the discretized Markov-decision-process (MDP) algorithm for computing a sub-optimal learning policy in completely unknown learning and control problems. These contributions tackle several challenges in multi-agent reinforcement learning: the unknown and dynamic nature of the learning environment, the difficulty of computing a closed-form solution of the learning problem, slow learning performance in large-scale systems, and the questions of how, when, and with whom the learning agents should communicate. Throughout this thesis, the selectively decentralized method, which evaluates all of the possible communication strategies, not only increases learning speed and achieves better learning goals, but also learns the communication policy for each learning agent. Compared to other state-of-the-art approaches, this thesis's contributions offer two advantages. First, the selectively decentralized method can incorporate a wide range of well-known single-agent reinforcement learning algorithms, including the discretized MDP; state-of-the-art approaches can usually be applied to only one class of algorithms. Second, the discretized MDP algorithm can compute a sub-optimal learning policy when the environment is described in a general nonlinear form; other state-of-the-art approaches often assume that the environment takes a restricted form, particularly feedback-linearization form. This thesis also discusses several alternative approaches for multi-agent learning, including Multidisciplinary Optimization. In addition, this thesis shows how the selectively decentralized method can successfully solve several real-world problems, particularly in mechanical and biological systems.
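
The discretized-MDP idea described above can be sketched in a few lines: discretize the state and action spaces of an unknown nonlinear plant, estimate an empirical transition model from interaction, and run value iteration on the learned model. This is a minimal single-agent sketch under assumed toy dynamics, not the thesis's algorithm or systems:

```python
import numpy as np

rng = np.random.default_rng(1)

# Unknown 1-D plant: x' = x + 0.1*u + noise; goal: drive x near 0.
def plant(x, u):
    return x + 0.1 * u + rng.normal(0, 0.02)

# Discretize the state into bins and the control into a small action set.
bins = np.linspace(-1, 1, 21)
actions = np.array([-1.0, 0.0, 1.0])
nS, nA = len(bins) - 1, len(actions)
to_s = lambda x: int(np.clip(np.digitize(x, bins) - 1, 0, nS - 1))

# Learn an empirical transition model from random exploration.
counts = np.zeros((nS, nA, nS))
x = 0.8
for _ in range(20000):
    a = rng.integers(nA)
    x2 = np.clip(plant(x, actions[a]), -1, 1)
    counts[to_s(x), a, to_s(x2)] += 1
    x = x2
P = counts / np.maximum(counts.sum(axis=2, keepdims=True), 1)

# Reward is highest near the origin; solve the discretized MDP by value iteration.
centers = (bins[:-1] + bins[1:]) / 2
R = -np.abs(centers)                    # per-state reward
V = np.zeros(nS)
for _ in range(200):
    V = R + 0.95 * (P @ V).max(axis=1)  # Bellman backup over the learned model
policy = (P @ V).argmax(axis=1)
print("greedy action per state bin:", actions[policy])
```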
53

Improved Heuristic Search Algorithms for Decision-Theoretic Planning

Abdoulahi, Ibrahim 08 December 2017 (has links)
A large class of practical planning problems that require reasoning about uncertain outcomes, as well as tradeoffs among competing goals, can be modeled as Markov decision processes (MDPs). This model has been studied for over 60 years, and has many applications that range from stochastic inventory control and supply-chain planning, to probabilistic model checking and robotic control. Standard dynamic programming algorithms solve these problems for the entire state space. A more efficient heuristic search approach focuses computation on solving these problems for the relevant part of the state space only, given a start state, and using heuristics to identify irrelevant parts of the state space that can be safely ignored. This dissertation considers the heuristic search approach to this class of problems, and makes three contributions that advance this approach. The first contribution is a novel algorithm for solving MDPs that integrates the standard value iteration algorithm with branch-and-bound search. Called branch-and-bound value iteration, the new algorithm has several advantages over existing algorithms. The second contribution is the integration of recently-developed suboptimality bounds in heuristic search algorithms for MDPs, making it possible for iterative algorithms for solving these planning problems to detect convergence to a bounded-suboptimal solution. The third contribution is the evaluation and analysis of some techniques that are widely used by state-of-the-art planning algorithms, the identification of some weaknesses of these techniques, and the development of a more efficient implementation of one of these techniques -- a solved-labeling procedure that speeds convergence by leveraging a decomposition of the state-space graph of a planning problem into strongly-connected components. The new algorithms and techniques introduced in this dissertation are experimentally evaluated on a range of widely-used planning benchmarks.
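
The second contribution, detecting convergence to a bounded-suboptimal solution, can be illustrated by running Bellman backups on both a lower and an upper bound of the optimal value function and stopping when their gap at the start state falls below a tolerance. The tiny cost-minimization MDP below is an illustrative assumption, not one of the dissertation's benchmarks:

```python
import numpy as np

# Toy MDP: 3 states, 2 actions; P[s, a, s'] transition probs, R[s, a] costs.
P = np.array([[[0.8, 0.2, 0.0], [0.1, 0.9, 0.0]],
              [[0.0, 0.5, 0.5], [0.0, 0.2, 0.8]],
              [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])   # state 2 is absorbing (goal)
R = np.array([[2.0, 1.0], [1.0, 3.0], [0.0, 0.0]])
gamma, eps, s0 = 0.95, 1e-3, 0

# Lower bound from an admissible zero heuristic (costs are nonnegative);
# upper bound from a crude worst-case cost, with the goal known to cost 0.
V_lo = np.zeros(3)
V_hi = np.full(3, R.max() / (1 - gamma)); V_hi[2] = 0.0

while V_hi[s0] - V_lo[s0] > eps:
    V_lo = (R + gamma * P @ V_lo).min(axis=1)   # Bellman backup on both bounds
    V_hi = (R + gamma * P @ V_hi).min(axis=1)

print(f"V*(s0) in [{V_lo[s0]:.4f}, {V_hi[s0]:.4f}]: bounded-suboptimal solution found")
```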
54

Principals' Perceptions and Self-efficacy in Relation to School Security

Jones, Julian 01 January 2015 (has links)
Principals in the nation's schools have been tasked with managing crisis incidents that may occur with students and others on their campuses on a daily basis. The purposes of this study were to determine the differences, if any, that existed in Central Florida public school principals' perceptions regarding school security, their perceived confidence to address critical crisis incidents on their campuses, their perceptions of the likelihood critical incidents would occur, their perceptions of interaction with law enforcement, the critical incidents they fear the most, and their perceptions of factors impacting the incidents they fear the most. Principal subgroup mean responses to the Principal Safety and Security Perceptions Survey in the three areas of Bandura's (1997) triadic reciprocal causation were examined in the context of principals' gender, longevity, student enrollment, grade configuration, free and reduced lunch rate, presence of a law enforcement officer, and presence of a security plan. Findings revealed significant differences between categorical groups of principals in multiple areas. It was determined that significant differences in principals' perceptions warrant further study. Recommendations for practice include security policy development and practical application of noted trends.
55

An Operating System Architecture and Hybrid Scheduling Methodology for Real-Time Systems with Uncertainty

Apte, Manoj Shriganesh 11 December 2004 (has links)
Personal computer desktops and other standardized computer architectures are optimized to provide the best performance for frequently occurring conditions. Real-time systems designed using worst-case analysis for such architectures under-utilize the hardware. This shortcoming provides the motivation for scheduling algorithms that can improve overall utilization by accounting for inherent uncertainty in task execution duration. A real-time task dispatcher must perform its function with constant scheduling overhead. Given the NP-hard nature of the problem of scheduling non-preemptible tasks, dispatch decisions for such systems cannot be made in real-time. This argues for a hybrid architecture that includes an offline policy generator, and an online dispatcher. This dissertation proposes, and demonstrates a hybrid operating system architecture that enables cost-optimal task dispatch on Commercial-Off-The-Shelf (COTS) systems. This is achieved by explicitly accounting for the stochastic nature of each task's execution time, and dynamically learning the system behavior. Decision Theoretic Scheduling (DTS) provides the framework for scheduling under uncertainty. The real-time scheduling problem is cast as a Markov Decision Process (MDP). An offline policy generator discovers an epsilon-optimal policy using value iteration with model learning. For the selected representation of state, action, model, and rewards, the policy discovered using value iteration is proved to have a probability of failure that is less than any arbitrarily small user-specified value. The PromisQoS operating system architecture demonstrates a practical implementation of the proposed approach. PromisQoS is a Linux based platform that supports concurrent execution of time-based (preemptible and non-preemptible) real-time tasks, and best-effort processes on an interactive workstation. Several examples demonstrate that model learning, and scheduling under uncertainty enables PromisQoS to achieve better CPU utilization than other scheduling methods. Real-time task sets that solve practical problems, such as a Laplace solver, matrix multiplication, and transpose, demonstrate the robustness and correctness of PromisQoS design and implementation. This pioneering application demonstrates the feasibility of MDP based scheduling for real-time tasks in practical systems. It also opens avenues for further research into the use of such DTS techniques in real-time system design.
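
A minimal sketch of decision-theoretic scheduling under uncertainty: choose a dispatch order for non-preemptible tasks whose execution times are stochastic, maximizing the expected number of deadlines met. The duration distributions are fixed here for illustration (PromisQoS learns such models online); the tasks, numbers, and names are assumptions, not from the dissertation:

```python
import itertools

# Two non-preemptible tasks with stochastic execution times (discrete
# distributions; PromisQoS would learn these online) and per-task deadlines.
durations = {"A": {2: 0.7, 5: 0.3},      # duration -> probability
             "B": {3: 0.6, 4: 0.4}}
deadline = {"A": 6, "B": 4}

def expected_deadlines_met(order):
    """Expected number of deadlines met when dispatching tasks in `order`."""
    total = 0.0
    for outcome in itertools.product(*[durations[t].items() for t in order]):
        p, t_now, met = 1.0, 0, 0
        for task, (d, pd) in zip(order, outcome):
            p *= pd
            t_now += d                    # non-preemptible: runs to completion
            met += t_now <= deadline[task]
        total += p * met
    return total

for order in itertools.permutations(durations):
    print(order, "-> expected deadlines met:", round(expected_deadlines_met(order), 3))
```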
56

Resource Allocation to Improve Equity in Service Operations

Yang, Muer 23 September 2011 (has links)
No description available.
57

Analysis of Attacks on Controlled Stochastic Systems

Russo, Alessio January 2022 (has links)
In this thesis, we investigate attack vectors against Markov decision processes and dynamical systems. This work is motivated by the recent interest in the research community in making Machine Learning models safer against malicious attacks. We focus on different attack vectors: (I) attacks that alter the input/output signal of a Markov decision process; (II) eavesdropping attacks whose aim is to detect a change in a dynamical system; (III) poisoning attacks against data-driven control methods. (I) For attacks on Markov decision processes we focus on two types of attacks: (1) attacks that alter the observations of the victim, and (2) attacks that alter the control signal of the victim. Regarding (1), we investigate the problem of devising optimal attacks that minimize the collected reward of the victim. We show that when the policy and the system are known to the attacker, designing optimal attacks amounts to solving a Markov decision process. We also show that, for the victim, the system uncertainties induced by the attack can be modeled using a Partially Observable Markov Decision Process (POMDP) framework. We demonstrate that using Reinforcement Learning methods tailored to POMDPs leads to more resilient policies. Regarding (2), we instead investigate the problem of designing optimal stealthy poisoning attacks on the control channel of Markov decision processes. Previous work constrained the amplitude of the adversarial perturbation, with the hope that this constraint would make the attack imperceptible. However, such constraints do not grant any level of undetectability and do not take into account the dynamic nature of the underlying Markov process. To design an optimal stealthy attack, we investigate a new attack formulation, based on information-theoretic quantities, that considers the objective of minimizing the detectability of the attack as well as the performance of the controlled process. (II) In the second part of this thesis we analyse the problem where an eavesdropper tries to detect a change in a Markov decision process. These processes may be affected by changes that need to remain private. We study the problem using theoretical tools from optimal detection theory to motivate a definition of online privacy based on the average amount of information per observation of the underlying stochastic system. We provide ways to derive privacy upper bounds and compute policies that attain a higher privacy level, concluding with examples and numerical simulations. (III) Lastly, we investigate poisoning attacks against data-driven control methods. Specifically, we analyse how a malicious adversary can slightly poison the data so as to minimize the performance of a controller trained using this data. We show that identifying the most impactful attack boils down to solving a bi-level non-convex optimization problem, and provide theoretical insights on the attack. We present a generic algorithm finding a local optimum of this problem and illustrate our analysis for various techniques. Numerical experiments reveal that minimal but well-crafted changes in the data-set are sufficient to deteriorate the performance of data-driven control methods significantly, and even make the closed-loop system unstable.
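
To illustrate the claim in (I) that, with a known victim policy and known dynamics, designing an optimal observation attack amounts to solving a Markov decision process, the sketch below builds the attacker's MDP for a toy chain environment. The attacker's action is the observation it reports to the victim; everything here (dynamics, rewards, the victim policy) is an illustrative assumption, not the thesis's construction:

```python
import numpy as np

# Toy 4-state chain. The victim runs a FIXED policy pi on the OBSERVED state:
# move right (action 1) unless it sees state 3, where it moves left (action 0).
# The attacker may misreport the observation by at most one state and wants to
# MINIMIZE the victim's discounted reward, so the attacker's problem is itself
# an MDP over the true state whose action is the reported observation.
nS, gamma = 4, 0.9
R = np.array([0.0, 0.1, 0.5, 1.0])           # victim's reward at each true state
pi = np.array([1, 1, 1, 0])
move = lambda s, a: min(s + 1, nS - 1) if a == 1 else max(s - 1, 0)
obs_choices = lambda s: {max(s - 1, 0), s, min(s + 1, nS - 1)}

# Value iteration for the attacker's (minimizing) MDP.
W = np.zeros(nS)
for _ in range(300):
    W = np.array([R[s] + gamma * min(W[move(s, pi[o])] for o in obs_choices(s))
                  for s in range(nS)])

attack = [min(obs_choices(s), key=lambda o: W[move(s, pi[o])]) for s in range(nS)]
print("victim's worst-case value per true state:", np.round(W, 2))
print("observation reported in each true state: ", attack)
```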
58

Cognitive Radar Applied To Target Tracking Using Markov Decision Processes

Selvi, Ersin Suleyman 30 January 2018 (has links)
The radio-frequency spectrum is a precious resource, with many applications and users, especially with the recent spectrum auction in the United States. Future platforms and devices, such as radars and radios, need to be adaptive to their spectral environment in order to continue serving the needs of their users. This thesis considers an environment with one tracking radar, a single target, and a communications system. The radar-communications coexistence problem is modeled as a Markov decision process (MDP), and reinforcement learning is applied to drive the radar to optimal behavior. / Master of Science
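
A minimal sketch of this kind of MDP/RL formulation: the radar learns, by tabular Q-learning, which sub-band to use given where the communications user was last observed. The band-hopping model and all constants are illustrative assumptions, not the thesis's setup:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy coexistence: a comms user occupies one of 4 sub-bands, hopping as a
# Markov chain; the radar picks a band each dwell. State = band the comms
# user occupied last step; reward = +1 if no collision, -1 otherwise.
n_bands = 4
T = np.full((n_bands, n_bands), 0.1 / 3)
np.fill_diagonal(T, 0.9)                     # comms tends to stay in its band

Q = np.zeros((n_bands, n_bands))             # Q[state, radar_band]
alpha, gamma, eps = 0.1, 0.9, 0.1
comms = 0
for _ in range(20000):
    state = comms
    band = rng.integers(n_bands) if rng.random() < eps else Q[state].argmax()
    comms = rng.choice(n_bands, p=T[comms])  # comms user hops
    r = 1.0 if band != comms else -1.0
    Q[state, band] += alpha * (r + gamma * Q[comms].max() - Q[state, band])

print("radar band chosen when comms was last in band b:", Q.argmax(axis=1))
```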
59

Design suggestions for reward functions for self-driving cars in TORCS that do not crash

Andersson, Björn, Eriksson, Felix January 2018 (has links)
This study uses TORCS (The Open Racing Car Simulator), an interesting game for building self-driving cars since nineteen different sensors are available to describe the environment to the agent. The problem addressed in this study has been to identify which of these sensors can be used in a reward function and how such a reward function should be implemented. The study adopted a quantitative experimental method, with the research question: How can a reward function be designed so that the agent can maneuver in TORCS without crashing, while achieving a consistent result? The quantitative experimental method was chosen because the authors had to design, implement, run experiments on, and evaluate each reward function. A total of fifteen experiments were conducted over twelve reward functions in TORCS on two tracks, E-Track 5 (E-5) and Aalborg. Each of the twelve reward functions was first evaluated on E-5; the three with the best results, Charlie, Foxtrot and Juliette, were then evaluated on Aalborg, a more difficult track, to establish whether a reward function can handle more than one track and is therefore general. Juliette was the only reward function that completed both E-5 and Aalborg without crashing. Based on the experiments, the conclusion was drawn that Juliette satisfies the research question, since it completes both tracks without crashing and, when it succeeds, achieves a consistent result. The study has therefore succeeded in designing and implementing a reward function that answers the research question.
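
A common shape for such TORCS reward functions (a hedged illustration, not the study's Charlie, Foxtrot, or Juliette) combines the speedX, angle, and trackPos sensors: reward useful speed along the track axis, penalize sideways motion and lateral offset, and give a large negative reward on a crash:

```python
import math

def reward(obs, crashed):
    """Illustrative TORCS-style reward. `obs` uses TORCS sensor names:
    speedX (km/h), angle (rad, heading vs. track axis), trackPos
    (lateral offset, 0 = center, |1| = track edge)."""
    if crashed:
        return -200.0                                     # terminal crash penalty
    progress = obs["speedX"] * math.cos(obs["angle"])     # useful forward speed
    drift = obs["speedX"] * abs(math.sin(obs["angle"]))   # sideways speed
    off_center = obs["speedX"] * abs(obs["trackPos"])     # hugging the edge
    return progress - drift - off_center

# Fast, well-aligned, near the center -> high reward.
print(reward({"speedX": 120.0, "angle": 0.05, "trackPos": 0.1}, crashed=False))
# Fast but badly misaligned -> much lower reward.
print(reward({"speedX": 120.0, "angle": 0.8, "trackPos": 0.6}, crashed=False))
```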
60

Optimal mobility patterns in epidemic networks

Nirkhiwale, Supriya January 1900 (has links)
Master of Science / Department of Electrical and Computer Engineering / Caterina M. Scoglio / Disruption Tolerant Networks, or opportunistic networks, represent a class of networks in which there is no contemporaneous path from source to destination. In other words, these are networks with intermittent connections, generally sparse or highly mobile wireless networks. Each node has a limited radio range, and connections between nodes may be disrupted by node movement, hostile environments, power sleep schedules, etc. A common example of such a network is a sensor network monitoring nature, a military field, or a herd of animals under study. Epidemic routing is a widely proposed routing mechanism for data propagation in this type of network. Under this mechanism, the source copies its packets to all the nodes it meets within its radio range. These nodes in turn copy the received packets to the other nodes they meet, and so on. The data travels in a way analogous to the spread of an infection in a biological network. The destination finally receives the packet, and measures are taken to eradicate the packet from the network. Routing in epidemic networks faces the challenge of minimizing delivery delay while consuming as few resources as possible: every node has severe power constraints, and the network is also susceptible to temporary but random node failures. In previous work, the mobility parameter has been treated as a constant for a given setting. In our setting, we allow the mobility parameter to vary. In this framework, we determine the optimal mobility pattern and forwarding policy that a network should follow in order to balance the trade-off between delivery delay and power consumption. In addition, the mobility pattern should be one that can be incorporated in practice. In our work, we formulate an optimization problem that is solved using dynamic programming. We have tested the optimal algorithm through extensive simulations, which show that this optimization problem has a global solution.
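
A minimal sketch of the dynamic-programming formulation: a crude fluid model of epidemic routing in which, at each slot, the controller picks a low or high mobility/forwarding intensity beta, trading delivery delay (one unit per slot plus a deadline penalty) against energy (a cost per new packet copy). The model and all constants are illustrative assumptions, not the thesis's formulation:

```python
from functools import lru_cache

N, dt, c_e, T = 50, 1.0, 0.05, 60       # nodes, slot length, energy cost/copy, horizon
betas = (0.1, 0.6)                      # low vs. high mobility/forwarding intensity

@lru_cache(maxsize=None)
def J(n, t):
    """Minimum expected remaining cost with n packet copies at slot t."""
    if t == T:
        return 100.0                    # penalty if the packet is still undelivered
    best = float("inf")
    for b in betas:
        p_deliver = min(1.0, b * n * dt / N)                  # chance destination is met
        n_next = min(N, round(n + b * n * (1 - n / N) * dt))  # fluid-model copy growth
        cost = 1.0 + c_e * (n_next - n)                       # delay + energy for new copies
        best = min(best, cost + (1 - p_deliver) * J(n_next, t + 1))
    return best

print("expected cost starting from a single source copy:", round(J(1, 0), 2))
```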
