1 |
Learning successful strategies in repeated general-sum games /Crandall, Jacob W., January 2005 (has links) (PDF)
Thesis (Ph.D.)--Brigham Young University. Dept. of Computer Science, 2005. / Includes bibliographical references (p. 163-168).
|
2 |
Task localization, similarity, and transfer : towards a reinforcement learning task library system /Carroll, James Lamond, January 2005 (has links) (PDF)
Thesis (M.S.)--Brigham Young University. Dept. of Computer Science, 2005. / Includes bibliographical references (p. 113-117).
|
3 |
Adaptive modelling and planning for learning intelligent behaviour /Kochenderfer, Mykel J. January 2006 (has links)
An intelligent agent must be capable of using its past experience to develop an understanding of how its actions affect the world in which it is situated. Given some objective, the agent must be able to use this understanding of the world effectively to produce a plan that is robust to the uncertainty present in the world. This thesis presents a novel computational framework called the Adaptive Modelling and Planning System (AMPS) that aims to meet these requirements for intelligence. The challenge for the agent is to use its experience in the world to generate a model. In problems with large state and action spaces, the agent can generalise from limited experience by grouping together similar states and actions, effectively partitioning the state and action spaces into finite sets of regions. This process is called abstraction. Several different abstraction approaches have been proposed in the literature, but the existing algorithms have many limitations: they generally only increase resolution, require a large amount of data before changing the abstraction, do not generalise over actions, and are computationally expensive. AMPS aims to solve these problems using a new kind of approach. AMPS splits and merges existing regions in its abstraction according to a set of heuristics. The system introduces splits using a mechanism related to supervised learning, and is defined in a general way that allows AMPS to leverage a wide variety of representations. It merges existing regions when an analysis of the current plan indicates that doing so could be useful. Because several different regions may require revision at any given time, AMPS prioritises revision to make the best use of whatever computational resources are available. Changes in the abstraction lead to changes in the model, requiring changes to the plan. AMPS prioritises the planning process, and when the agent has time, it replans in high-priority regions.
This thesis demonstrates the flexibility and strength of this approach in learning intelligent behaviour from limited experience.
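The split/merge abstraction described in the abstract can be sketched in a few lines. The following is a minimal, hypothetical illustration (the region representation, thresholds, and sample data are invented for this sketch, not taken from the thesis): regions over a one-dimensional state space are split while the value samples they contain disagree, and adjacent regions are merged when their values agree.

```python
# Hypothetical AMPS-style split/merge sketch (all names and numbers illustrative):
# regions of a 1-D state space are refined where value samples disagree and
# coarsened where neighbouring regions carry similar values.

def split_region(lo, hi, samples, threshold):
    """Recursively split [lo, hi) at the midpoint while its value samples spread more than threshold."""
    vals = [v for s, v in samples if lo <= s < hi]
    if len(vals) < 2 or max(vals) - min(vals) <= threshold:
        return [(lo, hi)]                     # homogeneous enough: keep one region
    mid = (lo + hi) / 2
    return (split_region(lo, mid, samples, threshold)
            + split_region(mid, hi, samples, threshold))

def merge_regions(regions, samples, threshold):
    """Merge adjacent regions whose mean sampled values differ by less than threshold."""
    def mean(lo, hi):
        vals = [v for s, v in samples if lo <= s < hi] or [0.0]
        return sum(vals) / len(vals)
    merged = [regions[0]]
    for lo, hi in regions[1:]:
        plo, phi = merged[-1]
        if abs(mean(plo, phi) - mean(lo, hi)) < threshold:
            merged[-1] = (plo, hi)            # absorb into the previous region
        else:
            merged.append((lo, hi))
    return merged

# Toy (state, value) samples with a sharp value change near state 0.5,
# which forces exactly one split there.
samples = [(0.1, 0.0), (0.3, 0.1), (0.6, 1.0), (0.9, 1.1)]
regions = split_region(0.0, 1.0, samples, threshold=0.5)
regions = merge_regions(regions, samples, threshold=0.3)
```

In AMPS itself splits are introduced via a supervised-learning-related mechanism over general representations, and revisions are prioritised against available computation; the fixed midpoint split above only mimics how resolution is increased and decreased.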
|
4 |
User experience driven CPU frequency scaling on mobile devices : towards better energy efficiency /Seeker, Volker Günter January 2017 (has links)
With the development of modern smartphones, mobile devices have become ubiquitous in our daily lives. With high processing capabilities and a vast number of applications, users now need them for both business and personal tasks. Unfortunately, battery technology has not scaled at the same speed as computational power, so modern smartphone batteries often last less than a day before they need to be recharged. One of the most power-hungry components is the central processing unit (CPU). Multiple techniques are applied to reduce CPU energy consumption, among them dynamic voltage and frequency scaling (DVFS). This technique reduces energy consumption by dynamically changing the CPU supply voltage depending on the currently running workload. Reducing the voltage, however, also makes it necessary to reduce the clock frequency, which can have a significant impact on task performance. Current DVFS algorithms deliver a good user experience; however, as experiments conducted later in this thesis show, they do not deliver optimal energy efficiency for an interactive mobile workload. This thesis presents methods and tools to determine where energy can be saved during mobile workload execution when using DVFS. Furthermore, an improved DVFS technique is developed that achieves higher energy efficiency than the current standard. One important question when developing a DVFS technique is: how much can a task be slowed down to save energy before the negative effect on performance becomes intolerable? The ultimate goal when optimising a mobile system is to provide a high quality of experience (QOE) to the end user. In that context, task slowdowns become intolerable when they have a perceptible effect on QOE. Experiments conducted in this thesis answer this question by identifying workload periods in which performance changes are directly perceptible by the end user and periods where they are imperceptible, namely interaction lags and interaction idle periods.
An interaction lag is the time the system takes to process a user interaction and display a corresponding response. Idle periods are the periods between interactions, where the user perceives the system as idle and ready for the next input. Knowing where those periods are and how they are affected by frequency changes allows a more energy-efficient DVFS governor to be developed. This thesis begins by introducing a methodology that measures the duration of interaction lags as perceived by the user and uses them as an indicator to benchmark the quality of experience for a workload execution. A representative benchmark workload is generated, comprising 190 minutes of interactions collected from real users. In conjunction with this QOE benchmark, a DVFS Oracle study is conducted, which finds a frequency profile for an interactive mobile workload with the maximum energy savings achievable without a perceptible performance impact on the user. The developed Oracle performance profile achieves a QOE that is indistinguishable from always running at the fastest frequency while needing 45% less energy. Furthermore, this Oracle is used as a baseline to evaluate how well current mobile frequency governors perform. It shows that none of them perform particularly well and that up to 32% energy savings are possible. Equipped with a benchmark and an optimisation baseline, a user-perception-aware DVFS technique is developed in the second part of this thesis. Initially, a runtime heuristic is introduced that is able to detect interaction lags as the user would perceive them. Using this heuristic, a reinforcement-learning-driven governor is developed that learns good frequency settings for interaction lags and idle periods from sample observations. It consumes up to 22% less energy than the current standard governors on mobile devices while maintaining a low impact on QOE.
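The core idea of a governor that learns separate frequency settings for interaction and idle periods can be illustrated with a toy tabular Q-learner. Everything below is a hedged sketch: the frequency levels, the quadratic energy model, and the lag penalty are invented for illustration and bear no relation to the thesis's actual hardware measurements or algorithm.

```python
import random

# Illustrative sketch (all names and numbers hypothetical): a tabular
# Q-learning governor picks a CPU frequency per period type. The reward
# penalises energy (growing with frequency) and perceptible lag (incurred
# only when an interaction runs at a too-low frequency).

FREQS = [0.5, 1.0, 2.0]                 # GHz levels the governor may choose

def reward(period, freq):
    energy_cost = freq ** 2             # toy model: energy grows with frequency
    lag_penalty = 10.0 if (period == "interaction" and freq < 2.0) else 0.0
    return -energy_cost - lag_penalty

def train(episodes=2000, alpha=0.1, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(p, f): 0.0 for p in ("interaction", "idle") for f in FREQS}
    for _ in range(episodes):
        period = "interaction" if rng.random() < 0.5 else "idle"
        if rng.random() < eps:          # epsilon-greedy exploration
            freq = rng.choice(FREQS)
        else:
            freq = max(FREQS, key=lambda f: q[(period, f)])
        q[(period, freq)] += alpha * (reward(period, freq) - q[(period, freq)])
    return q

q = train()
policy = {p: max(FREQS, key=lambda f: q[(p, f)]) for p in ("interaction", "idle")}
```

Under this toy reward, the learned policy runs interactions at the highest frequency (avoiding the lag penalty) and idle periods at the lowest (minimising energy), mirroring the intuition behind the thesis's perception-aware governor.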
|
5 |
Fault-tolerant mapping and localization for Quadrotor UAV /Gilson, Maximillian Andrew January 2019 (has links)
No description available.
|
6 |
Application of machine learning to improve the performance of a pressure-controlled system /Kreutmayr, Fabian, Imlauer, Markus 23 June 2020 (has links)
Due to the robustness and flexibility of hydraulic components, hydraulic control systems are used in a wide range of applications under various environmental conditions. However, covering this broad field of applications often comes with a loss of performance. Especially when conditions and working points change frequently, hydraulic control systems cannot work at their optimum. Flexible electronic controllers combined with techniques from the field of machine learning have the potential to overcome these issues. By applying a reinforcement learning algorithm, this paper examines whether learned controllers can compete with an expert-tuned solution. The method is thoroughly validated using both simulations and experiments.
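As a hedged illustration of the paper's general idea, learning a controller by reinforcement rather than expert tuning, the sketch below trains a tabular Q-learning policy on a toy first-order pressure process. The plant model, discretisation, and all constants are invented for this example and are not taken from the paper.

```python
import random

# Hypothetical illustration (not the paper's setup): Q-learning of a valve-step
# policy for a toy pressure process, rewarded by closeness to a target pressure.

ACTIONS = [-1.0, 0.0, 1.0]              # valve step: close, hold, open
TARGET = 5.0

def step(pressure, action):
    pressure += 0.5 * action            # toy plant: a valve step shifts pressure
    return pressure, -abs(pressure - TARGET)

def bucket(pressure):
    return max(-4, min(4, round(pressure - TARGET)))   # discretised error state

def greedy(q, s):
    return max(ACTIONS, key=lambda a: q.get((s, a), 0.0))

def train(episodes=500, horizon=30, alpha=0.2, gamma=0.9, eps=0.2, seed=1):
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        p = rng.uniform(0.0, 10.0)      # random initial pressure
        for _ in range(horizon):
            s = bucket(p)
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy(q, s)
            p, r = step(p, a)
            s2 = bucket(p)
            target = r + gamma * max(q.get((s2, b), 0.0) for b in ACTIONS)
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (target - q.get((s, a), 0.0))
    return q

q = train()
p = 8.0                                  # run the learned policy from above target
for _ in range(20):
    p, _ = step(p, greedy(q, bucket(p)))
```

After training, the greedy policy opens the valve below the target pressure and closes it above, driving the toy plant into the target's neighbourhood; a real study would of course compare such a learned controller against the expert-tuned one on the physical system.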
|
7 |
Deep reinforcement learning based on visual perceptions /Lange, Sascha 19 May 2010 (has links)
This thesis investigates and extends autonomously learning machine learning methods (reinforcement learning) applied to visual perceptions. Recently, the introduction of memory-based methods into reinforcement learning has brought great progress in learning on real systems, but handling highly complex visual input data, such as that recorded by a digital camera, remains an unsolved problem. Existing methods are limited to low-dimensional state descriptions, which so far rules out applying them directly to the stream of image data and requires the upstream use of classical image-understanding methods to extract and suitably encode the relevant information.
A way out is offered by so-called 'deep autoencoders'. These multi-layer neural networks make it possible to learn, in a self-organised manner, low-dimensional feature spaces for representing high-dimensional input data, and thus to replace a classical, task-specific image analysis. Impressive results have already been achieved in typical object-recognition tasks on the basis of such learned representations. This thesis examines the fundamental suitability of deep autoencoder networks for use in reinforcement learning. With the 'Deep Fitted Q' algorithm, a new algorithm is developed that integrates the training of the deep autoencoder networks efficiently into the reinforcement learning process and thus enables the handling of visual perceptions during policy learning. Besides data efficiency, particular attention is paid to the stability of the method. Following a discussion of its theoretical aspects, an extensive empirical evaluation of the generated feature spaces and the learned policies is carried out on simulated and real systems. Using the developed methods, this thesis succeeds for the first time in learning policies for controlling real systems directly from unpreprocessed image information, with only the goal to be reached specified externally.
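The two-stage idea, an autoencoder compressing raw images into a low-dimensional feature space followed by batch fitted Q iteration on the encoded transitions, can be sketched as follows. This is purely illustrative: a fixed mean-pooling "encoder" stands in for the learned deep autoencoder so that the fitted-Q step stays runnable, and the states and rewards are invented toy data.

```python
# Schematic sketch of the encode-then-fit pipeline (illustrative stand-ins only).
# A real implementation would train a deep autoencoder on raw camera images;
# here a fixed pooling function plays that role.

def encode(image):
    """Stand-in for the learned autoencoder: pool raw pixels to one feature."""
    return round(sum(image) / len(image), 1)

def fitted_q(transitions, actions, sweeps=50, gamma=0.9):
    """Batch fitted Q iteration over encoded transitions (s, a, r, s', done)."""
    q = {}
    for _ in range(sweeps):
        for s, a, r, s2, done in transitions:
            best_next = 0.0 if done else max(q.get((s2, b), 0.0) for b in actions)
            q[(s, a)] = r + gamma * best_next   # refit Q on the fixed batch
    return q

# Toy batch: two "camera images"; action 1 from the dark state reaches the
# bright goal state (reward 1), action 0 stays in the dark state (reward 0).
dark, bright = [0.0, 0.1, 0.0, 0.1], [0.9, 1.0, 1.0, 0.9]
s0, s1 = encode(dark), encode(bright)
transitions = [(s0, 0, 0.0, s0, False), (s0, 1, 1.0, s1, True)]
q = fitted_q(transitions, actions=[0, 1])
policy_at_dark = max([0, 1], key=lambda a: q[(s0, a)])
```

The policy is learned entirely in the encoded feature space, which is the point of the approach: the reinforcement learner never touches the raw pixels, only their compressed representation.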
|
8 |
Automating rule creation in a Smart Home prototype with Learning Classifier System /Anderzén, Anton, Winroth, Markus January 2018 (has links)
The name ”smart homes” gives a promise of intelligent behavior. Today, automation of the home environment is a manual task, with the creation of rules controlling devices relying on the user. For smart homes, this tedious manual task can be automated. The purpose of this thesis is the development of a prototype that helps users in smart homes create rules. The rules should be created automatically by a machine learning solution. A learning classifier system algorithm is found to be a suitable machine learning solution and is used to find and create rules from sensor data. In the prototype, a Raspberry Pi is used to collect the data. This data is processed by the learning classifier system, generating a set of rules that predict actions for controlling a smart lighting system. The rules are continuously updated with new sensory information from the environment, constantly re-evaluating the previously found rules. The learning classifier system prototype solves the problem of how rules can be generated automatically by the use of machine learning.
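A minimal learning-classifier-system-flavoured sketch of the rule-learning idea (the rule encoding, sensors, and constants are illustrative inventions, not the thesis prototype): rules match binary sensor readings with '#' as a wildcard, the fittest matching rule acts, and its fitness is nudged toward the reward it earns, so specific correct rules outgrow an overgeneral competitor.

```python
# Sensors as a 2-bit string (dark, motion); action 1 = light on, 0 = off.
# All rules and rewards are toy inventions for this sketch.
RULES = [
    {"cond": "11", "action": 1, "fitness": 0.5},   # dark and motion -> on
    {"cond": "10", "action": 0, "fitness": 0.5},   # dark, no motion -> off
    {"cond": "0#", "action": 0, "fitness": 0.5},   # bright -> off, motion ignored
    {"cond": "##", "action": 1, "fitness": 0.5},   # overgeneral rule: always on
]

def matches(cond, sensors):
    """A condition matches when every position is a wildcard or equals the sensor bit."""
    return all(c in ("#", s) for c, s in zip(cond, sensors))

def act(sensors):
    """Choose the fittest rule matching the current sensor reading."""
    match_set = [r for r in RULES if matches(r["cond"], sensors)]
    return max(match_set, key=lambda r: r["fitness"])

def reinforce(rule, reward, beta=0.2):
    """Nudge the acting rule's fitness toward the reward it earned."""
    rule["fitness"] += beta * (reward - rule["fitness"])

# Desired behaviour: light on exactly when it is dark AND motion is detected.
# Reward 1 for a correct action, 0 otherwise.
for _ in range(30):
    for sensors, correct in [("11", 1), ("10", 0), ("01", 0), ("00", 0)]:
        rule = act(sensors)
        reinforce(rule, 1.0 if rule["action"] == correct else 0.0)

policy = {s: act(s)["action"] for s in ("11", "10", "01", "00")}
```

After a few sweeps the three specific rules dominate their situations while the always-on rule, never rewarded, stays behind; a full LCS would additionally create new rules by covering and a genetic algorithm, which this sketch omits.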
|