21 |
[pt] CONJUNTOS ONLINE PARA APRENDIZADO POR REFORÇO PROFUNDO EM ESPAÇOS DE AÇÃO CONTÍNUA / [en] ONLINE ENSEMBLES FOR DEEP REINFORCEMENT LEARNING IN CONTINUOUS ACTION SPACES
Renata Garcia Oliveira, 01 February 2022
[en] This work seeks to use ensembles of deep reinforcement learning algorithms from a new perspective. In the literature, the ensemble technique is
used to improve performance, but, for the first time, this research aims to use
ensembles to minimize the dependence of deep reinforcement learning performance on hyperparameter fine-tuning, in addition to making it more precise
and robust. Two approaches are researched; one considers pure action aggregation, while the other also takes the value functions into account. In the first
approach, an online learning framework based on the ensemble's continuous
action choice history is created, aiming to flexibly integrate different scoring
and aggregation methods for the agents' actions. In essence, the framework
uses past performance to combine only the best policies' actions. In the second approach, the policies are evaluated using their expected performance as
estimated by their value functions. Specifically, we weight the ensemble's value
functions by their expected accuracy as calculated by the temporal difference
error. Value functions with lower error have higher weight. To measure the
influence on the hyperparameter tuning effort, groups consisting of a mix of
different amounts of well and poorly parameterized algorithms were created.
To evaluate the methods, classic environments such as the inverted pendulum,
cart pole and double cart pole are used as benchmarks. In validation, the Half
Cheetah v2, a biped robot, and Swimmer v2 simulation environments showed
superior and consistent results, demonstrating the ability of the ensemble technique to minimize the effort needed to tune the algorithms' hyperparameters.
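To make the weighting idea concrete, the following is a minimal sketch of TD-error-based ensemble weighting and weighted action aggregation. The abstract only states that value functions with lower temporal-difference error receive higher weight and that member actions are combined; the softmax form, the `temperature` parameter, and the function names below are illustrative assumptions, not the author's implementation.

```python
import numpy as np

def td_error_weights(td_errors, temperature=1.0):
    """Weight ensemble members by the magnitude of their recent TD errors:
    members whose value functions are more accurate (smaller error) receive
    larger weights. A softmax over -|error| keeps weights positive and
    normalized (assumed form, not the thesis's exact scheme)."""
    scores = -np.abs(np.asarray(td_errors, dtype=np.float64)) / temperature
    scores -= scores.max()              # numerical stability
    w = np.exp(scores)
    return w / w.sum()

def aggregate_actions(actions, weights):
    """Combine the continuous actions proposed by each ensemble member as a
    weighted average (one simple aggregation rule among many)."""
    actions = np.asarray(actions, dtype=np.float64)  # shape: (n_members, action_dim)
    return weights @ actions

# Toy usage: three policies propose 1-D actions; the member with the smallest
# running TD error dominates the aggregated action.
recent_td_errors = [0.05, 0.40, 0.90]       # e.g. running averages per member
proposed_actions = [[0.2], [0.8], [-0.5]]
w = td_error_weights(recent_td_errors)
print("weights:", w)
print("ensemble action:", aggregate_actions(proposed_actions, w))
```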
|
22 |
Wildfire Spread Prediction Using Attention Mechanisms In U-Net
Shah, Kamen Haresh, 01 December 2022
An investigation into using attention mechanisms for better feature extraction in wildfire spread prediction models. This research examines the U-Net architecture for image segmentation, a process that partitions images by classifying each pixel into one of two classes. The deep learning models explored in this research integrate modern deep learning architectures and the techniques used to optimize them. The models are trained on 12 distinct observational variables derived from the Google Earth Engine catalog. Evaluation is conducted with accuracy, Dice coefficient, ROC-AUC, and F1-score. This research concludes that when U-Net is augmented with attention mechanisms, the attention component improves feature suppression and recognition, improving overall performance. Furthermore, employing ensemble modeling reduces bias and variance, leading to more consistent and accurate predictions. When predicting wildfire propagation at 30-minute intervals, the architecture presented in this research achieved an ROC-AUC score of 86.2% and an accuracy of 82.1%.
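As an illustration of the kind of attention component described above, here is a minimal sketch of an additive attention gate applied to a U-Net skip connection, in the style of Attention U-Net. The abstract does not specify the exact attention formulation used, so the module structure, channel sizes, and names below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate: a decoder (gating) signal highlights which
    regions of the encoder skip connection to keep before concatenation."""
    def __init__(self, gate_ch, skip_ch, inter_ch):
        super().__init__()
        self.w_g = nn.Sequential(nn.Conv2d(gate_ch, inter_ch, 1), nn.BatchNorm2d(inter_ch))
        self.w_x = nn.Sequential(nn.Conv2d(skip_ch, inter_ch, 1), nn.BatchNorm2d(inter_ch))
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, 1), nn.BatchNorm2d(1), nn.Sigmoid())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, gate, skip):
        # gate and skip are assumed to share spatial dimensions here; in a
        # full U-Net the gating signal is usually upsampled first.
        attn = self.psi(self.relu(self.w_g(gate) + self.w_x(skip)))  # (N, 1, H, W) in [0, 1]
        return skip * attn  # suppress irrelevant skip-connection features

# Toy usage on random tensors shaped like one decoder stage.
gate = torch.randn(2, 128, 32, 32)   # decoder feature map
skip = torch.randn(2, 64, 32, 32)    # encoder skip connection
gated_skip = AttentionGate(gate_ch=128, skip_ch=64, inter_ch=64)(gate, skip)
print(gated_skip.shape)              # torch.Size([2, 64, 32, 32])
```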
|
23 |
Strojové učení ve strategických hrách / Machine Learning in Strategic Games
Vlček, Michael, January 2018
Machine learning is spearheading progress in artificial intelligence when it comes to competing with human opponents in strategy games, be it chess, Go, or poker. The branch of machine learning that shows the most promising results in playing strategy games is reinforcement learning. The next milestone for current research is the computer game StarCraft II, which far exceeds the earlier games in complexity and represents a potential new breakthrough in the field. This work analyzes the problem and proposes a solution combining the reinforcement learning algorithm A2C with PBT (population-based training) for hyperparameter optimization, which could be a step forward for current progress.
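For context on the PBT component mentioned above, the following is a generic sketch of one population-based training exploit/explore round, in which poorly performing workers (here, hypothetical A2C workers) copy the weights and hyperparameters of top performers and then perturb those hyperparameters. The quartile cutoff, the multiplicative perturbation, and all names are assumptions; this is not the thesis implementation.

```python
import copy
import random

def pbt_step(population, explore_scale=1.2):
    """One exploit/explore round of population-based training (PBT): the
    worst performers copy the weights and hyperparameters of the best
    performers, then perturb the copied hyperparameters."""
    ranked = sorted(population, key=lambda m: m["reward"], reverse=True)
    cutoff = max(1, len(ranked) // 4)
    top, bottom = ranked[:cutoff], ranked[-cutoff:]
    for loser in bottom:
        winner = random.choice(top)
        loser["weights"] = copy.deepcopy(winner["weights"])      # exploit
        for name, value in winner["hyperparams"].items():        # explore
            factor = explore_scale if random.random() < 0.5 else 1.0 / explore_scale
            loser["hyperparams"][name] = value * factor
    return population

# Toy population: each member would normally be an A2C worker with its own
# learning rate and entropy coefficient, trained between PBT rounds.
population = [
    {"reward": r, "weights": {}, "hyperparams": {"lr": 1e-3, "entropy_coef": 0.01}}
    for r in [10.0, 4.0, 7.0, 1.0]
]
pbt_step(population)
print([m["hyperparams"]["lr"] for m in population])
```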
|
24 |
Model-based hyperparameter optimization
Crouther, Paul, 04 1900
The primary goal of this work is to propose a methodology for discovering hyperparameters.
Hyperparameters aid systems in convergence when they are well tuned and handcrafted. However,
poorly chosen hyperparameters leave practitioners in limbo, torn between concerns about the
implementation and concerns about an improper choice of hyperparameters or system configuration. We
specifically analyze the choice of learning rate in stochastic gradient descent (SGD), a popular
algorithm. As a secondary goal, we attempt the discovery of fixed points using smoothing of
the loss landscape by exploiting assumptions about its distribution to improve the update
rule in SGD. Smoothing of the loss landscape has been shown to make convergence possible in
large-scale systems and difficult black-box optimization problems. However, we use stochastic
value gradients (SVG) to smooth the loss landscape by learning a surrogate model and then
backpropagate through this model to discover fixed points on the real task SGD is trying to
solve. Additionally, we construct a gym environment for testing model-free algorithms, such
as Proximal Policy Optimization (PPO) as a hyperparameter optimizer for SGD. For tasks,
we focus on a toy problem and analyze the convergence of SGD on MNIST using model-free
and model-based reinforcement learning methods for control. The model is learned from
the parameters of the true optimizer and used specifically for learning rates rather than for
prediction. We run experiments in both an online and an offline setting. In the online setting,
we learn a surrogate model alongside the true optimizer, where hyperparameters are tuned
in real-time for the true optimizer. In the offline setting, we show that there is more potential
in the model-based learning methodology than in the model-free configuration due to this
surrogate model that smooths out the loss landscape and makes for more helpful gradients
during backpropagation.
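To illustrate the control formulation described in the abstract, here is a minimal gym-style toy environment in which the agent's action is the learning rate applied to one SGD step on a simple quadratic loss; a model-free agent such as PPO could then be trained to select learning rates. The environment, its observation and reward definitions, and all names are illustrative assumptions, not the thesis's actual environment or tasks.

```python
import numpy as np

class LRControlEnv:
    """Gym-style toy environment in which the agent's action is the SGD
    learning rate for one step on the quadratic loss 0.5 * ||w||^2.
    Observation: current loss and gradient norm; reward: decrease in loss."""
    def __init__(self, dim=10, horizon=50, seed=0):
        self.rng = np.random.default_rng(seed)
        self.dim, self.horizon = dim, horizon

    def reset(self):
        self.w = self.rng.normal(size=self.dim)
        self.t = 0
        return self._obs()

    def step(self, action):
        lr = float(np.clip(action, 1e-5, 1.0))   # action is the learning rate
        prev_loss = 0.5 * self.w @ self.w
        grad = self.w                            # gradient of the quadratic loss
        self.w = self.w - lr * grad              # one SGD update
        self.t += 1
        loss = 0.5 * self.w @ self.w
        reward = prev_loss - loss                # reward progress on the task
        done = self.t >= self.horizon
        return self._obs(), reward, done, {}

    def _obs(self):
        loss = 0.5 * self.w @ self.w
        return np.array([loss, np.linalg.norm(self.w)], dtype=np.float32)

# Random-policy rollout; a PPO agent would replace the random learning rates.
env = LRControlEnv()
obs, total, done = env.reset(), 0.0, False
while not done:
    obs, r, done, _ = env.step(np.random.uniform(0.0, 0.5))
    total += r
print("return:", round(total, 3))
```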
|