Global ETD Search

21	Information Freshness Optimization in Real-time Network Applications Liu, Zhongdong 12 June 2024 In recent years, the remarkable development of ubiquitous communication networks and smart portable devices has spawned a wide variety of real-time applications that require timely information updates (e.g., autonomous vehicular systems, industrial automation systems, and live streaming services). These real-time applications all have one thing in common: they desire their knowledge of the information source to be as fresh as possible. To measure the freshness of information, a new metric called the Age-of-Information (AoI) has been proposed. AoI is defined as the time elapsed since the generation time of the freshest delivered update. This metric is influenced by both the inter-arrival time and the delay of the updates. As a result of these dependencies, the AoI metric exhibits distinct characteristics compared to traditional delay and throughput metrics. In this dissertation, our goal is to optimize AoI under various real-time network applications. Firstly, we investigate a fundamental problem of how exactly various scheduling policies impact AoI performance. Though there is a large body of work studying the AoI performance under different scheduling policies, the use of the update-size information and its combinations with other information (such as arrival-time information and service preemption) to reduce AoI has not yet been explored. Secondly, because AoI is a recently introduced measure of freshness, its relationship with other performance metrics remains largely unclear. We analyze the tradeoffs between AoI and additional performance metrics, including service performance and update cost, within real-world applications. This dissertation is organized into three parts. In the first part, we observe that scheduling policies leveraging the update-size information can substantially reduce the delay, one of the key components of AoI. 
However, it remains largely unknown how exactly scheduling policies (especially those making use of update-size information) impact the AoI performance. To this end, we conduct a systematic and comparative study to investigate the impact of scheduling policies on the AoI performance in single-server queues and provide useful guidelines for the design of AoI-efficient scheduling policies. In the second part, we analyze the tradeoffs between AoI and other performance metrics in real-world systems. Specifically, we focus on the following two important tradeoffs. (i) The tradeoff between service performance and AoI that arises in data-driven real-time applications (e.g., Google Maps and stock trading applications). In these applications, the computing resource is often shared for processing both updates from information sources and queries from end users. Hence, there is a natural tradeoff between service performance (e.g., response time to queries) and AoI (i.e., the freshness of data in response to user queries). To address this tradeoff, we begin by introducing a simple single-server two-queue model that captures the coupled scheduling between updates and queries. Subsequently, we design threshold-based scheduling policies to prioritize either updates or queries. Finally, we conduct a rigorous analysis of the performance of these threshold-based scheduling policies. (ii) The tradeoff between update cost and AoI that appears in crowdsensing-based applications (e.g., Google Waze and GasBuddy). On the one hand, users are not satisfied if the responses to their requests are stale; on the other hand, there is a cost for the applications to update their information regarding certain points of interest since they typically need to make monetary payments to incentivize users. 
To capture this tradeoff, we first formulate an optimization problem with the objective of minimizing the sum of the staleness cost (which is a function of the AoI) and the update cost; we then obtain a closed-form optimal threshold-based policy by reformulating the problem as a Markov decision process (MDP). In the third part, we study the joint minimization of staleness and transmission costs (e.g., energy cost) under an arbitrary time-varying wireless channel, both without and with machine learning (ML) advice. We consider a discrete-time system where a resource-constrained source transmits time-sensitive data to a destination over a time-varying wireless channel. Each transmission incurs a fixed cost, while not transmitting results in a staleness cost measured by the AoI. The source needs to balance the tradeoff between these transmission and staleness costs. To tackle this challenge, we develop a robust online algorithm aimed at minimizing the sum of transmission and staleness costs, ensuring a worst-case performance guarantee. While online algorithms are robust, they tend to be overly conservative and may perform poorly on average in typical scenarios. In contrast, ML algorithms, which leverage historical data and prediction models, generally perform well on average but lack worst-case performance guarantees. To harness the advantages of both approaches, we design a learning-augmented online algorithm that achieves two key properties: (i) consistency: closely approximating the optimal offline algorithm when the ML prediction is accurate and trusted; (ii) robustness: providing a worst-case performance guarantee even when ML predictions are inaccurate. / Doctor of Philosophy / In recent years, the rapid growth of communication networks and smart devices has spurred the emergence of real-time applications like autonomous vehicles and industrial automation systems. These applications share a common need for timely information. 
The freshness of information can be measured using a new metric called Age-of-Information (AoI). This dissertation aims to optimize AoI across various real-time network applications, organized into three parts. In the first part, we explore how scheduling policies (particularly those considering update size) impact the AoI performance. Through a systematic and comparative study in single-server queues, we provide useful guidelines for the design of AoI-efficient scheduling policies. The second part explores the tradeoff between update cost and AoI in crowdsensing applications like Google Waze and GasBuddy, where users demand fresh responses to their requests but updating information incurs costs for the applications. We aim to minimize the sum of staleness cost (a function of AoI) and update cost. By reformulating the problem as a Markov decision process (MDP), we design a simple threshold-based policy and prove its optimality. In the third part, we study the joint minimization of staleness and transmission costs (e.g., energy cost) under a time-varying wireless channel. We first develop a robust online algorithm that achieves a competitive ratio of 3, ensuring a worst-case performance guarantee. Furthermore, when advice is available, e.g., predictions from machine learning (ML) models, we design a learning-augmented online algorithm that exhibits two desired properties: (i) consistency: closely approximating the optimal offline algorithm when the ML prediction is accurate and trusted; (ii) robustness: guaranteeing worst-case performance even with inaccurate ML prediction. While this dissertation marks a significant advancement in AoI research, numerous open problems remain. For instance, our learning-augmented online algorithm treats ML predictions as external inputs. Exploring the co-design and training of ML and online algorithms to improve performance could yield interesting insights. 
Additionally, while AoI typically assesses update importance based solely on timestamps, the content of updates also holds significance. Incorporating considerations of both age and semantics of information is imperative in future research. Information freshness Age-of-Information latency transmission cost Internet of Things optimization machine learning algorithm algorithm design
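The AoI definition quoted in entry 21 (the time elapsed since the generation time of the freshest delivered update) can be illustrated with a minimal sketch. The function name `age_of_information` and the `(generation_time, delivery_time)` representation of updates are assumptions made for illustration only; they are not taken from the dissertation.

```python
def age_of_information(t, updates):
    """Return the AoI at time t, or None if no update has been delivered yet.

    `updates` is an iterable of (generation_time, delivery_time) pairs
    (a hypothetical representation chosen for this sketch).
    """
    # Only updates that have already reached the destination count.
    delivered = [gen for gen, dlv in updates if dlv <= t]
    if not delivered:
        return None
    # The freshest delivered update is the one generated most recently.
    return t - max(delivered)


# Three updates; the second is generated earlier but delivered later than the third.
updates = [(0.0, 1.0), (2.0, 5.0), (3.0, 4.0)]
aoi_at_4_5 = age_of_information(4.5, updates)
```

In this example the update generated at time 2.0 arrives after the one generated at 3.0, so its delivery does not reduce the age. This reflects why AoI depends on both inter-arrival times and delays rather than on delay alone.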
22	Optimizing Initialization, Feature Selection, and Tensor Dimension Reduction in Unsupervised Learning: Methods and Applications Huyunting Huang Sr. (8039492) 17 April 2025 Unsupervised machine learning (ML) is essential for analyzing complex data without labels, but it poses many challenges. This dissertation addresses three key challenges: clustering initialization, unsupervised feature selection, and dimension reduction for tensors. The thesis also applies unsupervised ML to airborne LiDAR data. Chapter 2 introduces an improved initialization strategy for K-Means clustering and Gaussian Mixture Models (GMM). The proposed method improves clustering stability and accuracy. Chapter 3 develops a stepwise unsupervised feature selection framework, called the Forward Partial-Variable Clustering with Full-Variable Loss (FPCFL), to improve clustering performance in high-dimensional data. Chapter 4 focuses on tensor dimension reduction and feature selection in multiway data. It introduces Low-Rank Sparse Tensor Approximation (LRSTA) for efficient data compression and High-Order Orthogonal Decomposition (HOOD) for improved sparsity and interpretability, particularly in large-scale datasets like image and video analysis. Chapter 5 explores unsupervised ML in airborne LiDAR data, applying clustering and dimensionality reduction to enhance ground filtering and object detection in 3D point clouds. This dissertation advances unsupervised ML by improving clustering reliability, optimizing feature selection, and enhancing tensor decomposition, contributing to more effective and scalable data-driven analysis. Semi- and unsupervised learning Statistical theory Optimization Tensor dimension reduction Unsupervised Machine Learning Algorithm feature selection
23	Feature selection and clustering for malicious and benign software characterization Chhabra, Dalbir Kaur R 13 August 2014 Malware, or malicious code, is designed to gather sensitive information without the knowledge or permission of users, or to damage files in the computer system. As the use of computer systems and the Internet increases, the threat of malware is also growing. Moreover, the growth in data makes it harder to identify whether executables are malicious or benign. Hence, we have devised a method that collects features from the Portable Executable file format using static malware analysis. We have also optimized the important or useful features by either normalizing or weighting them. Furthermore, we have compared the accuracy of various unsupervised learning algorithms for clustering a huge dataset of samples. Once the clusters are created, we can use antivirus (AV) software to check one or two files in each cluster: if they are detected by the AV, then all the files in the cluster are treated as malicious, even if they contain novel or unknown malware; otherwise, all are treated as benign. Static malware analysis Portable Executable unsupervised learning algorithm malicious or benign samples feature selection clustering Information Security
24	A CONTRIBUIÇÃO DA TEORIA HISTÓRICO-CULTURAL DE VYGOTSKY PARA O ENSINO E A APRENDIZAGEM DE ALGORITMO. Faria, Eliézer Marques 21 August 2013 The discipline of Algorithms can aid the cognitive development of students by requiring them to exercise higher mental functions such as logical reasoning, abstraction, and voluntary attention, among others. To do so, it requires them to develop reading and text comprehension, memory, and the ability to relate prior knowledge to problem solving, among other mental abilities. Therefore, its use has been recommended as early as high school. Both algorithms and algorithmic thinking are being applied in many different areas of knowledge (Psychology, Medicine, Portuguese, Computing, etc.). Nevertheless, the low level of learning in Algorithms is a problem that occurs worldwide, including in the higher-education course in Geoprocessing Technology offered at the Federal Institute of Goiás, Goiânia campus. In general, the work and research carried out on algorithm learning present solutions that involve the design/construction of computerized tools or changes to the methodology used in the classroom. We believe it is necessary to understand and analyze this problem from a perspective grounded in a theory of learning, in order to account for the process as a whole, without being limited to the empirical aspects arising from the poor training of teachers and students. Therefore, this research was based on the Cultural-Historical Theory (THC, from the Portuguese Teoria Histórico-Cultural) of Lev S. Vygotsky, considering aspects such as the formation of concepts, the Zone of Proximal Development, and Learning. The central question of this research is: from the point of view of THC, what is the role of teaching and learning algorithms? 
This question leads to the general objective: to analyze the role of teaching and learning algorithms from the perspective of THC. It was found that the low level of algorithm learning is related to the role assigned to the discipline by the individuals involved in this process, namely the teacher and the students. Therefore, the results obtained in this study reinforce the need for change in the paradigm modeled on an instrumental education. / A disciplina cujo conteúdo é Algoritmos pode auxiliar no desenvolvimento cognitivo do aluno ao exigir que ele trabalhe algumas funções mentais superiores tais como: o raciocínio lógico, a abstração, a atenção voluntária, dentre outras. Para tanto, requer que ele desenvolva a leitura e compreensão de texto, a memória, a relação de conhecimentos anteriores para a resolução de problemas, dentre outras habilidades mentais. Por isso, a sua utilização vem sendo recomendada já a partir do ensino médio. Tanto o Algoritmo quanto o pensamento algorítmico vêm sendo aplicados nas mais diversas áreas de conhecimento (Psicologia, Medicina, Português, Computação, etc.). Apesar disso, o baixo nível de aprendizagem em Algoritmos é um problema que ocorre em nível mundial, inclusive no curso superior de Tecnologia em Geoprocessamento, ofertado no Instituto Federal de Goiás, campus Goiânia. De uma maneira geral, os trabalhos e pesquisas realizados sobre a aprendizagem de Algoritmo apresentam soluções que passam pela concepção/construção de ferramentas informatizadas ou pela mudança da metodologia utilizada nas aulas. Acreditamos que se faz necessária a compreensão e a análise deste problema sob uma perspectiva embasada em uma teoria da aprendizagem, de forma a dar conta do processo como um todo, não se limitando aos aspectos empíricos advindos da formação precária dos professores e dos alunos. Para tanto, essa pesquisa foi fundamentada na Teoria Histórico-Cultural de Lev S. 
Vygotsky, considerando pontos como: a formação de conceitos, a Zona de Desenvolvimento Proximal e a Aprendizagem. A questão central desta pesquisa é: do ponto de vista da THC, qual é o papel do ensino e da aprendizagem de Algoritmos? Desta questão, toma-se por objetivo geral: Analisar o papel do ensino e da aprendizagem de Algoritmos sob a perspectiva da THC. Verificou-se que o baixo nível de aprendizagem de Algoritmo está relacionado com o papel atribuído à disciplina pelos sujeitos envolvidos nesse processo, quais sejam: o professor e os alunos. Logo, os resultados aferidos nesta pesquisa reafirmam a necessidade da mudança no paradigma construído nos moldes de uma educação instrumental. Algoritmo Aprendizagem de Algoritmo Teoria Histórico-cultural Formação de Conceitos Algorithm Learning Algorithm Historic-Cultural Theory Concept Formation CNPQ::CIENCIAS HUMANAS::EDUCACAO
25	Playing is believing: the role of beliefs in multi-agent learning Chang, Yu-Han, Kaelbling, Leslie P. 01 1900 We propose a new classification for multi-agent learning algorithms, with each league of players characterized by both their possible strategies and possible beliefs. Using this classification, we review the optimality of existing algorithms and discuss some insights that can be gained. We propose an incremental improvement to the existing algorithms that seems to achieve average payoffs that are at least the Nash equilibrium payoffs in the long run against fair opponents. / Singapore-MIT Alliance (SMA) multi-agent learning algorithm repeated games belief game theory Matrix games Nash equilibrium Stochastic games Reinforcement learning PHC-Exploiter
26	SPARSE DEEP LEARNING FOR TIME SERIES DATA AND MAGNITUDE PRUNING OF LARGE PRETRAINED TRANSFORMER MODELS AND TEMPERING LEARNING Mingxuan Zhang (21215987) 02 May 2025 Sparse deep learning has proven to be an effective technique for improving the performance of deep neural networks in areas such as uncertainty quantification, variable selection, and large-scale model compression. While most existing research has focused on settings with independent and identically distributed (i.i.d.) observations, there has been limited exploration of scenarios involving dependent data, such as time series and sequential data in natural language processing (NLP). This work addresses this gap by establishing a theoretical foundation for sparse deep learning with dependent data. It demonstrates that sparse recurrent neural networks (RNNs) can be consistently estimated, and their predictions are asymptotically normally distributed under suitable conditions, enabling accurate prediction uncertainty quantification. Experimental results show that sparse deep learning outperforms state-of-the-art methods, such as conformal predictions, in quantifying uncertainty for time series data. Additionally, the method consistently identifies autoregressive orders in time series and surpasses existing approaches in large-scale model compression, with practical applications in fields like finance, healthcare, and energy. The success of pruning techniques in RNN-based language models has inspired further exploration of their applicability to modern large language models. Pretrained transformer models have revolutionized NLP with their state-of-the-art performance but face challenges in real-world deployment due to their massive parameter counts. To tackle this issue, parameter pruning strategies have been explored, including magnitude-based and sensitivity-based approaches. 
However, traditional magnitude pruning has shown limitations, particularly in transfer learning scenarios for modern NLP tasks. A novel pruning algorithm, Mixture Gaussian Prior Pruning (MGPP), is introduced to address these challenges. By employing a mixture Gaussian prior for regularization, MGPP prunes non-expressive weights while retaining the model's expressive capabilities. Extensive evaluations on a variety of NLP tasks, including natural language understanding, question answering, and natural language generation, demonstrate the effectiveness of MGPP, particularly in high-sparsity settings. Theoretical analysis further supports the consistency of sparse transformers, providing insights into the success of this approach. These advancements contribute to optimizing large-scale language models for real-world applications, improving efficiency while maintaining performance. State-space modeling has recently emerged as a powerful technique across various fields, including biology, finance, and engineering. However, its potential for training deep neural networks (DNNs) and its applicability to generative modeling remain underexplored. In this part of the dissertation, we introduce tempering learning, a novel algorithm that leverages state-space modeling to train deep neural networks. By manually constructing a tempering ladder, we transform the original learning problem into a data assimilation problem. In addition to its optimization advantages, tempering learning can be extended to one-step image generation through a diffusion-like process. Extensive experiments demonstrate the effectiveness of our approach across classical machine learning tasks, while also showcasing its promise for one-step unconditional image generation on CIFAR-10 and ImageNet datasets. Deep learning Statistical data science Statistical theory Bayesian Deep Learning deep learning algorithm Sparse Deep Learning State Space Modeling
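For readers unfamiliar with the baseline that entry 26 contrasts against, the following is a minimal sketch of plain magnitude pruning on a flat list of weights. It is not MGPP, whose details the abstract does not give. The function name `magnitude_prune` and the `sparsity` parameter are assumptions chosen for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Plain magnitude pruning on a flat list (illustrative baseline only).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    n_drop = int(len(weights) * sparsity)  # number of weights to remove
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_drop])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


pruned = magnitude_prune([0.5, -0.1, 2.0, 0.05, -1.5, 0.3], sparsity=0.5)
```

The criterion looks only at the absolute value of each weight. This is the kind of rule the abstract describes as limited in transfer-learning settings.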
27	Online Learning and Simulation Based Algorithms for Stochastic Optimization Lakshmanan, K January 2012 In many optimization problems, the relationship between the objective and parameters is not known. The objective function itself may be stochastic, such as a long-run average over some random cost samples. In such cases, finding the gradient of the objective is not possible. It is in this setting that stochastic approximation algorithms are used. These algorithms use some estimates of the gradient and are stochastic in nature. Amongst gradient estimation techniques, Simultaneous Perturbation Stochastic Approximation (SPSA) and the Smoothed Functional (SF) scheme are widely used. In this thesis we have proposed a novel multi-time scale quasi-Newton based smoothed functional (QN-SF) algorithm for unconstrained as well as constrained optimization. The algorithm uses the smoothed functional scheme for estimating the gradient and the quasi-Newton method to solve the optimization problem. The algorithm is shown to converge with probability one. We have also provided here experimental results on the problem of optimal routing in a multi-stage network of queues. Policies like Join the Shortest Queue or Least Work Left assume knowledge of the queue length values, which can change rapidly or be hard to estimate. If the only information available is the expected end-to-end delay, as in our case, such policies cannot be used. The QN-SF based probabilistic routing algorithm uses only the total end-to-end delay for tuning the probabilities. We observe from the experiments that the QN-SF algorithm has better performance than the gradient and Jacobi versions of Newton based smoothed functional algorithms. Next we consider constrained routing in a similar queueing network. We extend the QN-SF algorithm to this case. We study the convergence behavior of the algorithm and observe that the constraints are satisfied at the point of convergence. 
We provide experimental results for the constrained routing setup as well. Next we study reinforcement learning algorithms, which are useful for solving Markov Decision Processes (MDPs) when precise information on transition probabilities is not known. When the state and action sets are very large, it is not possible to store all the state-action tuples. In such cases, function approximators like neural networks have been used. The popular Q-learning algorithm is known to diverge when used with linear function approximation due to the 'off-policy' problem. Hence, developing learning algorithms that remain stable when used with function approximation is an important problem. We present in this thesis a variant of Q-learning with linear function approximation that is based on two-timescale stochastic approximation. The Q-value parameters for a given policy in our algorithm are updated on the slower timescale while the policy parameters themselves are updated on the faster scale. We perform a gradient search in the space of policy parameters. Since the objective function and hence the gradient are not analytically known, we employ the efficient one-simulation simultaneous perturbation stochastic approximation (SPSA) gradient estimates that employ Hadamard matrix based deterministic perturbations. Our algorithm has the advantage that, unlike Q-learning, it does not suffer from high oscillations due to the off-policy problem when using function approximators. Whereas it is difficult to prove convergence of regular Q-learning with linear function approximation because of the off-policy problem, we prove that our algorithm, which is on-policy, is convergent. Numerical results on a multi-stage stochastic shortest path problem show that our algorithm exhibits significantly better performance and is more robust as compared to Q-learning. Future work would be to compare it with other policy-based reinforcement learning algorithms. 
Finally, we develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process(MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multistage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance on this setting and converges to a feasible point. Stochastic Approximation Algorithms Stochastic Optimization Markov Decision Process Reinforcement Learning Algorithm Queueing Networks Queuing Theory Online Q-Learning Algorithm Online Actor-Critic Algorithm Markov Decision Processes Q-learning Algorithm Linear Function Approximation Computer Science
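Entry 27 mentions SPSA as a way to estimate gradients when only function evaluations are available. The sketch below shows the textbook two-sided SPSA estimate with random ±1 perturbations. It is neither the one-simulation Hadamard-matrix variant nor the QN-SF algorithm from the thesis. The function names, the constant step size `a`, the perturbation size `c`, and the noise-free quadratic test objective are all assumptions made for illustration.

```python
import random


def spsa_gradient(f, x, c, rng):
    """Two-sided SPSA gradient estimate of f at x using two evaluations of f."""
    delta = [rng.choice((-1.0, 1.0)) for _ in x]  # random +/-1 perturbation
    x_plus = [xi + c * di for xi, di in zip(x, delta)]
    x_minus = [xi - c * di for xi, di in zip(x, delta)]
    diff = f(x_plus) - f(x_minus)
    # All coordinates share the same difference; only the divisor changes.
    return [diff / (2.0 * c * di) for di in delta]


def spsa_minimize(f, x0, steps=500, a=0.1, c=0.1, seed=0):
    """Plain descent driven by SPSA estimates (constant step size for simplicity)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = spsa_gradient(f, x, c, rng)
        x = [xi - a * gi for xi, gi in zip(x, g)]
    return x


def quadratic(x):
    """Hypothetical noise-free objective with minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


x_star = spsa_minimize(quadratic, [4.0, 3.0])
```

Each estimate needs two evaluations of the objective regardless of the dimension of `x`, which makes the scheme attractive when evaluations come from simulation. In a noisy setting, decaying step and perturbation sequences would normally replace the constants used here.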
28	Neuro-Fuzzy System Modeling with Self-Constructed Rules and Hybrid Learning Ouyang, Chen-Sen 09 November 2004 Neuro-fuzzy modeling is an efficient computing paradigm for system modeling problems. It integrates two well-known approaches, neural networks and fuzzy systems, and therefore possesses the advantages of both, i.e., learning capability, robustness, human-like reasoning, and high understandability. Up to now, many approaches have been proposed for neuro-fuzzy modeling. However, many problems still remain to be solved. We propose in this thesis two self-constructing rule generation methods, i.e., similarity-based rule generation (SRG) and similarity-and-merge-based rule generation (SMRG), and one hybrid learning algorithm (HLA) for structure identification and parameter identification, respectively, of neuro-fuzzy modeling. SRG and SMRG group the input-output training data into a set of fuzzy clusters incrementally based on similarity tests on the input and output spaces. Membership functions associated with each cluster are defined according to statistical means and deviations of the data points included in the cluster. Additionally, SMRG employs a merging mechanism to merge similar clusters dynamically. Then a zero-order or first-order TSK-type fuzzy IF-THEN rule is extracted from each cluster to form an initial fuzzy rule-base, which can be directly employed for fuzzy reasoning or be further refined in the next phase of parameter identification. Compared with other methods, both our SRG and SMRG have the advantages of generating fuzzy rules quickly, matching membership functions closely with the real distribution of the training data points, and avoiding the regeneration of the whole set of clusters from scratch when new training data are considered. 
Besides, SMRG supports a more reasonable and quick mechanism for cluster merging to alleviate the problems of data-input-order bias and redundant clusters, which are encountered in SRG and other incremental clustering approaches. To refine the fuzzy rules obtained in the structure identification phase, a zero-order or first-order TSK-type fuzzy neural network is constructed accordingly in the parameter identification phase. Then, we develop an HLA composed of a recursive SVD-based least squares estimator and the gradient descent method to train the network. Our HLA has the advantage of alleviating the local minimum problem. Besides, it learns faster, consumes less memory, and produces lower approximation errors than other methods. To verify the practicability of our approaches, we apply them to function approximation and classification. For function approximation, we apply our approaches to model several nonlinear functions and real cases from measured input-output datasets. For classification, our approaches are applied to a problem of human object segmentation. A fuzzy self-clustering algorithm is used to divide the base frame of a video stream into a set of segments, which are then categorized as foreground or background based on a combination of multiple criteria. Then, human objects in the base frame and the remaining frames of the video stream are precisely located by a fuzzy neural network which is constructed with the fuzzy rules previously obtained and is trained by our proposed HLA. Experimental results show that our approaches can improve the accuracy of human object identification in video streams and work well even when the human object presents no significant motion in an image sequence. neural network fuzzy rule fuzzy system fuzzy clustering function approximation human object segmentation hybrid learning algorithm least squares estimator neuro-fuzzy system modeling
29	Learning bisimulation Shenkenfelder, Warren 19 November 2008 Computational learning theory is a branch of theoretical computer science that re-imagines the role of an algorithm from an agent of computation to an agent of learning. The operations of computers become those of the human mind, an important step towards illuminating the limitations of artificial intelligence. The central difference between a learning algorithm and a traditional algorithm is that the learner has access to an oracle who, in constant time, can answer queries about what is to be learned. Normally an algorithm would have to discover such information on its own. This subtle change in how we model problem solving results in changes in the computational complexity of some classic problems, allowing us to re-examine them in a new light. Specifically, two known results are examined: one positive, one negative. It is known that one can efficiently learn deterministic finite automata with queries, but not nondeterministic finite automata. We generalize these automata into Labeled Transition Systems and attempt to learn them using a stronger query. Learning Theory Angluin's Algorithm Labelled Transition Systems hennessy milner logic Reconstructing graphs learning algorithm
30	Algoritmo Q-learning como estratégia de exploração e/ou explotação para metaheurísticas GRASP e algoritmo genético Lima Júnior, Francisco Chagas de 20 March 2009 Techniques of optimization known as metaheuristics have achieved success in the resolution of many problems classified as NP-Hard. These methods use non-deterministic approaches that reach very good solutions but do not guarantee finding the global optimum. Beyond the inherent difficulties related to the complexity that characterizes the optimization problems, the metaheuristics still face the dilemma of exploration/exploitation, which consists of choosing between a greedy search and a wider exploration of the solution space. One way to guide such algorithms in the search for better solutions is to supply them with more knowledge of the problem through the use of an intelligent agent, able to recognize promising regions and also identify when the direction of the search should be diversified. Accordingly, this work proposes the use of a Reinforcement Learning technique, the Q-learning algorithm, as an exploration/exploitation strategy for the metaheuristics GRASP (Greedy Randomized Adaptive Search Procedure) and Genetic Algorithm. The GRASP metaheuristic uses Q-learning instead of the traditional greedy-random algorithm in the construction phase. This replacement has the purpose of improving the quality of the initial solutions that are used in the local search phase of the GRASP, and also provides the metaheuristic with an adaptive memory mechanism that allows the reuse of good previous decisions and avoids the repetition of bad decisions. 
In the Genetic Algorithm, the Q-learning algorithm was used to generate an initial population of high fitness and, after a given number of generations, if the diversity rate of the population falls below a certain limit L, it was also applied to supply one of the parents used in the genetic crossover operator. Another significant change in the hybrid genetic algorithm is the proposal of a mutually interactive cooperation process between the genetic operators and the Q-learning algorithm. In this interactive/cooperative process, the Q-learning algorithm receives an additional update to the matrix of Q-values based on the current best solution of the Genetic Algorithm. The computational experiments presented in this thesis compare the results obtained with traditional versions of the GRASP metaheuristic and the Genetic Algorithm with those obtained using the proposed hybrid methods. Both algorithms were applied successfully to the symmetric Traveling Salesman Problem, which was modeled as a Markov decision process. / Optimization techniques known as metaheuristics have been successful in solving problems classified as NP-hard. These methods use non-deterministic approaches that generate near-optimal solutions without, however, guaranteeing that the global optimum is found. Besides the difficulties inherent to the complexity that characterizes NP-hard problems, metaheuristics also face the exploration/exploitation dilemma, which consists of choosing between intensifying the search in a specific region and exploring the solution space more widely. One way to guide such algorithms toward better solutions is to give them more knowledge of the problem through an intelligent agent capable of recognizing promising regions and/or identifying when the search direction should be diversified; this can be done by applying reinforcement learning. 
In this context, this work proposes the use of a reinforcement learning technique, specifically the Q-learning algorithm, as an exploration/exploitation strategy for the metaheuristics GRASP (Greedy Randomized Adaptive Search Procedure) and Genetic Algorithm. In the proposed implementation of the GRASP metaheuristic, Q-learning replaces the greedy-random algorithm traditionally used in the construction phase. This replacement aims to improve the quality of the initial solutions used in the local search phase of GRASP and, at the same time, to provide the metaheuristic with an adaptive memory mechanism that allows good decisions made in past iterations to be reused and avoids repeating unpromising ones. In the Genetic Algorithm, the Q-learning algorithm was used to generate an initial population of high fitness, and after a given number of generations, if the diversity rate of the population is below a given limit L, it is also used in an alternative form of the crossover operator. Another important modification in the hybrid genetic algorithm is the proposal of a mutually cooperative interaction process between the genetic operators and the Q-learning algorithm. In this interactive/cooperative process, the Q-learning algorithm receives an additional update to the matrix of Q-values based on the elite solution of the current population. The computational experiments presented in this work compare the results obtained with traditional versions of the cited metaheuristics with those obtained using the proposed hybrid methods. 
Both algorithms were successfully applied to the symmetric traveling salesman problem, which in turn was modeled as a Markov decision process. Metaheurística GRASP Algoritmos genéticos Algoritmo Q-learning Problema do caixeiro viajante GRASP metaheuristic Genetic algorithm Q-learning algorithm Travelling salesman problem CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
