
Bilevel Optimization in the Deep Learning Era: Methods and Applications

Neural networks, together with their associated optimization algorithms, have proven remarkably effective and versatile across a wide range of tasks, including image recognition, speech recognition, object detection, and sentiment analysis. Their strength lies in automatically learning intricate representations that map input data to output labels. Nevertheless, not every task fits neatly into an end-to-end learning paradigm: the complexity and diversity of real-world challenges call for specialized architectures and optimization strategies tailored to the structure of specific tasks.
Bilevel optimization is a distinctive form of optimization in which one problem is embedded, or nested, within another, and it remains highly relevant in the era of deep learning. A notable example is hyperparameter optimization. While a network's weights are trained automatically by backpropagation, hyperparameters such as the learning rate and the number of layers must be fixed in advance and cannot be optimized through the conventional chain rule employed in backpropagation. Bilevel optimization therefore plays an important role in tuning these hyperparameters to improve the overall performance of deep learning models.
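As an illustration of this view of hyperparameter tuning, the following is a minimal sketch (not the method proposed in this thesis) of gradient-based hyperparameter optimization: the inner level performs one unrolled SGD step on a training loss, and the outer level updates a log-learning-rate by differentiating a validation loss through that step. All data, names, and sizes here are illustrative.

```python
# Minimal sketch of bilevel hyperparameter optimization (illustrative only):
# outer variable  = log-learning-rate of the inner optimizer
# inner problem   = one unrolled SGD step on the training loss
# outer objective = validation loss at the updated weights
import torch

torch.manual_seed(0)
X_tr, y_tr = torch.randn(64, 10), torch.randn(64, 1)    # toy training data
X_val, y_val = torch.randn(32, 10), torch.randn(32, 1)  # toy validation data

w = torch.randn(10, 1, requires_grad=True)       # lower-level (model) weights
log_lr = torch.tensor(-2.0, requires_grad=True)  # upper-level hyperparameter
outer_opt = torch.optim.SGD([log_lr], lr=0.01)

for step in range(100):
    lr = log_lr.exp()
    # Inner level: one differentiable SGD step on the training loss.
    train_loss = ((X_tr @ w - y_tr) ** 2).mean()
    grad_w = torch.autograd.grad(train_loss, w, create_graph=True)[0]
    w_new = w - lr * grad_w  # unrolled update keeps the graph back to log_lr
    # Outer level: the validation loss at the updated weights yields a
    # hypergradient with respect to the learning rate.
    val_loss = ((X_val @ w_new - y_val) ** 2).mean()
    outer_opt.zero_grad()
    val_loss.backward()
    outer_opt.step()
    # Commit the inner update and detach so the graph does not grow.
    w = w_new.detach().requires_grad_(True)
```

Unrolling a single inner step keeps memory bounded; unrolling more steps, or using implicit differentiation, trades memory and compute for a more accurate hypergradient.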
Deep learning thus remains fertile ground for further exploration in optimization: refining hyperparameters and other aspects of neural network architectures continues to offer opportunities for advances and breakthroughs.
In this thesis, we study significant bilevel optimization problems and apply these techniques to relevant real-world tasks. Because bilevel optimization involves two nested levels of optimization, we explore settings in which neural networks appear at the upper level, at the lower level, or at both. Specifically, we systematically investigate four tasks: optimizing neural networks toward optimizing neural networks, optimizing attractors toward optimizing neural networks, optimizing graph structures toward optimizing neural network performance, and optimizing architectures toward optimizing neural networks. For each task, we formulate the problem mathematically as a bilevel optimization and introduce more efficient optimization strategies, then carefully evaluate the performance and efficiency of the proposed techniques. Importantly, the methodologies and insights extend beyond bilevel optimization itself, applying broadly to deep learning models; the contributions offer valuable perspectives and tools for advancing optimization in the wider deep learning landscape.

Doctor of Philosophy

Bilevel optimization proves to be a valuable technique across a variety of applications. Mathematically, it optimizes an objective at the upper level while simultaneously solving another optimization problem at the lower level. The key challenge lies in finding optimal solutions at both levels at once, given the interdependence between the decisions made at each level.
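In standard form, this nesting can be written as follows; the notation is the generic one common in the bilevel learning literature (upper-level variables, lower-level variables, outer and inner objectives), not necessarily the symbols used in the thesis.

```latex
% Generic bilevel program (notation is illustrative, not the thesis's own):
% \theta: upper-level variables (e.g., hyperparameters or architecture)
% w:      lower-level variables (e.g., network weights)
\begin{aligned}
  \min_{\theta}\;\; & F\bigl(\theta,\, w^{*}(\theta)\bigr) \\
  \text{s.t.}\;\;   & w^{*}(\theta) \in \arg\min_{w}\; L(\theta, w)
\end{aligned}
```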

The complexity of bilevel optimization escalates when it is combined with deep learning. First, deep learning models are typically trained by iterative optimization, which makes it challenging to embed their training cleanly within a bilevel framework. Second, the bilevel structure itself makes end-to-end optimization of deep learning models difficult.

This thesis approaches the bilevel optimization problem through four distinct methods that incorporate deep learning. These methods address different tasks spanning several domains of machine learning, including neural architecture search, graph structure learning, implicit models, and causal inference. Notably, the proposed methods not only handle specific classes of bilevel optimization problems but also come with theoretical guarantees. The insights and methodologies presented here may help practitioners solve problems that involve higher-order decisions.

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/117311
Date: 05 January 2024
Creators: Zhang, Lei
Contributors: Computer Science and Applications, Lu, Chang Tien, Ramakrishnan, Narendran, Cho, Jin-Hee, Wu, Lingfei, Prakash, Bodicherla Aditya
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Language: English
Detected Language: English
Type: Dissertation
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
