1

Machine Learning and Quantum Computing for Optimization Problems in Power Systems

Gupta, Sarthak 26 January 2023 (has links)
While optimization problems are ubiquitous in all domains of engineering, they are of critical importance to power systems engineers. Safe and economical operation of power systems entails solving many optimization problems, such as security-constrained unit commitment, economic dispatch, optimal power flow, and optimal planning. Although traditional optimization solvers and software have so far been successful in solving these problems, there is a growing need to accelerate the solution process. This need arises from several aspects of grid modernization, such as distributed energy resources, renewable energy, smart inverters, and batteries, that increase the number of decision variables involved. Moreover, these technologies bring faster dynamics and greater unpredictability, further demanding a solution speedup. Yet another concern is the growing communication overhead that accompanies this large-scale, high-speed decision-making process. This thesis explores three directions to address such concerns. The first part explores the learning-to-optimize paradigm, whereby instead of solving the optimization problems directly, machine learning (ML) models such as deep neural networks (DNNs) are trained to predict their solutions. The second part also employs deep learning, but in a different manner: DNNs are used to model the dynamics of IEEE 1547.8 standard-based local Volt/VAR control rules, and efficient deep learning libraries are then leveraged to solve the resulting optimization problem. The last part dives into the evolving field of quantum computing and develops a general strategy for solving stochastic binary optimization problems using variational quantum eigensolvers (VQE). / Doctor of Philosophy / Reliable and economical operation of power systems entails solving large-scale mathematical decision-making problems, termed optimization problems. Modern additions to power systems demand an acceleration of this decision-making process while managing the accompanying communication overhead efficiently. This thesis explores the application of two recent advances in computer science, machine learning (ML) and quantum computing (QC), to address these needs. The research is divided into three parts. The first part proposes replacing conventional mathematical solvers with ML models that predict the solutions those solvers would produce. Colloquially referred to as learning-to-optimize, this paradigm learns from a historical dataset of good solutions and extrapolates from it to make new decisions quickly, while requiring potentially limited data. The second part also uses ML models, but differently: they represent the underlying physical dynamics, converting an originally challenging optimization problem into a simpler one that can be solved efficiently with popular ML toolkits. The third and final part aims at accelerating the process of finding optimal binary decisions under constraints using QC.
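For readers unfamiliar with the learning-to-optimize paradigm mentioned in this abstract, the following minimal sketch illustrates the general idea only: a DNN is trained offline on solver-generated (input, solution) pairs and then replaces the solver with a single forward pass at deployment time. The data, dimensions, and model below are placeholder assumptions and do not correspond to the thesis's actual formulations.

```python
# Hedged sketch of the learning-to-optimize idea: train a DNN to map
# operating conditions (e.g., nodal loads) to the solutions a conventional
# optimization solver would return. All names and dimensions here are
# illustrative assumptions, not the thesis's actual models or data.
import torch
import torch.nn as nn

n_loads, n_decisions = 32, 10        # hypothetical problem dimensions

# Placeholder training set: in practice each pair would come from running
# a traditional solver (e.g., an OPF solver) offline on historical data.
loads = torch.rand(5000, n_loads)
solutions = torch.rand(5000, n_decisions)

model = nn.Sequential(
    nn.Linear(n_loads, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, n_decisions),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(loads), solutions)   # imitate the solver's output
    loss.backward()
    opt.step()

# At deployment, a forward pass replaces a full solve.
new_load = torch.rand(1, n_loads)
predicted_solution = model(new_load)
```

In practice the training pairs would come from running a conventional solver on historical operating conditions, and feasibility of the predicted solution would still need to be checked or restored.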
2

[pt] APRENDIZADO COM RESTRIÇÃO DE TEMPO: PROBLEMAS DE CLASSIFICAÇÃO / [en] TIME CONSTRAINED LEARNING: CLASSIFICATION PROBLEMS

FRANCISCO SERGIO DE FREITAS FILHO 04 September 2023 (has links)
[en] With the growing amount of data being generated and collected, scenarios with large-scale labeled data but limited computational resources are increasingly common, making it impossible to train predictive models using all available samples. Faced with this reality, we adopt the Machine Teaching paradigm as an alternative for obtaining effective models from a representative subset of the available data. Initially, we consider a central problem of the Machine Teaching area: finding the smallest set of samples necessary to obtain a given target hypothesis h*. We adopt the black-box learner teaching model introduced in (DASGUPTA et al., 2019), where teaching is done interactively without any knowledge about the learner's algorithm and its hypothesis class, except that it contains the target hypothesis h*. We refine some existing results for this model and study its variants. In particular, we extend a result from (DASGUPTA et al., 2019) to the more realistic scenario where h* may not be contained in the learner's hypothesis class, and therefore the teacher's objective is to make the learner converge to the best available approximation of h*. We also consider the scenario of non-adversarial black-box learners and show that better results can be obtained for learners that move to the next hypothesis smoothly, preferring hypotheses closer to the current one. Next, we define and address the Time-Constrained Learning problem, considering a scenario in which we have a huge dataset and a time limit to train a given learner on it. We propose TCT, an algorithm for this task developed on Machine Teaching principles. We present an experimental study involving 5 different learners and 20 datasets, showing that TCT outperforms the alternative methods considered. Finally, we prove approximation guarantees for a simplified version of TCT.
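The abstract does not reproduce the TCT algorithm itself; the sketch below only conveys the general shape of the time-constrained learning setting it targets, namely training a learner on a growing, curated subset under a wall-clock budget instead of fitting the full dataset. The selection rule, learner, dataset, and budget are illustrative assumptions, not the method from the thesis.

```python
# Illustrative sketch of training under a time budget by teaching a learner
# on a growing subset of a large dataset. This is a generic machine-teaching-
# flavored loop; it is NOT the TCT algorithm from the thesis, whose selection
# rule and approximation guarantees are more involved.
import time
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)
budget_seconds = 5.0                       # hypothetical time limit
rng = np.random.default_rng(0)

# Start from a small random "teaching set".
idx = rng.choice(len(X), size=500, replace=False)
learner = SGDClassifier(loss="log_loss", random_state=0)

start = time.perf_counter()
while time.perf_counter() - start < budget_seconds:
    learner.fit(X[idx], y[idx])
    # Add a batch of examples the current hypothesis gets wrong,
    # mimicking a teacher steering a black-box learner.
    wrong = np.flatnonzero(learner.predict(X) != y)
    if len(wrong) == 0:
        break
    idx = np.union1d(idx, rng.choice(wrong, size=min(500, len(wrong)),
                                     replace=False))

print("teaching set size:", len(idx), "accuracy:", learner.score(X, y))
```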
3

Novel neural architectures & algorithms for efficient inference

Kag, Anil 30 August 2023 (has links)
In the last decade, the machine learning community embraced deep neural networks (DNNs) wholeheartedly with the advent of neural architectures such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and transformers. These models have empowered many applications, such as ChatGPT and Imagen, and have achieved state-of-the-art (SOTA) performance on many vision, speech, and language modeling tasks. However, SOTA performance comes with various costs, such as large model size, compute-intensive training, increased inference latency, and higher working memory. This thesis aims at improving the resource efficiency of neural architectures, i.e., significantly reducing the computational, storage, and energy consumption of a DNN without any significant loss in performance. Towards this goal, we explore novel neural architectures as well as training algorithms that allow low-capacity models to achieve near-SOTA performance. We divide this thesis into two dimensions: Efficient Low Complexity Models, and Input Hardness Adaptive Models. Along the first dimension, Efficient Low Complexity Models, we improve DNN performance by addressing instabilities in existing architectures and training methods. We propose novel neural architectures inspired by ordinary differential equations (ODEs) to reinforce input signals and attend to salient feature regions. In addition, we show that carefully designed training schemes improve the performance of existing neural networks. We divide this exploration into two parts. (a) Efficient Low Complexity RNNs. We improve RNN resource efficiency by addressing poor gradients, noise amplification, and issues with BPTT training. First, we improve RNNs by solving ODEs that eliminate vanishing and exploding gradients during training. To do so, we present Incremental Recurrent Neural Networks (iRNNs) that keep track of increments in the equilibrium surface. Next, we propose Time Adaptive RNNs that mitigate the noise propagation issue in RNNs by modulating the time constants in the ODE-based transition function. We empirically demonstrate the superiority of ODE-based neural architectures over existing RNNs. Finally, we propose the Forward Propagation Through Time (FPTT) algorithm for training RNNs and show that FPTT yields significant gains compared to the more conventional Backward Propagation Through Time (BPTT) scheme. (b) Efficient Low Complexity CNNs. Next, we improve CNN architectures by reducing their resource usage. CNNs require greater depth to generate high-level features, resulting in computationally expensive models. We design a novel residual block, the Global layer, that constrains the input and output features by approximately solving partial differential equations (PDEs). It yields better receptive fields than traditional convolutional blocks and thus results in shallower networks. Further, we reduce the model footprint by enforcing a novel inductive bias that formulates the output of a residual block as a spatial interpolation between high-compute anchor pixels and low-compute cheaper pixels. This results in spatially interpolated convolutional blocks (SI-CNNs) that have better compute-performance trade-offs. Finally, we propose an algorithm that enforces various distributional constraints during training in order to achieve better generalization. We refer to this scheme as distributionally constrained learning (DCL).
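As a rough illustration of the ODE-inspired recurrent updates discussed in this abstract (not the iRNN or Time Adaptive RNN architectures themselves), the sketch below implements a generic explicit-Euler recurrent cell whose hidden state changes only incrementally from step to step; all names and sizes are assumptions for illustration.

```python
# Generic sketch of an ODE-flavored recurrent update of the form
# h_{t+1} = h_t + step * f(h_t, x_t), i.e., an explicit Euler step on a
# learned vector field. This only conveys the flavor of ODE-inspired RNNs;
# it is not the iRNN or Time Adaptive RNN architecture from the thesis.
import torch
import torch.nn as nn

class EulerRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size, step=0.1):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(input_size + hidden_size, hidden_size),
            nn.Tanh(),
        )
        self.step = step

    def forward(self, x, h):
        # The incremental (residual) update keeps h_{t+1} close to h_t,
        # which is one way to temper vanishing/exploding gradients.
        return h + self.step * self.f(torch.cat([x, h], dim=-1))

cell = EulerRNNCell(input_size=8, hidden_size=32)
h = torch.zeros(1, 32)
for x in torch.rand(50, 1, 8):          # a toy sequence of length 50
    h = cell(x, h)
print(h.shape)
```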
In the second dimension, Input Hardness Adaptive Models, we introduce the notion of the hardness of an input relative to an architecture. In the first dimension, a neural network allocates the same resources, such as compute, storage, and working memory, to all inputs, inherently assuming that all examples are equally hard for the model. In this dimension, we challenge that assumption, reasoning that some inputs are relatively easy for a network to predict compared to others. Input hardness enables us to create selective classifiers in which a low-capacity network handles simple inputs while abstaining from predicting on complex ones. Next, we create hybrid models that route the hard inputs from the low-capacity abstaining network to a high-capacity expert model, and we design various architectures that adhere to this hybrid inference style. Further, input hardness enables us to selectively distill the knowledge of a high-capacity model into a low-capacity model by discarding hard inputs during the distillation procedure. Finally, we conclude the thesis by sketching out several future research directions that emerge as extensions of the ideas explored in this work.
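The hybrid, hardness-adaptive inference style described in this abstract can be conveyed with a minimal sketch in which a small network answers only when its confidence exceeds a threshold and defers the remaining inputs to a larger expert. The models, threshold, and confidence-based hardness proxy below are assumptions for illustration, not the architectures proposed in the thesis.

```python
# Hedged sketch of hybrid, input-hardness-adaptive inference: a small model
# predicts when it is confident and routes the remaining "hard" inputs to a
# larger expert model. Models, threshold, and the confidence measure are
# illustrative assumptions only.
import torch
import torch.nn as nn

small = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
large = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10))

def hybrid_predict(x, threshold=0.9):
    probs = torch.softmax(small(x), dim=-1)
    conf, pred = probs.max(dim=-1)
    easy = conf >= threshold                 # small model abstains elsewhere
    out = torch.empty(x.shape[0], dtype=torch.long)
    out[easy] = pred[easy]
    if (~easy).any():                        # route hard inputs to the expert
        out[~easy] = large(x[~easy]).argmax(dim=-1)
    return out, easy

x = torch.rand(16, 64)
preds, handled_by_small = hybrid_predict(x)
print(preds.shape, handled_by_small.float().mean().item())
```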
