Global ETD Search

1	Expected Complexity and Gradients of Deep Maxout Neural Networks and Implications to Parameter Initialization
Tseran, Hanna 10 November 2023
Learning with neural networks depends on the particular parametrization of the functions represented by the network, that is, the assignment of parameters to functions. It also depends on the identity of the functions, which get assigned typical parameters at initialization, and, later, the parameters that arise during training. The choice of the activation function is a critical aspect of the network design that influences these function properties and requires investigation. This thesis focuses on analyzing the expected behavior of networks with maxout (multi-argument) activation functions. On top of enhancing the practical applicability of maxout networks, these findings add to the theoretical exploration of activation functions beyond the common choices. We believe this work can advance the study of activation functions and complicated neural network architectures. We begin by taking the number of activation regions as a complexity measure and showing that the practical complexity of deep networks with maxout activation functions is often far from the theoretical maximum. This analysis extends the previous results that were valid for deep neural networks with single-argument activation functions such as ReLU. Additionally, we demonstrate that a similar phenomenon occurs when considering the decision boundaries in classification tasks. We also show that the parameter space has a multitude of full-dimensional regions with widely different complexity and obtain nontrivial lower bounds on the expected complexity. Finally, we investigate different parameter initialization procedures and show that they can increase the speed of the gradient descent convergence in training.
Further, continuing the investigation of the expected behavior, we study the gradients of a maxout network with respect to inputs and parameters and obtain bounds for the moments depending on the architecture and the parameter distribution. We observe that the distribution of the input-output Jacobian depends on the input, which complicates a stable parameter initialization. Based on the moments of the gradients, we formulate parameter initialization strategies that avoid vanishing and exploding gradients in wide networks. Experiments with deep fully-connected and convolutional networks show that this strategy improves SGD and Adam training of deep maxout networks. In addition, we obtain refined bounds on the expected number of linear regions, results on the expected curve length distortion, and results on the NTK. As a result of the research in this thesis, we develop multiple experiments and helpful components and make the code for them publicly available.
ddc:500
2	The Dynamics of Neural Networks Expressivity with Applications to Remote Sensing Data / Dynamiken i neurala nätverks uttrycksförmåga med tillämpningar på fjärranalysdata
Zhang, Hui January 2022
Deep neural networks (DNNs) have been widely demonstrated to be more powerful than their shallower counterparts in a variety of computer vision tasks and remote sensing applications. However, as many techniques are based on trial-and-error experiments as opposed to systematic evaluation, scientific evidence for the superiority of DNNs needs stronger theoretical and experimental foundations. Recent work has shown that neural network expressivity, measured by the number of linear regions, is independent of the network structure, suggesting that the success of deep neural networks is attributable to their ease of training. Inspired by this, this project investigates novel approaches to training neural networks and to obtaining desired regional properties of linear regions. In particular, it highlights the regional structure of linear regions in different decision regions and seeks to initialize the network in a better position that makes this regional structure easier to attain. By counting the total number of linear regions in the input space, we validated that shallow wider networks and deep narrow networks share the same upper bound on expressivity on different synthetic datasets. We also discovered that the linear regions along the decision boundary are larger and fewer in number, while those close to the data are denser and fitted to the data manifold. Our experiments indicate that the proposed initialization method can generate more linear regions at initialization, make the training converge faster, and finally generate linear regions that better fit the data manifold on synthetic data.
On the EuroSAT satellite dataset, the proposed initialization method does not facilitate the convergence of ResNet-18, but achieves better performance, with an average increase in accuracy of 0.14% compared to pre-trained weights and 0.19% compared to He uniform initialization. / Deep neural networks (DNNs) have been widely shown to be more powerful than their shallow counterparts in a variety of computer vision tasks and remote sensing applications. However, many techniques are based on trial and error rather than systematic evaluation, and scientific evidence for the superiority of DNNs needs stronger theoretical and experimental foundations. Recent work has shown that the expressivity of a neural network, measured as the number of linear regions, is independent of the network structure, which suggests that the success of deep neural networks stems from their being easy to train. Inspired by this, this project aims to investigate new methods for training neural networks and for obtaining desired regional properties of linear regions. In particular, it highlights the regional structure of linear regions in different decision regions and seeks to initialize the network in a better position that makes this regional structure easier to attain. By counting the total number of linear regions in the input space, we validated that shallow wider networks and deep narrow networks have the same upper bound on expressivity on different synthetic datasets. We also found that the linear regions along the decision boundary are larger and fewer in number, while they are denser and fitted to the data manifold when close to the data. Our experiments show that the proposed initialization method can generate more linear regions at initialization, make training converge faster, and ultimately generate linear regions that better fit the data manifold on synthetic data.
On the EuroSAT satellite dataset, the proposed initialization method does not facilitate the convergence of ResNet-18, but achieves better performance, with an average increase in accuracy of 0.14% compared to pre-trained weights and 0.19% compared to He uniform initialization.
Neural Networks; Linear Regions; Expressivity; Initialization; Remote Sensing
Electrical Engineering and Electronics
3	An Empirical Study on the Generation of Linear Regions in ReLU Networks : Exploring the Relationship Between Data Topology and Network Complexity in Discriminative Modeling / En Empirisk Studie av Linjära Regioner i Styckvis Linjära Neurala Nätverk : En Utforskning av Sambandet Mellan Datatopologi och Komplexiteten hos Neurala Nätverk i Diskriminativ Modellering
Eriksson, Petter January 2022
The far-reaching successes of deep neural networks in a wide variety of learning tasks have prompted research on how model properties account for high network performance. For a specific class of models whose activation functions are piecewise linear, one such property of interest is the number of linear regions that the network generates. Such models themselves define piecewise linear functions by partitioning input space into disjoint regions and fitting a different linear function on each such piece. It would be expected that the number or configuration of such regions would describe the model’s ability to fit complicated functions. However, previous works have shown difficulty in identifying linear regions as satisfactory predictors of model success. In this thesis, the question of whether the generation of linear regions due to training encodes the properties of the learning problem is explored. More specifically, it is investigated whether the change in linear region density due to model fitting is related to the geometric properties of the training data. In this work, data geometry is characterized in terms of the curvature of the underlying manifold. Models with ReLU activation functions are trained on a variety of regression problems defined on artificial manifolds and the change in linear region density is recorded along trajectories in input space. Learning is performed on problems defined on curves, on surfaces, and on image data.
Experiments are repeated as the data geometry is varied and the change in density is compared with the manifold curvature measure used. In no experimental setting was the observed change in density found to be clearly linked with curvature. However, density was observed to increase at points of discontinuity. This suggests that linear regions can in some instances model data complexities; however, the findings presented here do not support that data curvature is encoded by the formation of linear regions. Thus, the role that linear regions play in controlling the capacity of piecewise linear networks remains open. Future research is needed to gain further insights into how data geometry and linear regions are connected. / The broad successes that deep neural networks have shown in a variety of learning problems have inspired new research aimed at explaining which model properties result in high-performing networks. For neural networks that use piecewise linear activation functions, an interesting property to study is the linear regions that the network generates in the vector space that forms the domain of the training data. Networks with piecewise linear activation functions partition the domain into distinct regions, on each of which a different linear function is defined. These networks themselves represent piecewise linear functions. By fitting several distinct linear maps, it is possible to approximate functions that are nonlinear. One might therefore expect that the number of linear regions a model generates, and how they are distributed in space, could serve as a measure of the model's ability to learn complicated functions. However, previous research in this area has not been able to demonstrate a relationship between the number or distribution of linear regions and model performance. This thesis investigates what role linear regions play in explaining a model's capacity and what it learns.
Do the linear regions that a network learns capture the underlying properties of the training data? More specifically, it is studied whether the local change in the number of linear regions after model training correlates with the geometry of the training data. Training data is generated from synthetic manifolds and the data geometry is described in terms of the curvature of the manifold. Regression problems are defined on these manifolds, and training is repeated for topologies of different shape and with different curvature. The difference in the number of linear regions after training is measured along paths in the domain and compared with the curvature of the data. None of the experiments carried out succeeded in demonstrating a clear relationship between the change in the number of regions and the curvature of the data. However, it was observed that the number of linear regions increases in the vicinity of points of discontinuity. This suggests that linear regions can, under certain circumstances, model complexity. Thus, the role that linear regions play in explaining model capability remains unclear.
Neural networks; Linear regions; Activation patterns; Piecewise linear activation functions; Data curvature
Computer and Information Sciences
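The measurement mentioned in the third abstract, recording linear regions along trajectories in input space, can be illustrated with a minimal sketch. This is not the thesis's code: the function names, the dense-sampling approach, and the toy network below are assumptions made for illustration. The sketch samples points along a straight segment, computes the on/off activation pattern of every hidden ReLU unit at each point, and counts how often the pattern changes. Distinct activation patterns are only a proxy for distinct linear regions, and regions thinner than the sampling step can be missed.

```python
import numpy as np

def activation_pattern(x, weights, biases):
    """Return the on/off pattern of all hidden ReLU units at input x."""
    pattern = []
    h = x
    for W, b in zip(weights, biases):
        pre = W @ h + b              # pre-activations of this hidden layer
        pattern.append(pre > 0)      # a unit is "on" where its pre-activation is positive
        h = np.maximum(pre, 0.0)     # ReLU output feeds the next layer
    return np.concatenate(pattern)

def count_regions_on_segment(start, end, weights, biases, n_samples=1001):
    """Approximate the number of activation regions crossed by the segment start -> end."""
    ts = np.linspace(0.0, 1.0, n_samples)
    patterns = [
        activation_pattern((1 - t) * start + t * end, weights, biases) for t in ts
    ]
    # Each change of pattern between consecutive samples marks a region boundary.
    changes = sum(not np.array_equal(p, q) for p, q in zip(patterns, patterns[1:]))
    return changes + 1

# Hypothetical toy network: one hidden layer with three ReLU units on a 1-D input,
# with breakpoints at x = -0.5, 0.0 and 0.5.
weights = [np.array([[1.0], [1.0], [1.0]])]
biases = [np.array([0.5, 0.0, -0.5])]

n_regions = count_regions_on_segment(np.array([-1.0]), np.array([1.0]), weights, biases)
print(n_regions)  # 4 for this toy network: three breakpoints split the segment into four pieces
```

Dividing the count by the segment length gives a simple density estimate, and comparing that estimate before and after training gives the kind of change in density the abstract refers to.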
