41

Increasing CNN representational power using absolute cosine value regularization

Singleton, William S. 05 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / The Convolutional Neural Network (CNN) is a mathematical model designed to distill input information into a more useful representation. This distillation process removes information over time through a series of dimensionality reductions, which ultimately grant the model the ability to resist noise and generalize effectively. However, CNNs often contain elements that contribute little toward useful representations. This thesis aims to remedy this problem by introducing Absolute Cosine Value Regularization (ACVR), a regularization technique hypothesized to increase the representational power of CNNs by using a Gradient Descent Orthogonalization algorithm to force the vectors that constitute the filters at any given convolutional layer to occupy unique positions in their respective spaces. In theory, this method should lead to a more effective balance between information loss and representational power, ultimately increasing network performance. The thesis proposes and examines the mathematics and intuition behind ACVR, and goes on to propose Dynamic-ACVR (D-ACVR). It also examines the effects of ACVR on the filters of a low-dimensional CNN, as well as the effects of ACVR and D-ACVR on traditional convolutional filters in VGG-19. Finally, it examines regularization of the pointwise filters in MobileNetv1.
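The abstract does not include an implementation, but the core idea — penalizing the absolute cosine similarity between a convolutional layer's filter vectors so that gradient descent pushes them toward orthogonality — can be sketched roughly as follows. This is a minimal PyTorch-style sketch; the penalty weight, reduction, and the way it is attached to the task loss are illustrative assumptions, not the thesis's actual code.

```python
# Hypothetical sketch of an Absolute Cosine Value Regularization (ACVR) penalty.
# Assumption: the penalty is the mean absolute cosine similarity between all
# pairs of flattened filters in a Conv2d layer, added to the task loss.
import torch
import torch.nn.functional as F

def acvr_penalty(conv_weight: torch.Tensor) -> torch.Tensor:
    # conv_weight: (out_channels, in_channels, kH, kW)
    filters = conv_weight.flatten(start_dim=1)        # one vector per filter
    normed = F.normalize(filters, dim=1)              # unit-length vectors
    cos = normed @ normed.t()                         # pairwise cosine similarities
    off_diag = cos - torch.diag(torch.diag(cos))      # drop self-similarities
    n = cos.shape[0]
    return off_diag.abs().sum() / (n * (n - 1))       # mean |cos| over filter pairs

# Usage inside a training step (lambda_acvr is an assumed hyperparameter):
# loss = task_loss + lambda_acvr * sum(acvr_penalty(m.weight)
#                                      for m in model.modules()
#                                      if isinstance(m, torch.nn.Conv2d))
```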
42

Spectral Analysis Using Multitaper Whittle Methods with a Lasso Penalty

Tang, Shuhan 25 September 2020 (has links)
No description available.
43

Models for fitting correlated non-identical Bernoulli random variables with applications to an airline data problem

Perez Romo Leroux, Andres January 2021 (has links)
Our research deals with the problem of devising models for fitting non-identical dependent Bernoulli variables and using these models to predict future Bernoulli trials. We focus on modelling and predicting random Bernoulli response variables which meet all of the following conditions: 1. Each observed as well as future response corresponds to a Bernoulli trial. 2. The trials are non-identical, having possibly different probabilities of occurrence. 3. The trials are mutually correlated, with an underlying complex trial-cluster correlation structure; trials within clusters may additionally be partitioned into groups, and within-cluster group-level correlation is reflected in the correlation structure. 4. The probability of occurrence and the correlation structure for both observed and future trials can depend on a set of observed covariates. A number of proposed approaches meeting some of the above conditions are present in the current literature. Our research expands on existing statistical and machine learning methods. We propose three extensions to existing models that make use of the above conditions; each brings specific advantages for dealing with correlated binary data. The proposed models allow for within-cluster trial grouping to be reflected in the correlation structure. We partition sets of trials into groups that are either explicitly estimated or implicitly inferred: explicit groups arise from the determination of common covariates; inferred groups arise via imposing mixture models. The main motivation of our research is modelling and further understanding the potential of introducing binary-trial group-level correlations. In a number of applications, it can be beneficial to use models that allow for these types of trial groupings, both for improved predictions and for a better understanding of the behavior of trials. The first model extension builds on the multivariate probit model. This model makes use of covariates and other information from former trials to determine explicit trial groupings and predict the occurrence of future trials. We call this the Explicit Groups model. The second model extension uses mixtures of univariate probit models. This model predicts the occurrence of current trials using estimators of parameters supporting mixture models for the observed trials. We call this the Inferred Groups model. Our third method extends a gradient-descent-based boosting algorithm that allows for correlation of binary outcomes, called WL2Boost; we refer to our extension of this algorithm as GWL2Boost. Bernoulli trials are divided into observed and future trials, with all trials having associated known covariate information. We apply our methodology to the problem of predicting the set and total number of passengers who will not show up on commercial flights, using covariate information and past passenger data. The models and algorithms are evaluated with regard to their capacity to predict future Bernoulli responses. We compare the proposed models against a set of competing existing models and algorithms using available airline passenger no-show data. We show that our proposed algorithm extension GWL2Boost outperforms top existing algorithms and models that assume independence of binary outcomes in various prediction metrics. / Statistics
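As an illustration of the kind of structure described above — non-identical Bernoulli trials whose within-cluster, within-group correlation is induced by a latent multivariate normal, as in a multivariate probit model — the following sketch simulates one cluster. The covariate effects, correlation values, and group assignments are invented for the example and are not taken from the thesis.

```python
# Hypothetical sketch: correlated, non-identical Bernoulli trials generated
# from a latent multivariate-probit structure with group-level correlation.
import numpy as np

rng = np.random.default_rng(0)

def simulate_cluster(X, beta, groups, rho_within=0.5, rho_between=0.1):
    """X: (n, p) covariates; beta: (p,) effects; groups: (n,) group labels.
    Trials in the same group share correlation rho_within, otherwise rho_between."""
    n = X.shape[0]
    same = groups[:, None] == groups[None, :]
    corr = np.where(same, rho_within, rho_between)
    np.fill_diagonal(corr, 1.0)
    mu = X @ beta                               # trial-specific probit means
    z = rng.multivariate_normal(mu, corr)       # latent correlated normals
    return (z > 0).astype(int)                  # observed Bernoulli outcomes

X = rng.normal(size=(6, 2))
y = simulate_cluster(X, beta=np.array([0.8, -0.5]),
                     groups=np.array([0, 0, 0, 1, 1, 1]))
print(y)
```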
44

First-Order Algorithms for Communication Efficient Distributed Learning

Khirirat, Sarit January 2019 (has links)
Technological developments in devices and storage have made large collections of data more accessible than ever. This transformation leads to optimization problems with massive data in both volume and dimension. In response to this trend, the popularity of optimization on high-performance computing architectures has increased unprecedentedly. These scalable optimization solvers can achieve high efficiency by splitting computational loads among multiple machines. However, they also incur large communication overhead: when solving optimization problems with millions of parameters, communication between machines has been reported to consume up to 80% of the training time. To alleviate this communication bottleneck, many optimization algorithms with data compression techniques have been studied. In practice, they have been reported to save communication costs significantly while exhibiting convergence almost comparable to that of the full-precision algorithms. To understand this behavior, this thesis develops theory and techniques for designing communication-efficient optimization algorithms. In the first part, we analyze the convergence of optimization algorithms with direct compression. First, we outline definitions of compression techniques which cover many compressors of practical interest. Then, we provide a unified analysis framework for optimization algorithms with compressors, which can be either deterministic or randomized. In particular, we show how the tuning parameters of compressed optimization algorithms must be chosen to guarantee performance. Our results show an explicit dependency on compression accuracy and on delay effects due to the asynchrony of the algorithms. This allows us to characterize the trade-off between iteration and communication complexity under gradient compression. In the second part, we study how error compensation schemes can improve the performance of compressed optimization algorithms. Even though convergence guarantees of optimization algorithms with error compensation have been established, there is very limited theoretical support guaranteeing improved solution accuracy. We therefore develop theoretical explanations, which show that error compensation guarantees arbitrarily high solution accuracy from compressed information. In particular, error compensation helps remove accumulated compression errors, thus improving solution accuracy especially for ill-conditioned problems. We also provide a strong convergence analysis of error compensation on parallel stochastic gradient descent across multiple machines: the error-compensated algorithms, unlike direct compression, result in a significant reduction of the compression error. Applications of the algorithms in this thesis to real-world problems with benchmark data sets validate our theoretical results.
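The error-compensation mechanism discussed above can be illustrated with a minimal single-worker sketch: each iteration compresses the gradient (here a top-k sparsifier, one of many possible compressors), applies only the compressed part, and carries the compression residual forward into the next step. The compressor choice, step size, and test problem are illustrative assumptions, not the thesis's experiments.

```python
# Hypothetical sketch of error-compensated compressed gradient descent
# on a least-squares problem. Assumptions: top-k sparsification as the
# compressor and a fixed step size.
import numpy as np

def top_k(v, k):
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(1)
A, b = rng.normal(size=(200, 50)), rng.normal(size=200)
x = np.zeros(50)
e = np.zeros(50)            # accumulated compression error (memory)
lr, k = 1e-3, 5

for _ in range(500):
    g = A.T @ (A @ x - b)   # full local gradient
    c = top_k(g + e, k)     # compress gradient plus carried-over error
    e = (g + e) - c         # keep what the compressor dropped
    x -= lr * c             # update with the transmitted (compressed) part

print("residual norm:", np.linalg.norm(A @ x - b))
```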
45

Regularization: Stagewise Regression and Bagging

Ehrlinger, John M. 31 March 2011 (has links)
No description available.
46

Algoritmos de adaptação do padrão de marcha utilizando redes neurais / Gait-pattern adaptation algorithms using neural network

Gomes, Marciel Alberto 09 October 2009 (has links)
This work deals with neural-network-based gait-pattern adaptation algorithms for an active lower-limb orthosis. Stable trajectories are generated during the optimization process, considering a trajectory generator based on the Zero Moment Point (ZMP) criterion and on the dynamic model of the equipment. Three neural networks are used to reduce the time-consuming computation of the model and of the ZMP optimization and to reproduce the analytical trajectory generator. The first network approximates the dynamic model, providing the torque variation needed by the optimization of the gait-adaptation parameters; the second network works in the optimization procedure, giving the adapted parameter according to the patient-orthosis interaction; and the third network reproduces the trajectory generator for an established step time interval, which can be repeated for any number of steps. In addition, a computed-torque controller plus a PD controller is used to guarantee that the actual trajectories follow the desired trajectories of the orthosis. The dynamic model of the orthosis in its current configuration, with interaction forces included, is used to generate simulation results.
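The control law mentioned above — computed torque with an added PD term so that the joints track the desired (neural-network-generated) trajectories — has a standard textbook form. The sketch below shows it for a generic rigid-body model; the gains and the model functions are placeholders, not the orthosis's actual dynamics.

```python
# Hypothetical sketch of a computed-torque + PD tracking controller.
# M(q), C(q, dq), g(q) stand for the manipulator/orthosis dynamic model
# (trivial placeholders are used in the example below).
import numpy as np

def computed_torque(q, dq, q_d, dq_d, ddq_d, M, C, g, Kp, Kd):
    """Return joint torques so that, under a perfect model, the closed-loop
    tracking error obeys e_ddot + Kd*e_dot + Kp*e = 0."""
    e, de = q_d - q, dq_d - dq
    v = ddq_d + Kd @ de + Kp @ e              # PD-corrected reference acceleration
    return M(q) @ v + C(q, dq) @ dq + g(q)

# Example with a trivial 2-joint "model" (illustrative placeholders only):
M = lambda q: np.eye(2)
C = lambda q, dq: np.zeros((2, 2))
g = lambda q: np.zeros(2)
Kp, Kd = 100 * np.eye(2), 20 * np.eye(2)
tau = computed_torque(np.zeros(2), np.zeros(2),
                      np.array([0.3, -0.2]), np.zeros(2), np.zeros(2),
                      M, C, g, Kp, Kd)
print(tau)
```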
47

Optimisation des plans de traitement en radiothérapie grâce aux dernières techniques de calcul de dose rapide / Optimization in radiotherapy treatment planning thanks to a fast dose calculation method

Yang, Ming Chao 13 March 2014 (has links)
This thesis deals with radiotherapy treatment planning, which requires a fast and reliable treatment planning system (TPS). The TPS is composed of a dose-calculation algorithm and an optimization method. The objective is to design a plan that delivers the prescribed dose to the tumor while preserving the surrounding healthy and sensitive tissues. Treatment planning consists of determining the irradiation parameters best suited to each patient. In this thesis, the parameters of an IMRT (Intensity-Modulated Radiation Therapy) treatment are the source position, the beam orientations and, for each beam composed of elementary beamlets, the fluence of those beamlets. The objective function is multicriteria with linear constraints. The main objective of the thesis is to demonstrate the feasibility of a treatment-plan optimization method based on the fast dose-calculation technique developed by (Blanpain, 2009). This technique computes the dose on a patient phantom segmented into homogeneous meshes, in two steps. The first step concerns the meshes: projections and weights are set according to physical and geometrical criteria. The second step concerns the voxels: the dose is computed by evaluating the functions previously associated with their mesh. A reformulation of this technique makes it possible to address the optimization problem with a gradient descent method, so that the treatment parameters can be optimized continuously. The results obtained in this thesis open many perspectives in the field of radiotherapy treatment-plan optimization.
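To make the optimization step concrete: once a fast dose calculation provides (or can be differentiated into) a mapping from beamlet fluences to voxel doses, a plan can be improved by projected gradient descent on a weighted least-squares objective. The sketch below assumes a generic linear dose-influence matrix and invented weights and prescriptions purely for illustration; it is not the thesis's algorithm or its mesh-based dose model.

```python
# Hypothetical sketch: fluence-map optimization by projected gradient descent.
# Assumption: dose d = D @ x with a precomputed dose-influence matrix D,
# a quadratic penalty toward prescribed doses, and nonnegative fluences x.
import numpy as np

rng = np.random.default_rng(2)
n_voxels, n_beamlets = 300, 40
D = np.abs(rng.normal(size=(n_voxels, n_beamlets)))   # stand-in influence matrix
prescription = rng.uniform(0, 2, size=n_voxels)        # target dose per voxel
w = np.ones(n_voxels)                                   # per-voxel importance weights

x = np.zeros(n_beamlets)
lr = 1e-4
for _ in range(1000):
    d = D @ x
    grad = D.T @ (w * (d - prescription))              # gradient of 0.5*sum(w*(d - p)^2)
    x = np.maximum(x - lr * grad, 0.0)                  # project onto x >= 0

print("objective:", 0.5 * np.sum(w * (D @ x - prescription) ** 2))
```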
48

Research on Robust Fuzzy Neural Networks

Wu, Hsu-Kun 19 November 2010 (has links)
In many practical applications, it is well known that data collected inevitably contain one or more anomalous outliers; that is, observations that are well separated from the majority or bulk of the data, or in some fashion deviate from the general pattern of the data. The occurrence of outliers may be due to misplaced decimal points, recording errors, transmission errors, or equipment failure. These outliers can lead to erroneous parameter estimation and consequently affect the correctness and accuracy of the model inference. In order to solve these problems, three robust fuzzy neural networks (FNNs) will be proposed in this dissertation. These provide alternative learning machines for general nonlinear learning problems. Our emphasis is particularly on the robustness of these learning machines against outliers. Though we consider only FNNs in this study, the extension of our approach to other neural networks, such as artificial neural networks and radial basis function networks, is straightforward. In the first part of the dissertation, M-estimators, where M stands for maximum likelihood, frequently used in robust regression for linear parametric regression problems, will be generalized to nonparametric Maximum Likelihood Fuzzy Neural Networks (MFNNs) for nonlinear regression problems. Simple weight updating rules based on gradient descent and iteratively reweighted least squares (IRLS) will be derived. In the second part of the dissertation, least trimmed squares estimators, abbreviated as LTS-estimators, frequently used in robust (or resistant) regression for linear parametric regression problems, will be generalized to nonparametric least trimmed squares fuzzy neural networks, abbreviated as LTS-FNNs, for nonlinear regression problems. Again, simple weight updating rules based on gradient descent and iteratively reweighted least squares (IRLS) algorithms will be provided. In the last part of the dissertation, by combining the easy interpretability of parametric models and the flexibility of nonparametric models, semiparametric fuzzy neural networks (semiparametric FNNs) and semiparametric Wilcoxon fuzzy neural networks (semiparametric WFNNs) will be proposed. The corresponding learning rules are based on the backfitting procedure, which is frequently used in semiparametric regression.
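The IRLS mechanism referred to above is easiest to see in the linear setting it generalizes from: each pass reweights observations by a robust weight function so that outliers contribute less to the next least-squares fit. The following sketch uses Huber weights on a linear model purely to illustrate the update; the fuzzy-neural-network versions in the dissertation apply the same idea to nonlinear, network-parameterized predictors.

```python
# Hypothetical sketch of iteratively reweighted least squares (IRLS)
# with Huber weights, shown for a linear model to illustrate the update rule.
import numpy as np

def huber_weights(r, c=1.345):
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def irls(X, y, n_iter=20):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]           # ordinary LS start
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745   # robust scale (MAD)
        w = huber_weights(r / max(s, 1e-12))
        beta = np.linalg.lstsq(np.sqrt(w)[:, None] * X,    # weighted LS refit
                               np.sqrt(w) * y, rcond=None)[0]
    return beta

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.3, size=100)
y[:5] += 15                                               # a few gross outliers
print(irls(X, y))                                         # close to [1, 2] despite outliers
```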
49

Concurrent learning for convergence in adaptive control without persistency of excitation

Chowdhary, Girish 11 November 2010 (has links)
Model Reference Adaptive Control (MRAC) is a widely studied adaptive control methodology that aims to ensure that a nonlinear plant with significant modeling uncertainty behaves like a chosen reference model. MRAC methods attempt to achieve this by representing the modeling uncertainty as a weighted combination of known nonlinear functions, and using a weight update law that ensures weights take on values such that the effect of the uncertainty is mitigated. If the adaptive weights do arrive at an ideal value that best represents the uncertainty, significant performance and robustness gains can be realized. However, most MRAC adaptive laws use only instantaneous data for adaptation and can guarantee that the weights arrive at these ideal values only if the plant states are Persistently Exciting (PE). The condition on PE reference input is restrictive and often infeasible to implement or monitor online. Consequently, parameter convergence cannot be guaranteed in practice for many adaptive control applications. Hence it is often observed that traditional adaptive controllers do not exhibit long-term learning and global uncertainty parametrization; that is, they exhibit little performance gain even when the system tracks a repeated command. This thesis presents a novel approach to adaptive control that relies on using current and recorded data concurrently for adaptation. The thesis shows that for a concurrent learning adaptive controller, a verifiable condition on the linear independence of the recorded data is sufficient to guarantee that weights arrive at their ideal values even when the system states are not PE. The thesis also shows that the same condition can guarantee exponential tracking error and weight error convergence to zero, thereby allowing the adaptive controller to recover the desired transient response and robustness properties of the chosen reference models and to exhibit long-term learning. This condition is found to be less restrictive and easier to verify online than the condition on persistently exciting exogenous input required by traditional adaptive laws that use only instantaneous data for adaptation. The concept is explored for several adaptive control architectures, including neuro-adaptive flight control, where a neural network is used as the adaptive element. The performance gains are justified theoretically using Lyapunov based arguments, and demonstrated experimentally through flight-testing on Unmanned Aerial Systems.
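A minimal way to see the difference from a purely instantaneous law: the concurrent-learning update adds, at every step, error terms evaluated on a stored stack of past data points, so the weights keep being corrected even when the current state is not exciting. The sketch below is a generic discrete-time illustration with an invented linearly parameterized uncertainty; it is not the flight-control implementation described in the thesis.

```python
# Hypothetical sketch of a concurrent-learning style weight update:
# instantaneous term + terms from recorded (x, y) pairs, for an
# uncertainty of the form y = W.T @ phi(x).
import numpy as np

def phi(x):                      # assumed basis functions
    return np.array([1.0, x, x**2])

true_W = np.array([0.5, -1.0, 2.0])

# Recorded history: a few linearly independent regressors and their outputs.
history = [(x, true_W @ phi(x)) for x in (-1.0, 0.2, 1.5)]

W = np.zeros(3)
gamma = 0.05
x_current = 0.3                  # state stays constant -> no persistent excitation
for _ in range(2000):
    # Instantaneous term (cannot identify W on its own from one point)
    e_now = W @ phi(x_current) - true_W @ phi(x_current)
    dW = -gamma * phi(x_current) * e_now
    # Concurrent terms from recorded data keep driving W to its true value
    for x_j, y_j in history:
        dW += -gamma * phi(x_j) * (W @ phi(x_j) - y_j)
    W += dW

print(W)                         # approaches [0.5, -1.0, 2.0] despite the constant state
```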
50

Cost-sensitive boosting : a unified approach

Nikolaou, Nikolaos January 2016 (has links)
In this thesis we provide a unifying framework for two decades of work in an area of Machine Learning known as cost-sensitive Boosting algorithms. This area is concerned with the fact that most real-world prediction problems are asymmetric, in the sense that different types of errors incur different costs. Adaptive Boosting (AdaBoost) is one of the most well-studied and utilised algorithms in the field of Machine Learning, with a rich theoretical depth as well as practical uptake across numerous industries. However, its inability to handle asymmetric tasks has been the subject of much criticism. As a result, numerous cost-sensitive modifications of the original algorithm have been proposed. Each of these has its own motivations, and its own claims to superiority. With a thorough analysis of the literature from 1997 to 2016, we find 15 distinct cost-sensitive Boosting variants - discounting minor variations. We critique the literature using four powerful theoretical frameworks: Bayesian decision theory, the functional gradient descent view, margin theory, and probabilistic modelling. From each framework, we derive a set of properties which must be obeyed by boosting algorithms. We find that only 3 of the published AdaBoost variants are consistent with the rules of all the frameworks - and even they require their outputs to be calibrated to achieve this. Experiments on 18 datasets, across 21 degrees of cost asymmetry, all support the hypothesis - showing that once calibrated, the three variants perform equivalently, outperforming all others. Our final recommendation - based on theoretical soundness, simplicity, flexibility and performance - is to use the original AdaBoost algorithm albeit with a shifted decision threshold and calibrated probability estimates. The conclusion is that novel cost-sensitive boosting algorithms are unnecessary if proper calibration is applied to the original.
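The recommendation in the final sentences translates into a very small recipe: fit plain AdaBoost, calibrate its scores into probabilities, and classify as positive when the calibrated probability exceeds the cost-derived threshold c_FP / (c_FP + c_FN). The sketch below uses scikit-learn's AdaBoost and calibration utilities as stand-ins; the dataset, costs, and calibration method are illustrative assumptions rather than the thesis's experimental setup.

```python
# Hypothetical sketch of the recommended recipe: plain AdaBoost, calibrated
# probabilities, and a decision threshold shifted by the misclassification costs.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

c_fp, c_fn = 1.0, 5.0                        # assume a false negative costs 5x a false positive
threshold = c_fp / (c_fp + c_fn)             # cost-optimal threshold on p(y=1|x)

clf = CalibratedClassifierCV(AdaBoostClassifier(n_estimators=100),
                             method="sigmoid", cv=3)   # calibrate AdaBoost scores
clf.fit(X_tr, y_tr)

p = clf.predict_proba(X_te)[:, 1]
y_pred = (p >= threshold).astype(int)        # shifted decision threshold
cost = (c_fp * ((y_pred == 1) & (y_te == 0)).sum()
        + c_fn * ((y_pred == 0) & (y_te == 1)).sum())
print("total cost:", cost)
```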
