21

Using Second-Order Information in Training Deep Neural Networks

Ren, Yi January 2022 (has links)
In this dissertation, we are concerned with the advancement of optimization algorithms for training deep learning models, and in particular with practical second-order methods that take into account the structure of deep neural networks (DNNs). Although first-order methods such as stochastic gradient descent have long been the predominant optimization algorithms used in deep learning, second-order methods are of interest because of their ability to use curvature information to accelerate the optimization process. After the presentation of some background information in Chapter 1, Chapters 2 and 3 focus on the development of practical quasi-Newton methods for training DNNs. We analyze the Kronecker-factored structure of the Hessian matrix of multi-layer perceptrons and convolutional neural networks and consequently propose block-diagonal Kronecker-factored quasi-Newton methods named K-BFGS and K-BFGS(L). To handle the non-convex nature of DNNs, we also establish new double damping techniques for our proposed methods. Our K-BFGS and K-BFGS(L) methods have memory requirements comparable to those of first-order methods and incur only mild overhead in per-iteration time complexity. In Chapter 4, we develop a new approximate natural gradient method named Tensor Normal Training (TNT), in which the Fisher matrix is viewed as the covariance matrix of a tensor normal distribution (a generalized form of the normal distribution). The tractable Kronecker-factored approximation to the Fisher information matrix that results enables TNT to enjoy memory requirements and per-iteration computational costs only slightly higher than those of first-order methods. Notably, unlike KFAC and K-BFGS/K-BFGS(L), TNT requires only knowledge of the shapes of a model's trainable parameters and does not depend on the specific model architecture. In Chapter 5, we consider subsampled versions of Gauss-Newton and natural gradient methods applied to DNNs. Because of the low-rank nature of the subsampled matrices, we make use of the Sherman-Morrison-Woodbury formula along with backpropagation to efficiently compute their inverses. We also show that, under rather mild conditions, the algorithm converges to a stationary point if Levenberg-Marquardt damping is used. The results of a substantial number of numerical experiments, in which we compare the performance of our methods to that of state-of-the-art methods used to train DNNs, are reported in Chapters 2, 3, 4, and 5; they demonstrate the efficiency and effectiveness of our proposed second-order methods.
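The Kronecker-factored structure that K-BFGS and TNT exploit is what keeps their memory and per-iteration costs close to first-order methods: only two small factors per layer are stored and inverted. Below is a minimal NumPy sketch of this general idea, assuming a layer's curvature block is approximated as A ⊗ G with symmetric, precomputed factors; the function name and damping value are illustrative, not the thesis's implementation.

```python
import numpy as np

def kron_precondition(grad_W, A, G, damping=1e-2):
    """Apply a Kronecker-factored inverse curvature to a layer gradient.

    If the curvature block is approximated as A ⊗ G, with A built from
    layer inputs and G from output gradients (both symmetric), then
        (A ⊗ G)^{-1} vec(grad_W) = vec(G^{-1} grad_W A^{-1}),
    so only the small factors A and G ever need to be inverted.
    """
    Ad = A + damping * np.eye(A.shape[0])       # damp for non-convexity
    Gd = G + damping * np.eye(G.shape[0])
    right = np.linalg.solve(Ad.T, grad_W.T).T   # grad_W @ inv(Ad)
    return np.linalg.solve(Gd, right)           # inv(Gd) @ grad_W @ inv(Ad)
```

For a layer with a 512 x 512 weight matrix, this inverts two 512 x 512 factors instead of one 262,144 x 262,144 curvature block, which is the source of the memory savings the abstract describes.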
22

Going Deeper with Images and Natural Language

Ma, Yufeng 29 March 2019 (has links)
One aim in the area of artificial intelligence (AI) is to develop a smart agent that is able to perceive and understand the complex visual environment around us. More ambitiously, it should be able to interact with us about its surroundings in natural language. Thanks to the progress made in deep learning, we have seen huge breakthroughs towards this goal over the last few years. The developments have been extremely rapid in visual recognition, where machines can now categorize images into multiple classes and detect various objects within an image with an ability that is competitive with, or even surpasses, that of humans. Meanwhile, we have witnessed similar strides in natural language processing (NLP): computers can now perform text classification, machine translation, and similar tasks almost perfectly. However, despite much inspiring progress, most of these achievements are still confined to a single domain, and the interaction between the visual and textual areas remains quite limited, notwithstanding progress in image captioning, visual question answering, and related tasks. In this dissertation, we design models and algorithms that build in-depth connections between images and natural language, helping us to better understand their inner structures. In particular, we first study how to make machines generate image descriptions that are indistinguishable from ones written by humans, which also improves quantitative evaluation performance. Second, we devise a novel algorithm for measuring review congruence, which takes an image and review text as input and quantifies the relevance of each sentence to the image; the whole model is trained without any supervised ground-truth labels. Finally, we propose a brand-new AI task called Image Aspect Mining, which detects visual aspects in images and identifies aspect-level ratings within the review context. On the theoretical side, this research contributes to multiple research areas: Computer Vision (CV), Natural Language Processing (NLP), the interaction between CV and NLP, and Deep Learning. Regarding impact, these techniques will benefit users such as the visually impaired, customers reading reviews, merchants, and AI researchers in general. / Doctor of Philosophy
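A rough sense of what review-congruence scoring involves can be given with embeddings: score each sentence by its similarity to the image in a shared embedding space. This is only a generic sketch, not the dissertation's unsupervised model; the shared embedding space and its inputs are assumed.

```python
import numpy as np

def sentence_image_congruence(img_emb, sent_embs):
    """Score each review sentence's relevance to an image.

    Assumes the image and sentences are already embedded in a shared
    space (e.g., by a pretrained vision-language encoder); cosine
    similarity then serves as a per-sentence relevance score.
    """
    img = img_emb / np.linalg.norm(img_emb)
    sents = sent_embs / np.linalg.norm(sent_embs, axis=1, keepdims=True)
    return sents @ img   # one score per sentence, in [-1, 1]
```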
23

Application of Machine Learning to Multi Antenna Transmission and Machine Type Resource Allocation

Emenonye, Don-Roberts Ugochukwu 11 September 2020 (has links)
Wireless communication systems are a well-researched area in electrical engineering that has continually evolved over the past decades. This constant evolution and development have led to well-formulated theoretical baselines in terms of reliability and efficiency. However, most communication baselines are derived by splitting baseband communications into a series of modular blocks such as modulation, coding, channel estimation, and orthogonal frequency-division multiplexing, and these blocks are then independently optimized. Although this has led to very efficient and reliable designs, a theoretical verification of the optimality of this design process is not feasible due to the complexity of each individual block. In this work, we propose two modifications to these conventional wireless systems. First, with the goal of designing better space-time block codes for improved reliability, we redesign the transmit and receive blocks of the physical layer: we replace a portion of the transmit chain, from modulation to antenna mapping, with a neural network, and similarly replace the receiver/decoder with a neural network. In other words, the first part of this work focuses on jointly optimizing the transmit and receive blocks to produce a set of space-time codes that are resilient to Rayleigh fading channels. We compare our results to conventional orthogonal space-time block codes for multiple antenna configurations. The second part of this work investigates the possibility of designing a distributed multi-agent reinforcement learning-based multi-access algorithm for machine type communication. This work recognizes that cellular networks are being proposed as a solution for the connectivity of machine type devices (MTDs), and one of the most crucial aspects of scheduling in cellular connectivity is the random access procedure, which conventional cellular users employ to receive an allocation for uplink transmissions. This process usually requires six resource blocks. It is efficient for cellular users because transmitting cellular data usually requires more than six resource blocks, so it is relatively efficient to perform the random access process in order to establish a connection. Moreover, as long as cellular users maintain synchronization, they do not have to repeat the random access process every time they have data to transmit; they can maintain a connection with the base station through discontinuous reception. On the other hand, the random access process is unsuitable for MTDs because MTDs usually have small packets, and performing the random access process to transmit such small packets is highly inefficient. Also, most MTDs are power constrained, so they turn off when they have no data to transmit; this means they lose their connection, cannot maintain any form of discontinuous reception, and must perform the random access process each time they have data to transmit. For these reasons, explicit scheduling is undesirable for machine type communication. To overcome these challenges, we propose bypassing the entire scheduling process with a grant-free resource allocation scheme in which MTDs pseudo-randomly transmit their data in random access slots. This opens the possibility of a large number of collisions during the random access slots. To alleviate the resulting congestion, we exploit a heterogeneous network and investigate the optimal MTD-BS association that minimizes the long-term congestion experienced in the overall cellular network. Our results show that we can derive the optimal MTD-BS association when the number of MTDs is less than the total number of random access slots. / Master of Science / Wireless communication systems are a well-researched area of engineering that has continually evolved over the past decades. This constant evolution and development have led to well-formulated theoretical baselines in terms of reliability and efficiency. This two-part thesis investigates the possibility of improving these wireless systems with machine learning. First, with the goal of designing more resilient codes for transmission, we propose to redesign the transmit and receive blocks of the physical layer. We focus on jointly optimizing the transmit and receive blocks to produce a set of transmit codes that are resilient to channel impairments. We compare our results to current conventional codes for various transmit and receive antenna configurations. The second part of this work investigates the possibility of designing a distributed multi-access scheme for machine type devices, in which MTDs pseudo-randomly transmit their data by randomly selecting time slots. This opens the possibility of a large number of collisions occurring during these slots. To alleviate the resulting congestion, we employ a heterogeneous network and investigate the optimal MTD-BS association that minimizes the long-term congestion experienced in the overall network. Our results show that we can derive the optimal MTD-BS association when the number of MTDs is less than the total number of slots.
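The congestion problem the thesis studies can be made concrete with a toy simulation of grant-free access. In the hypothetical model below, each MTD independently picks one of the available random-access slots per frame, and any slot chosen by two or more devices is a collision; this is an illustrative sketch, not the thesis's system model.

```python
import numpy as np

def grant_free_collision_rate(n_mtds, n_slots, n_frames=10_000, seed=0):
    """Estimate the fraction of random-access slots lost to collisions
    when MTDs pick transmission slots pseudo-randomly (toy model)."""
    rng = np.random.default_rng(seed)
    collided = 0
    for _ in range(n_frames):
        picks = rng.integers(0, n_slots, size=n_mtds)    # one slot per MTD
        counts = np.bincount(picks, minlength=n_slots)
        collided += np.sum(counts >= 2)                  # 2+ picks = collision
    return collided / (n_frames * n_slots)

print(grant_free_collision_rate(n_mtds=20, n_slots=64))  # roughly 0.04
```

Congestion grows quickly as n_mtds approaches n_slots, which is consistent with the result that the optimal MTD-BS association is derived when the number of MTDs is less than the total number of slots.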
24

Multimodal deep learning systems for analysis of human behavior, preference, and state

Koorathota, Sharath Chandra January 2023 (has links)
Deep learning has become a widely used tool for inference and prediction in neuroscience research. Despite their differences, most neural network architectures convert raw input data into lower-dimensional vector representations that subsequent network layers can more easily process. Significant advancements have been made in improving latent representations for audiovisual problems; however, human neurophysiological data is often scarcer, noisier, and more challenging to learn from when integrated from multiple sources. The present work integrates neural, physiological, and behavioral data to improve the prediction of human behavior, preference, and state. Across five studies, we explore (i) how embeddings, or vectorized representations, can be designed to better capture the context of input data, (ii) how the attention mechanism found in transformer models can be adapted to capture crossmodal relationships in an interpretable way, and (iii) how humans make sensorimotor decisions in a realistic scenario, with implications for designing automated systems. Part I focuses on improving the context available to latent representations in deep neural networks. We achieve this by introducing a hierarchical structure in clinical data to predict cognitive performance in a large, longitudinal cohort study. In a separate study, we present a recurrent neural network that captures non-cognitive pupil dynamics by utilizing visual areas of interest as inputs. In Part II, we employ attention-based approaches for multimodal integration by learning to weigh modalities that differ in the type of information they capture. We show that our crossmodal attention framework can adapt to audiovisual and neurophysiological input data. Part III proposes a novel paradigm to study sensorimotor decision-making in a driving scenario and to study brain connectivity in the context of pupil-linked arousal. Our findings reveal that embeddings that capture input data's hierarchical or temporal context consistently yield high performance across different tasks. Moreover, our studies demonstrate the versatility of the attention mechanism, which we show can effectively integrate modalities as varied as text descriptions, perceived differences in video clips, and recognized objects. Our multimodal transformer, designed to handle neurophysiological data, improves the prediction of emotional states by integrating brain and autonomic activity. Taken together, our work advances the development of multimodal systems for predicting human behavior, preference, and state across domains.
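The crossmodal attention idea can be illustrated in a few lines of NumPy: one modality's features form the queries and another's the keys and values, and the resulting weight matrix shows which timesteps of the second modality informed the first, which is what makes the mechanism interpretable. This is a generic scaled dot-product sketch with the learned projection matrices omitted, not the studies' actual transformer.

```python
import numpy as np

def crossmodal_attention(queries, keys_values):
    """Scaled dot-product attention across two modalities.

    queries:     (T_q, d) features from one modality (e.g., physiology)
    keys_values: (T_k, d) features from another (e.g., video), assumed
                 already projected into a common dimension d.
    Returns attended features (T_q, d) and the weight matrix (T_q, T_k).
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ keys_values, weights
```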
25

Quantifying Trust in Deep Learning Ultrasound Models by Investigating Hardware and Operator Variance

Zhu, Calvin January 2021 (has links)
Ultrasound (US) is the most widely used medical imaging modality due to its low cost, portability, real-time imaging ability, and use of non-ionizing radiation. However, unlike other imaging modalities such as CT or MRI, it is heavily operator dependent, requiring trained expertise to leverage these benefits. Recently there has been an explosion of interest in artificial intelligence (AI) across the medical community, and many are turning to the growing trend of deep learning (DL) models to assist in diagnosis. However, deep learning models do not perform as well when the training data are not fully representative of the problem; because of this difference between training and deployment, model performance suffers, which can lead to misdiagnosis. This issue is known as dataset shift. Two aims were proposed to address dataset shift. The first was to quantify how US operator skill and hardware affect acquired images. The second was to use this skill quantification method to screen and match data to deep learning models to improve performance. A BLUE phantom from CAE Healthcare (Sarasota, FL) with various mock lesions was scanned by three operators using three different US systems (Siemens S3000, Clarius L15, and Ultrasonix SonixTouch), producing 39,013 images. DL models were trained on a specific set to classify the presence of a simulated tumour and tested with data from differing sets. The Xception, VGG19, and ResNet50 architectures were used to test whether the effects held across model frameworks. K-Means clustering was used to separate images by operator and hardware into clusters, and this clustering algorithm was then used to screen incoming images during deployment, matching each input to a DL model trained specifically to classify that type of operator or hardware. Results showed a noticeable difference when models were given data from differing datasets, with the largest accuracy drop being from 81.26% to 31.26%. Overall, operator differences affected DL model performance more significantly. Clustering models had much higher success separating hardware data compared to operator data, and the proposed method reflects this result, with much higher accuracy across the hardware test set compared to the operator data. / Thesis / Master of Applied Science (MASc)
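The screening step described above can be sketched with scikit-learn: cluster training images by acquisition source, then route each incoming image to the model trained on its best-matching cluster. Feature extraction and the names below are hypothetical placeholders, not the thesis's exact pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_router(train_features, n_sources=3):
    """Cluster image features by acquisition source (operator or hardware)."""
    return KMeans(n_clusters=n_sources, n_init=10, random_state=0).fit(train_features)

def route(router, image_feature, models):
    """Send an incoming image to the DL model matched to its cluster."""
    cluster = router.predict(image_feature.reshape(1, -1))[0]
    return models[cluster]   # model trained on that operator/hardware subset
```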
26

Efficient Machine Teaching Frameworks for Natural Language Processing

Karamanolakis, Ioannis January 2022 (has links)
The past decade has seen tremendous growth in potential applications of language technologies in our daily lives due to increasing data, computational resources, and user interfaces. An important step in supporting emerging applications is the development of algorithms for processing the rich variety of human-generated text and extracting relevant information. Machine learning, especially deep learning, has seen increasing success on various text benchmarks. However, while standard benchmarks have static tasks with expensive human-labeled data, real-world applications are characterized by dynamic task specifications and limited resources for data labeling, making it challenging to transfer the success of supervised machine learning to the real world. To deploy language technologies at scale, it is crucial to develop alternative techniques for teaching machines beyond data labeling. In this dissertation, we address this data-labeling bottleneck by studying and presenting resource-efficient frameworks for teaching machine learning models to solve language tasks across diverse domains and languages. Our goal is to (i) support emerging real-world problems without the expensive requirement of large-scale manual data labeling; and (ii) assist humans in teaching machines via more flexible types of interaction. Towards this goal, we describe our collaborations with experts across domains (including public health, earth sciences, news, and e-commerce) to integrate weakly-supervised neural networks into operational systems, and we present efficient machine teaching frameworks that leverage flexible forms of declarative knowledge as supervision: coarse labels, large hierarchical taxonomies, seed words, bilingual word translations, and general labeling rules. First, we present two neural network architectures designed to leverage weak supervision in the form of coarse labels and hierarchical taxonomies, respectively, and highlight their successful integration into operational systems. Our Hierarchical Sigmoid Attention Network (HSAN) learns to highlight important sentences of potentially long documents without sentence-level supervision by instead using coarse-grained supervision at the document level. HSAN improves over previous weakly supervised learning approaches across sentiment classification benchmarks and has been deployed to support health department inspections in the discovery of foodborne illness outbreaks. We also present TXtract, a neural network that extracts attributes for e-commerce products from thousands of diverse categories without using manually labeled data for each category, by instead considering category relationships in a hierarchical taxonomy. TXtract is a core component of Amazon's AutoKnow, a system that collects knowledge facts for over 10K product categories and serves this information to Amazon search and product detail pages. Second, we present architecture-agnostic machine teaching frameworks that we applied across domains, languages, and tasks. Our weakly-supervised co-training framework can train any type of text classifier using just a small number of class-indicative seed words and unlabeled data. In contrast to previous work that uses seed words to initialize embedding layers, our iterative seed word distillation (ISWD) method leverages the predictive power of seed words as supervision signals and shows strong performance improvements for aspect detection in reviews across domains and languages.
We further demonstrate the cross-lingual transfer abilities of our co-training approach via cross-lingual teacher-student (CLTS), a method for training document classifiers across diverse languages using labeled documents only in English and a limited budget for bilingual translations. Not all classification tasks, however, can be effectively addressed using human supervision in the form of seed words. To capture a broader variety of tasks, we present ASTRA, a weakly-supervised self-training framework for training a classifier using more general labeling rules in addition to labeled and unlabeled data. As a complete set of accurate rules may be hard to obtain in one shot, we further present an interactive framework that assists human annotators by automatically suggesting candidate labeling rules. In conclusion, this thesis demonstrates the benefits of teaching machines through types of interaction other than the standard data labeling paradigm and shows promising results for new applications across domains and languages. To facilitate future research, we publish our code implementations and design new challenging benchmarks with various types of supervision. We believe that our proposed frameworks and experimental findings will influence research and enable new applications of language technologies without the costly requirement of large manually labeled datasets.
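The seed-word supervision at the core of the co-training framework can be illustrated with a toy teacher-student loop: a rule-based teacher pseudo-labels any document containing a class's seed words, and a student classifier trained on those pseudo-labels generalizes beyond the seeds. The seed lists, data, and model choices below are illustrative and simpler than ISWD itself, which iteratively refines the teacher.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

SEEDS = {0: ["waiter", "service"], 1: ["pasta", "steak"]}   # toy aspect seeds

def teacher_label(doc):
    """Pseudo-label a document if it mentions any seed word, else abstain."""
    for cls, words in SEEDS.items():
        if any(w in doc.lower() for w in words):
            return cls
    return None

def train_student(unlabeled_docs):
    """Train a student on teacher pseudo-labels (assumes both classes occur)."""
    pairs = [(d, teacher_label(d)) for d in unlabeled_docs]
    texts, ys = zip(*[(d, y) for d, y in pairs if y is not None])
    vec = TfidfVectorizer().fit(unlabeled_docs)   # fit on all unlabeled text
    clf = LogisticRegression().fit(vec.transform(texts), ys)
    return vec, clf   # the student can now label docs with no seed words at all
```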
27

Accelerating Structural Design and Optimization using Machine Learning

Singh, Karanpreet 13 January 2020 (has links)
Machine learning techniques promise to greatly accelerate structural design and optimization. In this thesis, deep learning and active learning techniques are applied to different non-convex structural optimization problems. Standard optimization methods based on Finite Element Analysis (FEA) for aircraft panels with bio-inspired curvilinear stiffeners are computationally expensive. The main reason for employing many of these standard methods is the ease of their integration with FEA; however, each optimization requires multiple computationally expensive FEA evaluations, making their use impractical at times. To accelerate optimization, the use of Deep Neural Networks (DNNs) is proposed to approximate the FEA buckling response. The results show that DNNs obtained an accuracy of 95% in evaluating the buckling load and accelerated the optimization by a factor of nearly 200, demonstrating the potential of DNN-based machine learning algorithms for accelerating the optimization of bio-inspired curvilinearly stiffened panels. However, the approach has drawbacks: it is specific to similar structural design problems and requires large datasets for DNN training. An adaptive machine learning technique called active learning is therefore used in this thesis to accelerate the evolutionary optimization of complex structures. The active learner helps the Genetic Algorithm (GA) by predicting whether a candidate design will satisfy the required constraints. The approach does not need a surrogate model trained prior to the optimization; instead, the active learner adaptively improves its own accuracy during the optimization, reducing the required number of FEA evaluations. The results show that the approach has the potential to reduce the total required FEA evaluations by more than 50%. Lastly, machine learning is used to make recommendations for modeling choices when analyzing a structure using FEA. Decisions about the selection of appropriate modeling techniques are usually based on an analyst's judgement, drawing on knowledge and intuition from past experience. The machine learning-based approach provides recommendations within seconds, saving significant computational resources while supporting accurate design choices. / Doctor of Philosophy / This thesis presents an innovative application of artificial intelligence (AI) techniques for designing aircraft structures. An important objective for the aerospace industry is to design robust and fuel-efficient aerospace structures. State-of-the-art research suggests that the structure of future aircraft could mimic organic cellular structures. However, the design of these new panels with arbitrary structures is computationally expensive: applying the standard optimization methods currently used for aerospace structures can take anywhere from a few days to months. The presented research demonstrates the potential of AI for accelerating the optimization of aircraft structures, providing an efficient way for designers to create futuristic, fuel-efficient aircraft with a positive impact on the environment and the world.
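The active-learning idea can be sketched as a constraint screen in front of the FEA solver: the learner predicts constraint feasibility for each GA candidate, trusts itself when confident, and pays for an FEA evaluation only when uncertain, folding that result back into its training set. Everything below, including the stand-in run_fea rule, is a hypothetical illustration rather than the thesis's implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def run_fea(design):
    """Placeholder for an expensive FEA constraint check (toy rule)."""
    return int(np.sum(design ** 2) < 1.0)

def screen_designs(candidates, X_seen, y_seen, threshold=0.9):
    """Filter GA candidates, calling FEA only on uncertain ones.

    Assumes y_seen already contains both feasible and infeasible examples.
    """
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_seen, y_seen)
    p_feasible = clf.predict_proba(candidates)[:, 1]
    feasible, new_X, new_y = [], [], []
    for design, p in zip(candidates, p_feasible):
        if p > threshold:                    # confident: skip the FEA call
            feasible.append(design)
        else:                                # uncertain: spend one FEA run
            y = run_fea(design)
            new_X.append(design)
            new_y.append(y)
            if y:
                feasible.append(design)
    return feasible, new_X, new_y            # new labels retrain the learner
```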
28

A General Framework for Model Adaptation to Meet Practical Constraints in Computer Vision

Huang, Shiyuan January 2024 (has links)
Recent advances in deep learning models have shown impressive capabilities in various computer vision tasks, which encourages the integration of these models into real-world vision systems such as smart devices. This integration presents new challenges, as models need to meet complex real-world requirements. This thesis is dedicated to building practical deep learning models, focusing on two main challenges in vision systems: data efficiency and variability. We address these issues by providing a general model adaptation framework that extends models with practical capabilities. In the first part of the thesis, we explore model adaptation approaches for efficient representation. We illustrate the benefits of different types of efficient data representations, including compressed video modalities from video codecs, low-bit features, and sparsified frames and texts. Using such efficient representations, system costs such as data storage, processing, and computation can be greatly reduced. We systematically study various methods to extract, learn, and utilize these representations, presenting new methods to adapt machine learning models to them. The proposed methods include a compressed-domain video recognition model with a coarse-to-fine distillation training strategy, a task-specific feature compression framework for low-bit video-and-language understanding, and a learnable token sparsification approach for sparsifying human-interpretable video inputs. We demonstrate new, more practical and efficient perspectives on representing vision data in various applications. The second part of the thesis focuses on open-environment challenges, where we explore model adaptation for new, unseen classes and domains. We examine the practical limitations of current recognition models and introduce various methods to empower models to address open recognition scenarios. These include a negative envisioning framework for managing new classes and outliers, and a multi-domain translation approach for dealing with unseen domain data. Our study shows a promising trajectory towards models capable of navigating diverse data environments in real-world applications.
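The token sparsification mentioned above reduces compute by keeping only the most informative tokens. Here is a minimal sketch, assuming per-token importance scores are already produced by a learned scorer (in the learnable approach, the scorer is trained jointly with the model):

```python
import numpy as np

def sparsify_tokens(tokens, scores, keep_ratio=0.25):
    """Keep the top-scoring fraction of tokens, preserving their order.

    tokens: (T, d) array of token embeddings; scores: (T,) importance
    scores. Downstream cost drops roughly in proportion to keep_ratio.
    """
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.argsort(scores)[-k:]       # indices of the k highest scores
    return tokens[np.sort(keep)]         # restore original temporal order
```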
29

Machine Learning Framework for Causal Modeling for Process Fault Diagnosis and Mechanistic Explanation Generation

Sivaram, Abhishek January 2023 (has links)
Machine learning models, typically deep learning models, often come at the cost of explainability. To generate explanations of such systems, models need to be rooted in first principles, at least mechanistically. In this work we look at a gamut of machine learning models, based on different levels of process knowledge, for process fault diagnosis and for generating mechanistic explanations of processes. In Chapter 1, we introduce the thesis through a range of problems spanning causality and explainability, aiming toward the goal of generating mechanistic explanations of process systems. Chapter 2 looks at an approach for generating causal models through a purely data-centric approach, with minimal process knowledge restricted to equipment connectivity, and at identifying causality in these domains; the causal models generated can be utilized for process fault diagnosis. Chapters 3 and 4 show how deep learning models can be used both for classification in process fault diagnosis and for regression. We see that depending on the hyperparameters, i.e., purely the breadth and depth of a neural network, the learned hidden representations vary from a simple set of features to more complex sets of features. While these hidden representations may be exploited to aid in classification and regression problems, the true explanations of these representations do not correlate with mechanisms in the system of interest. There is thus a need to add more mechanistic information about the generated features to aid explainability. Chapter 5 shows how incorporating process knowledge can aid in generating such mechanistic explanations through automated variable transformations: process knowledge is used to generate features, or model forms, that yield explainable models. These models are able to extract the true model of the system from the knowledge provided.
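To make the data-centric causal-modeling idea of Chapter 2 concrete, here is a deliberately simple sketch: equipment connectivity (the minimal process knowledge) restricts which unit pairs are even tested, and a lagged-correlation test then proposes directed fault-propagation edges. Real causal discovery is far more careful; this toy only illustrates how connectivity prunes the search space.

```python
import numpy as np

def propose_causal_edges(data, names, connectivity, max_lag=5, thresh=0.5):
    """Propose directed edges i -> j among physically connected units.

    data: (T, n) sensor time series; connectivity: pairs (i, j) of
    connected equipment. An edge is proposed when some lagged
    correlation between the two signals exceeds thresh.
    """
    edges = []
    for i, j in connectivity:                 # only physically linked pairs
        x, y = data[:, i], data[:, j]
        for lag in range(1, max_lag + 1):
            r = np.corrcoef(x[:-lag], y[lag:])[0, 1]
            if abs(r) > thresh:
                edges.append((names[i], names[j], lag, round(float(r), 2)))
                break
    return edges
```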
30

Deep Learning for Spatiotemporal Nowcasting

Franch, Gabriele 08 March 2021 (has links)
Nowcasting (short-term forecasting using current observations) is a key challenge that human activities face on a daily basis. We heavily rely on short-term meteorological predictions in domains such as aviation, agriculture, mobility, and energy production. One of the most important and challenging tasks for meteorology is the nowcasting of extreme events, whose anticipation is essential to mitigate risks to human safety and social or economic costs. The goal of this thesis is to contribute new machine learning methods that improve the spatio-temporal precision of nowcasting of extreme precipitation events. This work builds on recent advances in deep learning for nowcasting, adding ensemble-based methods trained on novel, original data resources. A new curated multi-year radar scan dataset (TAASRAD19) is introduced, containing more than 350,000 labelled precipitation records over 10 years, to provide a baseline benchmark and foster reproducibility of machine learning modeling. A TrajGRU model is applied to TAASRAD19 and implemented in an operational prototype. The thesis also introduces a novel method for fast analog search based on manifold learning: the tool searches the entire dataset history in less than 5 seconds and demonstrates the feasibility of predictive ensembles. In the final part of the thesis, the new deep learning architecture ConvSG, based on stacked generalization, is presented, introducing novel concepts for deep learning in precipitation nowcasting. ConvSG is specifically designed to improve predictions of extreme precipitation regimes over published methods and shows a 117% skill improvement on extreme rain regimes over a single member. Moreover, ConvSG shows superior or equal skill compared to Lagrangian extrapolation models for all rain rates, achieving a 49% average improvement in predictive skill over extrapolation in the higher precipitation regimes.
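The fast analog search can be pictured as nearest-neighbor retrieval in a learned low-dimensional space: embed each historical radar sequence, index the embeddings, and form a predictive ensemble from whatever followed the closest past analogs of the current observation. The sketch below assumes the embeddings are precomputed (e.g., by the manifold-learning step) and is illustrative rather than the thesis's actual tool.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_analog_index(history_embeddings):
    """Index low-dimensional embeddings of past radar sequences."""
    return NearestNeighbors(n_neighbors=10).fit(history_embeddings)

def analog_ensemble(index, current_embedding, future_frames):
    """Return the radar frames that followed the 10 closest past analogs;
    together they form an ensemble forecast for the current situation."""
    _, idx = index.kneighbors(current_embedding.reshape(1, -1))
    return future_frames[idx[0]]   # shape: (10, ...) ensemble members
```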
