1 |
Deep neural networks for video classification in ecologyConway, Alexander January 2020 (has links)
Analyzing large volumes of video data is a challenging and time-consuming task. Automating this process would very valuable, especially in ecological research where massive amounts of video can be used to unlock new avenues of ecological research into the behaviour of animals in their environments. Deep Neural Networks, particularly Deep Convolutional Neural Networks, are a powerful class of models for computer vision. When combined with Recurrent Neural Networks, Deep Convolutional models can be applied to video for frame level video classification. This research studies two datasets: penguins and seals. The purpose of the research is to compare the performance of image-only CNNs, which treat each frame of a video independently, against a combined CNN-RNN approach; and to assess whether incorporating the motion information in the temporal aspect of video improves the accuracy of classifications in these two datasets. Video and image-only models offer similar out-of-sample performance on the simpler seals dataset but the video model led to moderate performance improvements on the more complex penguin action recognition dataset.
|
2 |
Automatic Microseismic Event Location Using Deep Neural NetworksYang, Yuanyuan 10 1900 (has links)
In contrast to large-scale earthquakes which are caused when energy is released as a result of rock failure along a fault, microseismic events are caused when human activities, such as mining or oil and gas production, change the stress distribution or the volume of a rockmass. During such processes, microseismic event location, which aims at estimating source locations accurately, is a vital component of observing, diagnosing and acting upon the dynamic indications in reservoir performance by tracking the fracturing properly.
Conventional methods for microseismic event location face considerable drawbacks. For example, traveltime based methods require manual labor in traveltime picking and thus suffer from the heavy workload of human interactions and manmade errors. Migration based and waveform inversion based location methods demand large computational memory and time for simulating the wavefields, especially in face of tens of thousands of microseismic events recorded.
In this thesis research, we developed an approach based on a deep CNN for the purpose of microseismic event location, which is completely automatic with no human interactions like traveltime picking and also computationally friendly due to no requirement of wavefield simulations. An example in which the network is well-trained on the synthetic data from the smooth SEAM model and tested on the true SEAM model has shown its accuracy and efficiency. Moreover, we have proved that this approach is not only feasible for the cases with a uniform receiver distribution, but also applicable to cases where the passive seismic data are acquired with an irregular spacing geometry of sensors, which makes this approach more practical in reality.
|
3 |
Check Your Other Door: Creating Backdoor Attacks in the Frequency DomainHammoud, Hasan Abed Al Kader 04 1900 (has links)
Deep Neural Networks (DNNs) are ubiquitous and span a variety of applications ranging from image classification and facial recognition to medical image analysis and real-time object detection. As DNN models become more sophisticated and complex, the computational cost of training these models becomes a burden. For this reason, outsourcing the training process has been the go-to option for many DNN users. Unfortunately, this comes at the cost of vulnerability to backdoor attacks. These attacks aim at establishing hidden backdoors in the DNN such that it performs well on clean samples but outputs a particular target label when a trigger is applied to the input. Current backdoor attacks generate triggers in the spatial domain; however, as we show in this work, it is not the only domain to exploit and one should always "check the other doors". To the best of our knowledge, this work is the first to propose a pipeline for generating a spatially dynamic (changing) and invisible (low norm) backdoor attack in the frequency domain. We show the advantages of utilizing the frequency domain for creating undetectable and powerful backdoor attacks through extensive experiments on various datasets and network architectures. Unlike most spatial domain attacks, frequency-based backdoor attacks can achieve high attack success rates with low poisoning rates and little to no drop in performance while remaining imperceptible to the human eye. Moreover, we show that the backdoored models (poisoned by our attacks) are resistant to various state-of-the-art (SOTA) defenses, and so we contribute two possible defenses that can successfully evade the attack. We conclude the work with some remarks regarding a network’s learning capacity and the capability of embedding a backdoor attack in the model.
|
4 |
Vector Quantization of Deep Convolutional Neural Networks with Learned CodebookYang, Siyuan 16 February 2022 (has links)
Deep neural networks (DNNs), particularly convolutional neural networks (CNNs), have been widely applied in the many fields, such as computer vision, natural language processing, speech recognition and etc. Although DNNs achieve dramatic accuracy improvements in these real-world tasks, they require significant amounts of resources (e.g., memory, energy, storage, bandwidth and computation resources). This limits the application of these networks on resource-constrained systems, such as mobile and edge devices. A large body of literature has been proposed to addresses this problem from the perspective of compressing DNNs while preserving their performance. In this thesis, we focus on compressing deep CNNs based on vector quantization techniques.
The first part of this thesis summarizes some basic concepts in machine learning and popular techniques on model compression, including pruning, quantization, low-rank factorization and knowledge distillation approaches. Our main interest is quantization techniques, which compress networks by reducing the precision of parameters. Full-precision weights, activations and even gradients in networks can be quantized to 16-bit floating point numbers, 8-bit integers, or even binary numbers. Despite a possible performance degradation, quantization can greatly reduce the model size while maintaining model accuracy.
In the second part of this thesis, we propose a novel vector quantization approach, which we refer to as Vector Quantization with Learned Codebook, or VQLC, for CNNs. Rather than performing scalar quantization, we choose vector quantization that can simultaneously quantize multiple weights at once. Instead of taking a pretraining/clustering approach as in most works, in VQLC, the codebook for quantization are learned together with neural network training from scratch. For the forward pass, the traditional convolutional filters are replaced by the convex combinations of a set of learnable codewords. During inference, the compressed model will be represented by a small-sized codebook and a set of indices, resulting in a significant reduction of model size while preserving the network's performance.
Lastly, we validate our approach by quantizing multiple modern CNNs on several popular image classification benchmarks and compare with state-of-the-art quantization techniques. Our experimental results show that VQLC demonstrates at least comparable and often superior
performance to the existing schemes. In particular, VQLC
demonstrates significant advantages over the existing approaches
on wide networks at the high rate of compression.
|
5 |
Neural Network Emulation for Computer Model with High Dimensional Outputs using Feature Engineering and Data AugmentationAlamari, Mohammed Barakat January 2022 (has links)
No description available.
|
6 |
Exploring Accumulated Gradient-Based Quantization and Compression for Deep Neural NetworksGaopande, Meghana Laxmidhar 29 May 2020 (has links)
The growing complexity of neural networks makes their deployment on resource-constrained embedded or mobile devices challenging. With millions of weights and biases, modern deep neural networks can be computationally intensive, with large memory, power and computational requirements. In this thesis, we devise and explore three quantization methods (post-training, in-training and combined quantization) that quantize 32-bit floating-point weights and biases to lower bit width fixed-point parameters while also achieving significant pruning, leading to model compression. We use the total accumulated absolute gradient over the training process as the indicator of importance of a parameter to the network. The most important parameters are quantized by the smallest amount. The post-training quantization method sorts and clusters the accumulated gradients of the full parameter set and subsequently assigns a bit width to each cluster. The in-training quantization method sorts and divides the accumulated gradients into two groups after each training epoch. The larger group consisting of the lowest accumulated gradients is quantized. The combined quantization method performs in-training quantization followed by post-training quantization. We assume storage of the quantized parameters using compressed sparse row format for sparse matrix storage. On LeNet-300-100 (MNIST dataset), LeNet-5 (MNIST dataset), AlexNet (CIFAR-10 dataset) and VGG-16 (CIFAR-10 dataset), post-training quantization achieves 7.62x, 10.87x, 6.39x and 12.43x compression, in-training quantization achieves 22.08x, 21.05x, 7.95x and 12.71x compression and combined quantization achieves 57.22x, 50.19x, 13.15x and 13.53x compression, respectively. Our methods quantize at the cost of accuracy, and we present our work in the light of the accuracy-compression trade-off. / Master of Science / Neural networks are being employed in many different real-world applications. By learning the complex relationship between the input data and ground-truth output data during the training process, neural networks can predict outputs on new input data obtained in real time. To do so, a typical deep neural network often needs millions of numerical parameters, stored in memory. In this research, we explore techniques for reducing the storage requirements for neural network parameters. We propose software methods that convert 32-bit neural network parameters to values that can be stored using fewer bits. Our methods also convert a majority of numerical parameters to zero. Using special storage methods that only require storage of non-zero parameters, we gain significant compression benefits. On typical benchmarks like LeNet-300-100 (MNIST dataset), LeNet-5 (MNIST dataset), AlexNet (CIFAR-10 dataset) and VGG-16 (CIFAR-10 dataset), our methods can achieve up to 57.22x, 50.19x, 13.15x and 13.53x compression respectively. Storage benefits are achieved at the cost of classification accuracy, and we present our work in the light of the accuracy-compression trade-off.
|
7 |
Learning representations for speech recognition using artificial neural networksSwietojanski, Paweł January 2016 (has links)
Learning representations is a central challenge in machine learning. For speech recognition, we are interested in learning robust representations that are stable across different acoustic environments, recording equipment and irrelevant inter– and intra– speaker variabilities. This thesis is concerned with representation learning for acoustic model adaptation to speakers and environments, construction of acoustic models in low-resource settings, and learning representations from multiple acoustic channels. The investigations are primarily focused on the hybrid approach to acoustic modelling based on hidden Markov models and artificial neural networks (ANN). The first contribution concerns acoustic model adaptation. This comprises two new adaptation transforms operating in ANN parameters space. Both operate at the level of activation functions and treat a trained ANN acoustic model as a canonical set of fixed-basis functions, from which one can later derive variants tailored to the specific distribution present in adaptation data. The first technique, termed Learning Hidden Unit Contributions (LHUC), depends on learning distribution-dependent linear combination coefficients for hidden units. This technique is then extended to altering groups of hidden units with parametric and differentiable pooling operators. We found the proposed adaptation techniques pose many desirable properties: they are relatively low-dimensional, do not overfit and can work in both a supervised and an unsupervised manner. For LHUC we also present extensions to speaker adaptive training and environment factorisation. On average, depending on the characteristics of the test set, 5-25% relative word error rate (WERR) reductions are obtained in an unsupervised two-pass adaptation setting. The second contribution concerns building acoustic models in low-resource data scenarios. In particular, we are concerned with insufficient amounts of transcribed acoustic material for estimating acoustic models in the target language – thus assuming resources like lexicons or texts to estimate language models are available. First we proposed an ANN with a structured output layer which models both context–dependent and context–independent speech units, with the context-independent predictions used at runtime to aid the prediction of context-dependent states. We also propose to perform multi-task adaptation with a structured output layer. We obtain consistent WERR reductions up to 6.4% in low-resource speaker-independent acoustic modelling. Adapting those models in a multi-task manner with LHUC decreases WERRs by an additional 13.6%, compared to 12.7% for non multi-task LHUC. We then demonstrate that one can build better acoustic models with unsupervised multi– and cross– lingual initialisation and find that pre-training is a largely language-independent. Up to 14.4% WERR reductions are observed, depending on the amount of the available transcribed acoustic data in the target language. The third contribution concerns building acoustic models from multi-channel acoustic data. For this purpose we investigate various ways of integrating and learning multi-channel representations. In particular, we investigate channel concatenation and the applicability of convolutional layers for this purpose. We propose a multi-channel convolutional layer with cross-channel pooling, which can be seen as a data-driven non-parametric auditory attention mechanism. We find that for unconstrained microphone arrays, our approach is able to match the performance of the comparable models trained on beamform-enhanced signals.
|
8 |
Deep neural networks in computer vision and biomedical image analysisXie, Weidi January 2017 (has links)
This thesis proposes different models for a variety of applications, such as semantic segmentation, in-the-wild face recognition, microscopy cell counting and detection, standardized re-orientation of 3D ultrasound fetal brain and Magnetic Resonance (MR) cardiac video segmentation. Our approach is to employ the large-scale machine learning models, in particular deep neural networks. Expert knowledge is either mathematically modelled as a differentiable hidden layer in the Artificial Neural Networks, or we tried to break the complex tasks into several small and easy-to-solve tasks. Multi-scale contextual information plays an important role in pixel-wise predic- tion, e.g. semantic segmentation. To capture the spatial contextual information, we present a new block for learning receptive field adaptively by within-layer recurrence. While interleaving with the convolutional layers, receptive fields are effectively enlarged, reaching across the entire feature map or image. The new block can be initialized as identity and inserted into any pre-trained networks, therefore taking benefit from the "pre-train and fine-tuning" paradigm. Current face recognition systems are mostly driven by the success of image classification, where the models are trained to by identity classification. We propose a multi-column deep comparator networks for face recognition. The architecture takes two sets (each contains an arbitrary number of faces) of images or frames as inputs, facial part-based (e.g. eyes, noses) representations of each set are pooled out, dynamically calibrated based on the quality of input images, and further compared with local "experts" in a pairwise way. Unlike the computer vision applications, collecting data and annotation is usually more expensive in biomedical image analysis. Therefore, the models that can be trained with fewer data and weaker annotations are of great importance. We approach the microscopy cell counting and detection based on density estimation, where only central dot annotations are needed. The proposed fully convolutional regression networks are first trained on a synthetic dataset of cell nuclei, later fine-tuned and shown to generalize to real data. In 3D fetal ultrasound neurosonography, establishing a coordinate system over the fetal brain serves as a precursor for subsequent tasks, e.g. localization of anatomical landmarks, extraction of standard clinical planes for biometric assessment of fetal growth, etc. To align brain volumes into a common reference coordinate system, we decompose the complex transformation into several simple ones, which can be easily tackled with Convolutional Neural Networks. The model is therefore designed to leverage the closely related tasks by sharing low-level features, and the task-specific predictions are then combined to reproduce the transformation matrix as the desired output. Finally, we address the problem of MR cardiac video analysis, in which we are interested in assisting clinical diagnosis based on the fine-grained segmentation. To facilitate segmentation, we present one end-to-end trainable model that achieves multi-view structure detection, alignment (standardized re-orientation), and fine- grained segmentation simultaneously. This is motivated by the fact that the CNNs in essence is not rotation equivariance or invariance, therefore, adding the pre-alignment into the end-to-end trainable pipeline can effectively decrease the complexity of segmentation for later stages of the model.
|
9 |
Deep Probabilistic Models for Camera Geo-CalibrationZhai, Menghua 01 January 2018 (has links)
The ultimate goal of image understanding is to transfer visual images into numerical or symbolic descriptions of the scene that are helpful for decision making. Knowing when, where, and in which direction a picture was taken, the task of geo-calibration makes it possible to use imagery to understand the world and how it changes in time. Current models for geo-calibration are mostly deterministic, which in many cases fails to model the inherent uncertainties when the image content is ambiguous. Furthermore, without a proper modeling of the uncertainty, subsequent processing can yield overly confident predictions. To address these limitations, we propose a probabilistic model for camera geo-calibration using deep neural networks. While our primary contribution is geo-calibration, we also show that learning to geo-calibrate a camera allows us to implicitly learn to understand the content of the scene.
|
10 |
Deep GCNs with Random Partition and Generalized AggregatorXiong, Chenxin 25 November 2020 (has links)
Graph Convolutional Networks (GCNs) draws significant attention due to its power of representation learning on graphs. Recent works developed frameworks to train deep GCNs. Such works show impressive results in tasks like point cloud classification and segmentation, and protein interaction prediction. While for large-scale graphs, doing full-batch training by GCNs is still challenging especially when GCNs go deeper. By fully analyzing a clustering-based mini-batch training algorithm ClusterGCN, we propose random partition which is a more efficient and effective method to implement mini-batch training. Besides, selecting different permutation invariance function (such as max, mean or add) for neighbors’ information aggregation will result in every different results. Therefore, we propose to alleviate it by introducing a novel Generalized Aggregation Function. In this thesis, I analyze the drawbacks caused by ClusterGCN and discuss about its limits. I further compare the performance of ClusterGCN with random partition and the final experimental results show that simple random partition outperforms ClusterGCN with very obvious advantageous for node property prediction task. For the techniques which are commonly used to make GCNs go deeper, I demonstrate a better way of applying residual connections (pre-activation) to stack more layers for GCNs. Last, I show the complete work of training deeper GCNs with generalized aggregators and display the promising results over several datasets from the Open Graph Benchmark (OGB).
|
Page generated in 0.0481 seconds