1.
Regularization, Uncertainty Estimation and Out of Distribution Detection in Convolutional Neural Networks. Krothapalli, Ujwal K., 11 September 2020.
Classification is an important task in the field of machine learning, and when classifiers are trained on images, a variety of problems can surface during inference. 1) The recent trend of using convolutional neural networks (CNNs) for various machine learning tasks has borne many successes; CNNs are surprisingly expressive in their learning ability due to their large number of parameters and many stacked layers. This model complexity, however, also increases the risk of overfitting to the training data. Enlarging the training data by synthetic or artificial means (data augmentation) helps CNNs learn better by reducing over-fitting and producing a regularization effect that improves generalization of the learned model. 2) CNNs have proven to be very good classifiers and generally localize objects well; however, the loss functions typically used to train classification CNNs do not penalize failure to localize an object, nor do they take into account an object's relative size in the given image when producing confidence measures. 3) CNNs always output in the space of the learnt classes with high confidence when predicting the class of a given image, regardless of what the image contains. For example, an ImageNet-1K-trained CNN cannot indicate that an image contains none of its training classes when it is shown an image of a dinosaur (not an ImageNet category) or an image with the main object cut out of it (context only). We approach these three problems using bounding box information and by learning to produce high-entropy predictions on out-of-distribution classes.
To address the first problem, we propose a novel regularization method called CopyPaste. The idea behind our approach is that images from the same class share similar context and can be 'mixed' together without affecting their labels. We use bounding box annotations that are available for a subset of ImageNet images. We consistently outperform the standard baseline and also explore combining our approach with other recent regularization methods. We show consistent performance gains on the PASCAL VOC07, MS-COCO and ImageNet datasets.
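The abstract describes CopyPaste only at a high level; a minimal sketch of one plausible form of same-class mixing, assuming numpy images and pixel-coordinate bounding boxes, might look as follows (the function name and placement rule are illustrative, not taken from the thesis):

```python
import numpy as np

def copy_paste_same_class(target_img, donor_img, donor_box):
    """Paste the donor's annotated object onto the target image.

    Both images come from the same class, so the target's label is kept.
    """
    x1, y1, x2, y2 = donor_box
    patch = donor_img[y1:y2, x1:x2]
    th, tw = target_img.shape[:2]
    # Clip the patch so it always fits inside the target image.
    patch = patch[:th, :tw]
    h, w = patch.shape[:2]
    # Choose a random location for the pasted object.
    top = np.random.randint(0, th - h + 1)
    left = np.random.randint(0, tw - w + 1)
    mixed = target_img.copy()
    mixed[top:top + h, left:left + w] = patch
    return mixed  # the label of target_img is unchanged
```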
For the second problem, we employ objectness measures to learn meaningful CNN predictions. Objectness is a measure of the likelihood that an object of any class is present in a given image. We present a novel approach to object localization that combines the ideas of objectness and label smoothing during training. Unlike previous methods, we compute a smoothing factor that adapts to the relative size of the object within the image.
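A minimal sketch of the adaptive-smoothing idea, assuming the smoothing factor shrinks as the object covers more of the image; the exact mapping used in the thesis may differ, and all names here are illustrative:

```python
import numpy as np

def adaptive_smooth_target(class_idx, num_classes, box, img_h, img_w,
                           max_eps=0.5):
    """Build a soft target whose smoothing grows as the object shrinks."""
    x1, y1, x2, y2 = box
    objectness = ((x2 - x1) * (y2 - y1)) / float(img_h * img_w)
    eps = max_eps * (1.0 - objectness)   # small object -> more smoothing
    target = np.full(num_classes, eps / num_classes)
    target[class_idx] += 1.0 - eps       # probabilities still sum to 1
    return target
```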
We present extensive results using ImageNet and OpenImages to demonstrate that CNNs trained using adaptive label smoothing are much less likely to be overconfident in their predictions than CNNs trained using hard targets. We train CNNs using objectness computed from the bounding box annotations available for the ImageNet and OpenImages datasets. We perform extensive experiments aimed at improving a classification CNN's ability to learn better localizable features, and we show improvements in object detection, calibration, and classification performance on standard datasets. We also show qualitative results using class activation maps to illustrate the improvements.
Lastly, we extend the second approach to train CNNs on out-of-distribution and context-only images using a uniform probability distribution over the set of target classes for such images. This is a novel use of uniform smooth labels, as it allows the model to learn better confidence bounds. We sample 1000 classes (mutually exclusive with the 1000 classes in ImageNet-1K) from the larger ImageNet dataset comprising about 22K classes. We compare our approach with standard baselines and provide entropy and confidence plots for in-distribution and out-of-distribution validation sets. / Doctor of Philosophy / Categorization is an important task in everyday life. Humans can classify objects in pictures effortlessly. Machines can also be trained to classify objects in images, and with the tremendous growth in the area of artificial intelligence, machines have surpassed human performance on some tasks. However, plenty of challenges remain for artificial neural networks. Convolutional Neural Networks (CNNs) are one type of artificial neural network. 1) Sometimes, CNNs simply memorize the samples provided during training and fail to work well with images that are slightly different from the training samples. 2) CNNs have proven to be very good classifiers and generally localize objects well; however, the objective functions typically used to train classification CNNs do not penalize failure to localize an object, nor do they take into account an object's relative size in the given image. 3) CNNs always produce an output in the space of the learnt classes with high confidence when predicting the class of a given image, regardless of what the image contains. For example, a CNN trained on ImageNet-1K (a popular dataset) cannot indicate that an image contains none of its training classes when it is shown an image of a dinosaur (not an ImageNet category) or an image with the main object cut out of it (background only).
We approach these three problems using object position information and by learning to produce low-confidence predictions on out-of-distribution classes.
To address the first problem, we propose a novel regularization method called CopyPaste. The idea behind our approach is that images from the same class share similar context and can be 'mixed' together without affecting their labels. We use bounding box annotations that are available for a subset of ImageNet images. We consistently outperform the standard baseline and also explore combining our approach with other recent regularization methods. We show consistent performance gains on the PASCAL VOC07, MS-COCO and ImageNet datasets.
For the second problem, we employ objectness measures to learn meaningful CNN predictions. Objectness is a measure of the likelihood that an object of any class is present in a given image. We present a novel approach to object localization that combines the ideas of objectness and label smoothing during training. Unlike previous methods, we compute a smoothing factor that adapts to the relative size of the object within the image.
We present extensive results using ImageNet and OpenImages to demonstrate that CNNs trained using adaptive label smoothing are much less likely to be overconfident in their predictions than CNNs trained using hard targets. We train CNNs using objectness computed from the bounding box annotations available for the ImageNet and OpenImages datasets. We perform extensive experiments aimed at improving a classification CNN's ability to learn better localizable features, and we show improvements in object detection, calibration, and classification performance on standard datasets. We also show qualitative results to illustrate the improvements.
Lastly, we extend the second approach to train CNNs on out-of-distribution and context-only images using a uniform probability distribution over the set of target classes for such images. This is a novel use of uniform smooth labels, as it allows the model to learn better confidence bounds. We sample 1000 classes (mutually exclusive with the 1000 classes in ImageNet-1K) from the larger ImageNet dataset comprising about 22K classes. We compare our approach with standard baselines on 'in distribution' and 'out of distribution' validation sets.
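A minimal sketch of the uniform-label idea for out-of-distribution images, assuming PyTorch; minimizing cross-entropy against a uniform target drives predictions toward maximum entropy, matching the low-confidence behaviour described above:

```python
import torch.nn.functional as F

def uniform_ood_loss(logits):
    """Cross-entropy against a uniform distribution over all classes.

    For a uniform target, CE = -(1/K) * sum_c log p_c, which is exactly
    the mean of the log-probabilities, negated. In-distribution images
    would keep their usual one-hot (or smoothed) targets.
    """
    return -F.log_softmax(logits, dim=-1).mean()
```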
2.
Deep Convolutional Neural Networks for Segmenting Unruptured Intracranial Aneurysms from 3D TOF-MRA Images. Boonaneksap, Surasith, 07 February 2022.
Despite facing technical issues (e.g., overfitting, vanishing and exploding gradients), deep neural networks have the potential to capture complex patterns in data. Understanding how depth affects neural network performance is vital to the advancement of novel deep learning architectures. By varying hyperparameters on two sets of architectures with different depths, this thesis examines whether there are potential benefits to developing deep networks for segmenting intracranial aneurysms from 3D TOF-MRA scans in the ADAM dataset. / Master of Science / With the technologies we have today, people are constantly generating data, and gaining insight into this pool of information proves extremely valuable. Deep learning is one method that allows for automatic pattern recognition by iteratively reducing the disparity between a model's predictions and the ground truth. Complex models can learn complex patterns, but such models introduce challenges. This thesis explores whether deep neural networks stand to gain improvements despite those challenges. The models are trained to segment intracranial aneurysms from volumetric images.
3.
Multimodal Affective Computing Using Temporal Convolutional Neural Network and Deep Convolutional Neural Networks. Ayoub, Issa, 24 June 2019.
Affective computing has gained significant attention from researchers in the last decade due to the wide variety of applications that can benefit from this technology. Often, researchers describe affect using emotional dimensions such as arousal and valence. Valence refers to the spectrum from negative to positive emotion, while arousal determines the level of excitement. Describing emotions through continuous dimensions (e.g. valence and arousal) allows us to encode subtle and complex affects, as opposed to discrete emotions such as the six basic emotions: happiness, anger, fear, disgust, sadness and neutral.
Recognizing spontaneous and subtle emotions remains a challenging problem for computers. In our work, we employ two modalities of information, video and audio, and extract visual and audio features using deep neural network models. Given that emotions are time-dependent, we apply a Temporal Convolutional Neural Network (TCN) to model the variations in emotion. Additionally, we investigate an alternative model that combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). Because the latter deep model does not fit into main memory, we divide the RNN into smaller segments and propose a scheme to back-propagate gradients across all segments. We configure the hyperparameters of all models using Gaussian processes to obtain a fair comparison between the proposed models. Our experimental results show that the TCN outperforms all RNN-based models, yielding a concordance correlation coefficient of 0.7895 (vs. 0.7544) on valence and 0.8207 (vs. 0.7357) on arousal on the validation set of the SEWA dataset. We therefore propose the adoption of TCNs for emotion detection problems as a baseline method for future work.
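For reference, the concordance correlation coefficient (CCC) reported above can be computed as follows; this is the standard definition, sketched in numpy with illustrative variable names:

```python
import numpy as np

def ccc(pred, gold):
    """Concordance correlation coefficient between two 1-D series.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    penalizing both low correlation and systematic bias.
    """
    pred_mean, gold_mean = pred.mean(), gold.mean()
    covariance = ((pred - pred_mean) * (gold - gold_mean)).mean()
    return (2 * covariance) / (
        pred.var() + gold.var() + (pred_mean - gold_mean) ** 2)
```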
4.
Situated face detection. Espinosa-Romero, Arturo, January 2001.
In the last twenty years, important advances have been made in the field of automatic face processing, given the importance of human faces for personal identification, emotional expression, and verbal and non-verbal communication. The very first step in a face processing algorithm is the detection of faces; while this is a trivial problem in controlled environments, the detection of faces in real environments is still a challenging task. Until now, the most successful approaches for face detection represent the face as a grey-level pattern and treat the problem itself as classification between "face" and "non-face" patterns. Satisfactory results have been achieved in this area. The main disadvantage is that an exhaustive search has to be done on each image in order to locate the faces. This search normally involves testing every single position in the image at different scales, and although this does not represent an important drawback in off-line face processing systems, it is still a problem in those cases where a real-time response is needed. In the different proposed methods for face detection, the "observer" is a disembodied entity which holds no relationship with the observed scene. This thesis presents a framework for efficient location of faces in real scenes in which, by considering the observer to be situated in the world and the relationships that hold between the two, a set of constraints on the search space can be defined. The constraints rely on two main assumptions: first, the observer can purposively interact with the world (i.e. change its position relative to the observed scene), and second, the camera is fully calibrated. The first source of constraint is the structural information about the observer's environment, represented as a depth map of the scene in front of the camera. From this representation the search space can be constrained in terms of the range of scales where a face might be found at different positions in the image. The second source of constraint is the geometrical relationship between the camera and the scene, which allows us to project a model of the subject into the scene in order to eliminate those areas where faces are unlikely to be found. In order to test the proposed framework, a system based on the premises stated above was constructed. It is based on three different modules: a face/non-face classifier, a depth estimation module and a search module. The classifier is composed of a set of convolutional neural networks (CNNs) that were trained to differentiate between face and non-face patterns; the depth estimation module uses a multilevel algorithm to compute the scene depth map from a sequence of captured images; and the search module projects the depth information and the subject model into the image where the search will be performed in order to constrain the search space. Finally, the proposed system was validated by running a set of experiments on the individual modules and then on the whole system.
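A minimal sketch of how a calibrated camera and scene depth constrain the face-search scale, using the standard pinhole projection model; the parameter values and names are illustrative assumptions, not taken from the thesis:

```python
def face_scale_range(depth_m, focal_px, face_width_m=0.16, tolerance=0.25):
    """Expected face width in pixels at a given scene depth.

    Under the pinhole model, size_px = focal_px * size_m / depth_m.
    Returns a (min, max) pixel range, so the detector only tests window
    sizes plausible for that region of the image.
    """
    expected_px = focal_px * face_width_m / depth_m
    return (expected_px * (1 - tolerance), expected_px * (1 + tolerance))

# e.g. a face 2 m away, seen by a camera with an 800-pixel focal length,
# spans about 64 px: face_scale_range(2.0, 800.0) -> (48.0, 80.0)
```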
5.
Multi-scale convolutional neural networks for segmentation of pulmonary structures in computed tomography. Gerard, Sarah E., 01 December 2018.
Computed tomography (CT) is routinely used for diagnosing lung disease and developing treatment plans, using images of intricate lung structure with submillimeter resolution. Automated segmentation of anatomical structures in such images is important to enable efficient processing in clinical and research settings. Convolutional neural networks (ConvNets) are largely successful at performing image segmentation, with the ability to learn discriminative abstract features that yield generalizable predictions. However, constraints in hardware memory do not allow deep networks to be trained with high-resolution volumetric CT images. Restricted by these memory constraints, current applications of ConvNets to volumetric medical images use a subset of the full image, limiting the capacity of the network to learn informative global patterns. Local patterns, such as edges, are necessary for precise boundary localization but suffer from low specificity; global information can disambiguate structures that are locally similar.
The central thesis of this doctoral work is that both local and global information is important for segmentation of anatomical structures in medical images. A novel multi-scale ConvNet is proposed that divides the learning task across multiple networks; each network learns features over different ranges of scales. It is hypothesized that multi-scale ConvNets will lead to improved segmentation performance, as no compromise needs to be made between image resolution, image extent, and network depth. Three multi-scale models were designed to specifically target segmentation of three pulmonary structures: lungs, fissures, and lobes.
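A minimal sketch of the local-plus-global idea, assuming PyTorch: one branch sees a full-resolution local patch while the other sees a downsampled global view, and their features are fused before the segmentation head. Layer sizes and names are illustrative; the thesis architectures are more elaborate:

```python
import torch
import torch.nn as nn

class TwoScaleSegNet(nn.Module):
    def __init__(self, in_ch=1, feat=16, num_classes=2):
        super().__init__()
        def conv_block():
            return nn.Sequential(
                nn.Conv3d(in_ch, feat, 3, padding=1), nn.ReLU(),
                nn.Conv3d(feat, feat, 3, padding=1), nn.ReLU())
        self.local_branch = conv_block()    # full-res patch: fine detail
        self.global_branch = conv_block()   # downsampled volume: context
        self.head = nn.Conv3d(2 * feat, num_classes, 1)

    def forward(self, local_patch, global_view):
        f_local = self.local_branch(local_patch)
        f_global = self.global_branch(global_view)
        # Resample global features onto the local patch's grid, then fuse.
        f_global = nn.functional.interpolate(
            f_global, size=f_local.shape[2:], mode='trilinear',
            align_corners=False)
        return self.head(torch.cat([f_local, f_global], dim=1))
```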
The proposed models were evaluated on diverse datasets and compared to architectures that do not use both local and global features. The lung model was evaluated on humans and three animal species; the results demonstrated that the multi-scale model outperformed single-scale models at different resolutions. The fissure model showed superior performance compared to both a traditional Hessian filter and a standard U-Net architecture that is limited in global extent.
The results demonstrated that multi-scale ConvNets improved pulmonary CT segmentation by incorporating both local and global features using multiple ConvNets within a constrained-memory system. Overall, the proposed pipeline achieved high accuracy and was robust to variations resulting from different imaging protocols, reconstruction kernels, scanners, lung volumes, and pathological alterations; demonstrating its potential for enabling high-throughput image analysis in clinical and research settings.
6.
A study of semantics across different representations of language. Dharmaretnam, Dhanush, 28 May 2018.
Semantics is the study of meaning, and here we explore it through three major representations: brain, image and text. Researchers in the past have performed various studies to understand the similarities between semantic features across all three representations. Distributional Semantic (DS) models, or word vectors, trained on text corpora have been widely used to study the convergence of semantic information in the human brain. Moreover, they have been incorporated into various NLP applications such as document categorization, speech-to-text and machine translation. Due to their widespread adoption by researchers and industry alike, it becomes imperative to test and evaluate the performance of different word vector models. In this thesis, we publish the second iteration of BrainBench: a system designed to evaluate and benchmark word vectors using brain data, incorporating two new Italian brain datasets collected using fMRI and EEG technology.
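A minimal sketch of the kind of representational-similarity test such a benchmark relies on, comparing the pairwise-similarity structure of word vectors with that of brain responses for the same words; this is an illustrative reconstruction in numpy/scipy, not BrainBench's exact protocol:

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist

def rsa_score(word_vectors, brain_responses):
    """Spearman correlation between two pairwise-similarity structures.

    word_vectors:    (n_words, vec_dim)  rows aligned with
    brain_responses: (n_words, n_voxels) the same word order.
    """
    vec_dists = pdist(word_vectors, metric='cosine')
    brain_dists = pdist(brain_responses, metric='cosine')
    rho, _ = spearmanr(vec_dists, brain_dists)
    return rho  # higher rho: word vectors better mirror brain semantics
```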
In the second half of the thesis, we explore semantics in Convolutional Neural Networks (CNNs). The CNN is a computational model that is the state-of-the-art technology for object recognition from images. However, these networks are currently considered a black box, and there is an apparent lack of understanding of why various CNN architectures perform better than others. In this thesis, we also propose a novel method to understand CNNs by studying the semantic representations across their hierarchical layers. The convergence of semantic information in these networks is studied with the help of DS models, following methodologies similar to those used to study semantics in the human brain. Our results provide substantial evidence that CNNs do learn semantics from images, and that the features learned by CNNs correlate with the semantics of the object in the image. Our methodology and results could potentially pave the way for improved design and debugging of CNNs. / Graduate
7.
COMPARISON OF PRE-TRAINED CONVOLUTIONAL NEURAL NETWORK PERFORMANCE ON GLIOMA CLASSIFICATION. Unknown Date.
Gliomas are an aggressive class of brain tumors that are associated with a better prognosis when detected at a lower grade, so effective differentiation and classification are imperative for early treatment. MRI is a popular medical imaging modality for detecting and diagnosing brain tumors due to its capability to non-invasively highlight the tumor region. With the rise of deep learning, researchers have used convolutional neural networks for classification in this domain, specifically pre-trained networks to reduce computational costs. However, varying MRI modalities, MRI machines, and poor image scan quality cause different network structures to achieve different performance. Each pre-trained network is designed with a different structure that allows robust results under specific problem conditions. This thesis aims to fill a gap in the literature by comparing the performance of popular pre-trained networks on a controlled dataset that differs from the domain the networks were trained on. / Includes bibliography. / Thesis (M.S.)--Florida Atlantic University, 2020. / FAU Electronic Theses and Dissertations Collection
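A minimal sketch of the pre-trained-network setup being compared, assuming PyTorch/torchvision with ResNet-50 as one example backbone; the thesis compares several such architectures, and the details here are illustrative:

```python
import torch.nn as nn
from torchvision import models

def build_glioma_classifier(num_classes=2, freeze_backbone=True):
    """Adapt an ImageNet-pre-trained backbone to a new classification task."""
    model = models.resnet50(pretrained=True)  # ImageNet weights
    if freeze_backbone:
        for p in model.parameters():
            p.requires_grad = False           # reuse features, cut cost
    # Replace the ImageNet head with a task-specific classifier.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```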
8.
Image Steganography Using Deep Learning Techniques. Anthony Rene Guzman (12468519), 27 April 2022.
Digital image steganography is the process of embedding information within a cover image in a secure, imperceptible, and recoverable way. The three main methods of digital image steganography are spatial, transform, and neural network methods. Spatial methods modify the pixel values of an image to embed information, while transform methods embed hidden information within the frequency domain of the image. Neural network-based methods use neural networks to perform the hiding process, which is the focus of the proposed methodology.
This research explores the use of deep convolutional neural networks (CNNs) in digital image steganography. This work extends an existing implementation that used a two-dimensional CNN to perform the preparation, hiding, and extraction phases of the steganography process. The methodology proposed in this research, however, introduces changes to the structure of the CNN and uses a gain function based on several image similarity metrics to maximize the imperceptibility between a cover image and its steganographic counterpart.
The performance of the proposed method was measured using frequently utilized image metrics such as the structural similarity index measure (SSIM), mean squared error (MSE), and peak signal-to-noise ratio (PSNR). The results showed that the steganographic images produced by the proposed methodology are imperceptible to the human eye while still providing good recoverability. Comparing the results of the proposed methodology to those of the original methodology revealed that the proposed network greatly improves on the base methodology in terms of SSIM and compares well to existing steganography methods.
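For reference, the three metrics named above can be sketched in numpy as follows; the SSIM shown is the simplified global form, whereas practical SSIM is usually computed over local windows:

```python
import numpy as np

def mse(a, b):
    return np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)

def psnr(a, b, max_val=255.0):
    m = mse(a, b)
    return float('inf') if m == 0 else 10 * np.log10(max_val ** 2 / m)

def global_ssim(a, b, max_val=255.0):
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2  # stabilizers
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
```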
9.
Detection of pulmonary tuberculosis using deep learning convolutional neural networks. Norval, Michael John, 11 1900.
The earlier Pulmonary Tuberculosis (PTB) is detected in a patient, the greater the chances of treating and curing the disease, and early detection of PTB could result in an overall lower mortality rate. Detection of PTB is achieved in many ways, for instance by using tests like the sputum culture test; the problem is that conducting such tests can be a lengthy process that takes up precious time. The best and quickest PTB detection method is viewing the chest X-ray image (CXR) of the patient, but making an accurate diagnosis requires a qualified professional radiologist. Neural networks have been around for several years but are only now making ground-breaking advancements in speech and image processing because of the increased processing power at our disposal. Artificial intelligence, especially Deep Learning Convolutional Neural Networks (DLCNN), has the potential to diagnose and detect the disease immediately. If DLCNN can be used in conjunction with professional medical institutions, crucial time and effort can be saved. This project aims to determine and investigate proper methods to identify and detect pulmonary tuberculosis in patient chest X-ray images using DLCNN, with detection accuracy and success forming a crucial part of the research. Simulations on an input dataset of infected and healthy patients are carried out. The research consists of, firstly, evaluating the colour depth and image resolution of the input images; the best resolution is found to be 64x64, and a colour depth of 8 bits is found to be optimal for CXR images. Secondly, building on the optimal resolution and colour depth, various image pre-processing techniques are evaluated, and the pre-processed images with the best outcome are used in further simulations. Thirdly, transfer learning, hyperparameter adjustment and data augmentation are evaluated; of these, the best results are obtained from data augmentation. Fourthly, a hybrid approach is proposed: a mixture of CAD and DLCNN using only the lung ROI images as training data. Finally, the proposed hybrid method, coupled with augmented data and specific hyperparameter adjustment, is evaluated. Overall, the best result is obtained from the proposed hybrid method combined with synthetic augmented data and specific hyperparameter adjustment. / Electrical and Mining Engineering
10.
Mass Classification of Digital Mammograms Using Convolutional Neural Networks. Franklin, Elijah, 04 May 2018.
This thesis explores current deep learning (DL) approaches to computer-aided diagnosis (CAD) of digital mammographic images and presents two novel designs, using convolutional neural networks (CNNs), for overcoming obstacles endemic to the field. The first method employs Bayesian statistics to perform decision-level fusion from multiple images of an individual. The second method utilizes a new data pre-processing scheme to artificially expand the limited available training data and reduce model overfitting.
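A minimal sketch of Bayesian decision-level fusion across multiple views of the same patient, assuming conditionally independent per-image posteriors and a uniform class prior; the thesis's exact fusion rule may differ:

```python
import numpy as np

def fuse_posteriors(per_image_probs):
    """per_image_probs: (n_images, n_classes) softmax outputs.

    With a uniform prior and conditional independence, the fused
    posterior is proportional to the product of per-image posteriors;
    working in log space avoids numerical underflow.
    """
    log_post = np.log(np.clip(per_image_probs, 1e-12, 1.0)).sum(axis=0)
    log_post -= log_post.max()          # shift for numerical stability
    fused = np.exp(log_post)
    return fused / fused.sum()          # renormalize to probabilities
```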