1

Regularization, Uncertainty Estimation and Out of Distribution Detection in Convolutional Neural Networks

Krothapalli, Ujwal K. 11 September 2020 (has links)
Classification is an important task in machine learning, and when classifiers are trained on images, a variety of problems can surface during inference. 1) The recent trend of using convolutional neural networks (CNNs) for various machine learning tasks has borne many successes, and CNNs are surprisingly expressive in their learning ability due to their large number of parameters and many stacked layers. This increased model complexity also increases the risk of overfitting to the training data. Enlarging the training data by synthetic or artificial means (data augmentation) helps CNNs learn better by reducing over-fitting and producing a regularization effect that improves generalization of the learned model. 2) CNNs have proven to be very good classifiers and generally localize objects well; however, the loss functions typically used to train classification CNNs do not penalize inability to localize an object, nor do they take into account an object's relative size in the given image when producing confidence measures. 3) CNNs always output high-confidence predictions in the space of the learnt classes, regardless of what the given image contains. For example, an ImageNet-1K-trained CNN cannot report that an image contains none of the objects it was trained on when shown a dinosaur (not an ImageNet category) or an image with the main object cut out (context only). We approach these three problems using bounding box information and by learning to produce high-entropy predictions on out-of-distribution classes. To address the first problem, we propose a novel regularization method called CopyPaste. The idea behind our approach is that images from the same class share similar context and can be 'mixed' together without affecting the labels. We use bounding box annotations that are available for a subset of ImageNet images. We consistently outperform the standard baseline and also explore combining our approach with other recent regularization methods. We show consistent performance gains on the PASCAL VOC07, MS-COCO and ImageNet datasets. For the second problem, we employ objectness measures to learn meaningful CNN predictions. Objectness is a measure of the likelihood that an object from any class is present in a given image. We present a novel approach to object localization that combines the ideas of objectness and label smoothing during training. Unlike previous methods, we compute a smoothing factor that adapts to the relative object size within an image. We present extensive results on ImageNet and OpenImages to demonstrate that CNNs trained using adaptive label smoothing are much less likely to be overconfident in their predictions than CNNs trained using hard targets. We train CNNs using objectness computed from the bounding box annotations available for the ImageNet and OpenImages datasets. We perform extensive experiments aimed at improving a classification CNN's ability to learn localizable features, and we show improvements in object detection performance, calibration and classification on standard datasets. We also show qualitative results using class activation maps to illustrate the improvements.
Lastly, we extend the second approach to train CNNs on out-of-distribution and context-only images using a uniform probability distribution over the set of target classes for such images. This is a novel use of uniform smooth labels, as it allows the model to learn better confidence bounds. We sample 1000 classes (mutually exclusive with the 1000 classes in ImageNet-1K) from the larger ImageNet dataset, which comprises about 22K classes. We compare our approach with standard baselines and provide entropy and confidence plots for in-distribution and out-of-distribution validation sets. / Doctor of Philosophy / Categorization is an important task in everyday life. Humans can effortlessly classify objects in pictures, and machines can also be trained to classify objects in images. With the tremendous growth of artificial intelligence, machines have surpassed human performance on some tasks. However, plenty of challenges remain for artificial neural networks, of which convolutional neural networks (CNNs) are one type. 1) Sometimes, CNNs simply memorize the samples provided during training and fail to work well on images that differ slightly from the training samples. 2) CNNs have proven to be very good classifiers and generally localize objects well; however, the objective functions typically used to train classification CNNs do not penalize inability to localize an object, nor do they take into account an object's relative size in the given image. 3) CNNs always produce high-confidence output in the space of the learnt classes, regardless of what the given image contains. For example, a CNN trained on ImageNet-1K (a popular dataset) cannot report that an image contains none of the objects it was trained on when shown a dinosaur (not an ImageNet category) or an image with the main object cut out (background only). We approach these three problems using object position information and by learning to produce low-confidence predictions on out-of-distribution classes. To address the first problem, we propose a novel regularization method called CopyPaste. The idea behind our approach is that images from the same class share similar context and can be 'mixed' together without affecting the labels. We use bounding box annotations that are available for a subset of ImageNet images. We consistently outperform the standard baseline and also explore combining our approach with other recent regularization methods. We show consistent performance gains on the PASCAL VOC07, MS-COCO and ImageNet datasets. For the second problem, we employ objectness measures to learn meaningful CNN predictions. Objectness is a measure of the likelihood that an object from any class is present in a given image. We present a novel approach to object localization that combines the ideas of objectness and label smoothing during training. Unlike previous methods, we compute a smoothing factor that adapts to the relative object size within an image. We present extensive results on ImageNet and OpenImages to demonstrate that CNNs trained using adaptive label smoothing are much less likely to be overconfident in their predictions than CNNs trained using hard targets. We train CNNs using objectness computed from the bounding box annotations available for the ImageNet and OpenImages datasets.
We perform extensive experiments aimed at improving a classification CNN's ability to learn localizable features, and we show improvements in object detection performance, calibration and classification on standard datasets. We also show qualitative results to illustrate the improvements. Lastly, we extend the second approach to train CNNs on out-of-distribution and context-only images using a uniform probability distribution over the set of target classes. This is a novel use of uniform smooth labels, as it allows the model to learn better confidence bounds. We sample 1000 classes (mutually exclusive with the 1000 classes in ImageNet-1K) from the larger ImageNet dataset, which comprises about 22K classes. We compare our approach with standard baselines on 'in distribution' and 'out of distribution' validation sets.
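The adaptive label smoothing described above lends itself to a compact illustration. The sketch below builds soft training targets whose smoothing mass grows as the annotated object shrinks relative to the image, plus the uniform target used for out-of-distribution images. The function names and the use of a raw box-area fraction as the objectness measure are assumptions for illustration, not the dissertation's exact formulation.

```python
import numpy as np

def adaptive_smooth_target(true_class, num_classes, box_area, image_area):
    """Soft target whose smoothing grows as the object shrinks.

    Objectness is taken here as the fraction of the image covered by the
    object's bounding box; the dissertation's measure may differ.
    """
    objectness = box_area / image_area        # in (0, 1]
    eps = 1.0 - objectness                    # smaller object -> heavier smoothing
    target = np.full(num_classes, eps / num_classes)
    target[true_class] += 1.0 - eps           # remaining mass on the true class
    return target

def uniform_ood_target(num_classes):
    """Out-of-distribution images get a maximum-entropy (uniform) target."""
    return np.full(num_classes, 1.0 / num_classes)

# Example: an object covering 25% of a 128x128 image
t = adaptive_smooth_target(true_class=3, num_classes=10,
                           box_area=64 * 64, image_area=128 * 128)
print(t.sum(), t[3])  # sums to 1.0; the true class keeps the largest share
```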
2

Deep Convolutional Neural Networks for Segmenting Unruptured Intracranial Aneurysms from 3D TOF-MRA Images

Boonaneksap, Surasith 07 February 2022 (has links)
Despite facing technical issues (e.g., overfitting, vanishing and exploding gradients), deep neural networks have the potential to capture complex patterns in data. Understanding how depth impacts network performance is vital to the advancement of novel deep learning architectures. By varying hyperparameters on two sets of architectures with different depths, this thesis aims to examine whether there are potential benefits from developing deep networks for segmenting intracranial aneurysms from 3D TOF-MRA scans in the ADAM dataset. / Master of Science / With the technologies we have today, people are constantly generating data, and gaining insight into this pool of information proves to be extremely valuable. Deep learning is one method that allows for automatic pattern recognition by iteratively reducing the disparity between a model's predictions and the ground truth. Complex models can learn complex patterns, but such models also introduce challenges. This thesis explores whether deep neural networks stand to gain improvement despite these challenges. The models are trained to segment intracranial aneurysms from volumetric images.
3

Multimodal Affective Computing Using Temporal Convolutional Neural Network and Deep Convolutional Neural Networks

Ayoub, Issa 24 June 2019 (has links)
Affective computing has gained significant attention from researchers in the last decade due to the wide variety of applications that can benefit from this technology. Often, researchers describe affect using emotional dimensions such as arousal and valence. Valence refers to the spectrum of negative to positive emotions, while arousal determines the level of excitement. Describing emotions through continuous dimensions (e.g. valence and arousal) allows us to encode subtle and complex affects, as opposed to discrete emotions such as the six basic emotions: happiness, anger, fear, disgust, sadness and neutral. Recognizing spontaneous and subtle emotions remains a challenging problem for computers. In our work, we employ two modalities of information: video and audio. We extract visual and audio features using deep neural network models. Given that emotions are time-dependent, we apply a Temporal Convolutional Neural Network (TCN) to model the variations in emotions. Additionally, we investigate an alternative model that combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). Because the latter deep model does not fit into main memory, we divide the RNN into smaller segments and propose a scheme to back-propagate gradients across all segments. We configure the hyperparameters of all models using Gaussian processes to obtain a fair comparison between the proposed models. Our results show that the TCN outperforms the RNN for recognition of the arousal and valence emotional dimensions; we therefore propose the adoption of TCNs as a baseline method for future work on emotion detection. Our experimental results show that the TCN outperforms all RNN-based models, yielding a concordance correlation coefficient of 0.7895 (vs. 0.7544) on valence and 0.8207 (vs. 0.7357) on arousal on the validation set of the SEWA dataset.
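The concordance correlation coefficient (CCC) quoted above rewards predictions that are both correlated with and unbiased against the gold ratings. A minimal sketch of the standard formula, run on synthetic data rather than SEWA:

```python
import numpy as np

def concordance_cc(pred, gold):
    """CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)."""
    mu_p, mu_g = pred.mean(), gold.mean()
    cov = ((pred - mu_p) * (gold - mu_g)).mean()
    return 2 * cov / (pred.var() + gold.var() + (mu_p - mu_g) ** 2)

rng = np.random.default_rng(0)
gold = rng.normal(size=1000)                           # e.g. valence ratings
pred = 0.8 * gold + rng.normal(scale=0.3, size=1000)   # imperfect predictions
print(round(concordance_cc(pred, gold), 4))
```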
4

Situated face detection

Espinosa-Romero, Arturo January 2001 (has links)
In the last twenty years, important advances have been made in the field of automatic face processing, given the importance of human faces for personal identification, emotional expression and verbal and non-verbal communication. The very first step in a face processing algorithm is the detection of faces; while this is a trivial problem in controlled environments, the detection of faces in real environments is still a challenging task. Until now, the most successful approaches for face detection represent the face as a grey-level pattern, and the problem itself is treated as classification between "face" and "non-face" patterns. Satisfactory results have been achieved in this area. The main disadvantage is that an exhaustive search has to be done on each image in order to locate the faces. This search normally involves testing every single position in the image at different scales, and although this does not represent an important drawback in off-line face processing systems, it is still a problem in cases where a real-time response is needed. In the various proposed methods for face detection, the "observer" is a disembodied entity which holds no relationship with the observed scene. This thesis presents a framework for the efficient location of faces in real scenes in which, by considering the observer to be situated in the world along with the relationships that hold between the two, a set of constraints on the search space can be defined. The constraints rely on two main assumptions: first, the observer can purposively interact with the world (i.e. change its position relative to the observed scene), and second, the camera is fully calibrated. The first source of constraint is the structural information about the observer's environment, represented as a depth map of the scene in front of the camera. From this representation the search space can be constrained in terms of the range of scales at which a face might be found at different positions in the image. The second source of constraint is the geometrical relationship between the camera and the scene, which allows us to project a model of the subject into the scene in order to eliminate those areas where faces are unlikely to be found. In order to test the proposed framework, a system based on the premises stated above was constructed. It is based on three different modules: a face/non-face classifier, a depth estimation module and a search module. The classifier is composed of a set of convolutional neural networks (CNNs) trained to differentiate between face and non-face patterns; the depth estimation module uses a multilevel algorithm to compute the scene depth map from a sequence of captured images; and the search module projects the depth information and the subject model into the image where the search will be performed, in order to constrain the search space. Finally, the proposed system was validated by running a set of experiments on the individual modules and then on the whole system.
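The depth-derived scale constraint above has a simple geometric core: under a pinhole camera model, a face's projected pixel height falls off inversely with its distance from the camera. A minimal sketch, where the focal length and the human-face height prior are assumed illustrative values rather than the thesis's calibration:

```python
def face_scale_range(depth_m, focal_px, face_height_m=(0.18, 0.30)):
    """Pinhole-camera bounds on the pixel height of a face at a given depth.

    A sliding-window detector restricted to this range skips scales at
    which no plausible face could appear at that image position.
    """
    lo = focal_px * face_height_m[0] / depth_m
    hi = focal_px * face_height_m[1] / depth_m
    return lo, hi

# A face roughly 2 m from a camera with an 800-pixel focal length
print(face_scale_range(depth_m=2.0, focal_px=800))  # (72.0, 120.0)
```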
5

Multi-scale convolutional neural networks for segmentation of pulmonary structures in computed tomography

Gerard, Sarah E. 01 December 2018 (has links)
Computed tomography (CT) is routinely used for diagnosing lung disease and developing treatment plans, using images of intricate lung structure with submillimeter resolution. Automated segmentation of anatomical structures in such images is important to enable efficient processing in clinical and research settings. Convolutional neural networks (ConvNets) are largely successful at performing image segmentation, with the ability to learn discriminative abstract features that yield generalizable predictions. However, constraints on hardware memory do not allow deep networks to be trained with high-resolution volumetric CT images. Restricted by memory constraints, current applications of ConvNets to volumetric medical images use a subset of the full image, limiting the capacity of the network to learn informative global patterns. Local patterns, such as edges, are necessary for precise boundary localization; however, they suffer from low specificity. Global information can disambiguate structures that are locally similar. The central thesis of this doctoral work is that both local and global information is important for segmentation of anatomical structures in medical images. A novel multi-scale ConvNet is proposed that divides the learning task across multiple networks, each of which learns features over a different range of scales. It is hypothesized that multi-scale ConvNets will lead to improved segmentation performance, as no compromise needs to be made between image resolution, image extent, and network depth. Three multi-scale models were designed to target segmentation of three pulmonary structures: lungs, fissures, and lobes. The proposed models were evaluated on diverse datasets and compared to architectures that do not use both local and global features. The lung model was evaluated on humans and three animal species; the results demonstrated that the multi-scale model outperformed single-scale models at different resolutions. The fissure model showed superior performance compared to both a traditional Hessian filter and a standard U-Net architecture that is limited in global extent. The results demonstrated that multi-scale ConvNets improved pulmonary CT segmentation by incorporating both local and global features using multiple ConvNets within a memory-constrained system. Overall, the proposed pipeline achieved high accuracy and was robust to variations resulting from different imaging protocols, reconstruction kernels, scanners, lung volumes, and pathological alterations, demonstrating its potential for enabling high-throughput image analysis in clinical and research settings.
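The local/global division of labour described above can be sketched as two small pathways whose features are fused before the segmentation head. This toy PyTorch module illustrates the pattern only; it is not the dissertation's architecture, and all layer sizes are arbitrary:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoScaleSeg(nn.Module):
    """Toy two-pathway segmenter: a local branch sees full-resolution detail,
    a global branch sees the same region downsampled for wider context."""
    def __init__(self, ch=8):
        super().__init__()
        self.local_net = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.global_net = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(2 * ch, 2, 1)   # e.g. background vs. lung

    def forward(self, patch, context):
        f_local = self.local_net(patch)
        # Context branch runs at quarter resolution, then is upsampled back
        f_global = self.global_net(F.interpolate(context, scale_factor=0.25))
        f_global = F.interpolate(f_global, size=f_local.shape[2:])
        return self.head(torch.cat([f_local, f_global], dim=1))

net = TwoScaleSeg()
out = net(torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 2, 64, 64])
```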
6

A study of semantics across different representations of language

Dharmaretnam, Dhanush 28 May 2018 (has links)
Semantics is the study of meaning, and here we explore it through three major representations: brain, image and text. Researchers have performed various studies to understand the similarities between semantic features across all three representations. Distributional semantic (DS) models, or word vectors, trained on text corpora have been widely used to study the convergence of semantic information in the human brain. Moreover, they have been incorporated into various NLP applications such as document categorization, speech-to-text and machine translation. Given their widespread adoption by researchers and industry alike, it becomes imperative to test and evaluate the performance of different word vector models. In this thesis, we publish the second iteration of BrainBench: a system designed to evaluate and benchmark word vectors using brain data, incorporating two new Italian brain datasets collected using fMRI and EEG technology. In the second half of the thesis, we explore semantics in convolutional neural networks (CNNs). The CNN is a computational model that is the state-of-the-art technology for object recognition from images. However, these networks are currently considered black boxes, and there is an apparent lack of understanding of why various CNN architectures perform better than others. In this thesis, we also propose a novel method to understand CNNs by studying the semantic representations across their hierarchical layers. The convergence of semantic information in these networks is studied with the help of DS models, following methodologies similar to those used to study semantics in the human brain. Our results provide substantial evidence that convolutional neural networks do learn semantics from images, and that the features learned by CNNs correlate with the semantics of the object in the image. Our methodology and results could potentially pave the way for improved design and debugging of CNNs. / Graduate
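One common way to compare semantic structure across representations, in the spirit of the methodology sketched above, is representational similarity analysis: correlate the pairwise-similarity matrix of a CNN layer's features with that of the word vectors over the same set of concepts. A generic sketch on synthetic data; nothing below is taken from BrainBench itself:

```python
import numpy as np
from scipy.stats import spearmanr

def rsa_score(layer_feats, word_vecs):
    """Spearman correlation between the two pairwise cosine-similarity
    structures, computed over the upper triangle (excluding the diagonal)."""
    def sim_matrix(X):
        X = X / np.linalg.norm(X, axis=1, keepdims=True)
        return X @ X.T

    iu = np.triu_indices(len(layer_feats), k=1)
    return spearmanr(sim_matrix(layer_feats)[iu],
                     sim_matrix(word_vecs)[iu]).correlation

rng = np.random.default_rng(1)
feats = rng.normal(size=(20, 512))                 # 20 concepts, one CNN layer
vecs = feats[:, :300] + rng.normal(scale=0.5, size=(20, 300))  # related vectors
print(round(rsa_score(feats, vecs), 3))
```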
7

Comparison of Pre-trained Convolutional Neural Network Performance on Glioma Classification

Unknown Date (has links)
Gliomas are an aggressive class of brain tumors that are associated with a better prognosis at a lower grade level. Effective differentiation and classification are imperative for early treatment. MRI is a popular medical imaging modality for detecting and diagnosing brain tumors due to its capability to non-invasively highlight the tumor region. With the rise of deep learning, researchers have used convolutional neural networks for classification in this domain, specifically pre-trained networks to reduce computational costs. However, varying MRI modalities, MRI machines, and poor image scan quality cause different network structures to yield different performance metrics. Each pre-trained network is designed with a different structure that allows robust results under specific problem conditions. This thesis aims to fill a gap in the literature by comparing the performance of popular pre-trained networks on a controlled dataset from a domain different from the one the networks were trained on. / Includes bibliography. / Thesis (M.S.)--Florida Atlantic University, 2020. / FAU Electronic Theses and Dissertations Collection
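A typical setup for this kind of comparison is to give each ImageNet pre-trained backbone the same new classification head and training data. A minimal torchvision sketch; the three-class head and the two candidate backbones are illustrative assumptions, not the thesis's exact experimental grid:

```python
import torch
import torch.nn as nn
from torchvision import models

def prepare(backbone, n_classes=3):
    """Freeze pre-trained weights and swap in a fresh classification head."""
    for p in backbone.parameters():
        p.requires_grad = False
    if hasattr(backbone, "fc"):                       # ResNet-style models
        backbone.fc = nn.Linear(backbone.fc.in_features, n_classes)
    else:                                             # VGG-style classifiers
        backbone.classifier[-1] = nn.Linear(
            backbone.classifier[-1].in_features, n_classes)
    return backbone

candidates = {
    "resnet18": prepare(models.resnet18(weights="IMAGENET1K_V1")),
    "vgg16": prepare(models.vgg16(weights="IMAGENET1K_V1")),
}
x = torch.randn(2, 3, 224, 224)          # stand-in for preprocessed MRI slices
for name, net in candidates.items():
    print(name, net(x).shape)            # each emits (2, 3) class logits
```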
8

Measuring the Functionality of Amazon Alexa and Google Home Applications

Wang, Jiamin 01 1900 (has links)
Voice Personal Assistant (VPA) is a software agent that can interpret a user's voice commands and respond with appropriate information or action. Users can operate a VPA by voice to complete multiple tasks, such as reading messages, ordering coffee, sending emails, and checking the news. Although this new technique brings interesting and useful features, it also poses new privacy and security risks. Current research has focused on proof-of-concept attacks, pointing out potential ways of launching attacks, e.g., crafting hidden voice commands that trigger malicious actions without the user noticing, or fooling the VPA into invoking the wrong application. However, the lack of a comprehensive understanding of the functionality of skills and their commands prevents us from systematically analyzing the potential threats of these attacks. In this project, we developed convolutional neural networks with active learning, along with a keyword-based approach, to investigate commands according to their capability (information retrieval or action injection) and sensitivity (sensitive or nonsensitive). Through these two levels of analysis, we provide a complete view of VPA skills and their susceptibility to the existing attacks. / M.S. / Voice Personal Assistant (VPA) is a software agent that can interpret users' voice commands and respond with appropriate information or action. The currently popular VPAs are Amazon Alexa, Google Home, Apple Siri and Microsoft Cortana. Developers can build and publish third-party applications, called skills in Amazon Alexa and actions in Google Home, on the VPA server. Users simply "talk" to the VPA devices to complete different tasks, such as reading messages, ordering coffee, sending emails, and checking the news. Although this new technique brings interesting and useful features, it also poses new potential security threats. Recent research has revealed that vulnerabilities exist in VPA ecosystems: users can incorrectly invoke a malicious skill whose name has a pronunciation similar to the user-intended skill, and inaudible voice commands can trigger unintended actions without users noticing. Current research has focused on the potential ways of launching such attacks, but the lack of a comprehensive understanding of the functionality of skills and their commands prevents us from systematically analyzing the potential consequences of these attacks. In this project, we carried out an extensive analysis of third-party applications from Amazon Alexa and Google Home to characterize the attack surfaces. First, we developed a convolutional neural network with an active learning framework to categorize commands according to their capability, i.e., whether they are information retrieval or action injection commands. Second, we employed a keyword-based approach to classify commands into sensitive and nonsensitive classes. Through these two levels of analysis, we provide a complete view of VPA skills' functionality and their susceptibility to the existing attacks.
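The keyword-based sensitivity stage admits a very small illustration. The keyword list below is invented for the example and is not the project's actual lexicon:

```python
# Toy version of the keyword-based second stage: flag a skill command as
# sensitive when it mentions terms tied to money, credentials, or devices.
SENSITIVE_KEYWORDS = {"order", "buy", "pay", "password", "unlock", "email", "address"}

def classify_sensitivity(command: str) -> str:
    words = set(command.lower().split())
    return "sensitive" if words & SENSITIVE_KEYWORDS else "nonsensitive"

for cmd in ["read me the news", "order a large coffee", "unlock the front door"]:
    print(f"{cmd!r:35} -> {classify_sensitivity(cmd)}")
```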
9

Image Steganography Using Deep Learning Techniques

Anthony Rene Guzman (12468519) 27 April 2022 (has links)
Digital image steganography is the process of embedding information within a cover image in a secure, imperceptible, and recoverable way. The three main methods of digital image steganography are spatial, transform, and neural network methods. Spatial methods modify the pixel values of an image to embed information, while transform methods embed hidden information within the frequency domain of the image. Neural network-based methods use neural networks to perform the hiding process, which is the focus of the proposed methodology.

This research explores the use of deep convolutional neural networks (CNNs) in digital image steganography. This work extends an existing implementation that used a two-dimensional CNN to perform the preparation, hiding, and extraction phases of the steganography process. The methodology proposed in this research, however, introduced changes into the structure of the CNN and used a gain function based on several image similarity metrics to maximize the imperceptibility between a cover and steganographic image.

The performance of the proposed method was measured using frequently utilized image metrics such as the structural similarity index measure (SSIM), mean square error (MSE), and peak signal-to-noise ratio (PSNR). The results showed that the steganographic images produced by the proposed methodology are imperceptible to the human eye, while still providing good recoverability. Comparing the results of the proposed methodology to those of the original methodology revealed that our proposed network greatly improved over the base methodology in terms of SSIM and compares well to existing steganography methods.
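As a rough sketch of a similarity-driven gain function of the kind described above, the snippet below scores imperceptibility with PSNR only; the thesis's gain combines several metrics (SSIM, MSE, PSNR), and the weighting here is an assumption:

```python
import numpy as np

def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    return 10 * np.log10(peak ** 2 / mse) if mse > 0 else float("inf")

def gain(cover, stego, secret, recovered, alpha=0.5):
    """Hypothetical objective balancing stego imperceptibility against
    recoverability of the hidden image; alpha is an assumed weight."""
    return alpha * psnr(cover, stego) + (1 - alpha) * psnr(secret, recovered)

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, (64, 64))
stego = np.clip(cover + rng.integers(-2, 3, (64, 64)), 0, 255)  # tiny embedding noise
print(round(psnr(cover, stego), 2))   # high PSNR -> change is nearly imperceptible
```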
10

Deep Learning for Taxonomy Prediction

Ramesh, Shreyas 04 June 2019 (has links)
The last decade has seen great advances in next-generation sequencing technologies, and, as a result, there has been a rise in the number of genomes sequenced each year. In 2017, as many as 10,000 new organisms were sequenced and added to the RefSeq database. Taxonomy prediction is the hierarchical classification of DNA fragments down to the rank of species. In this research, we introduce Predicting Linked Organisms, Plinko for short. Plinko is a fully functioning, state-of-the-art predictive system that accurately captures DNA-taxonomy relationships where other state-of-the-art algorithms falter. Plinko leverages multi-view convolutional neural networks and the pre-defined taxonomy tree structure to improve multi-level taxonomy prediction. In the Plinko strategy, each network takes advantage of different word usage patterns corresponding to different levels of evolutionary divergence. Plinko has the advantages of relatively low storage requirements and GPGPU-parallel training and inference, making the solution portable and scalable with anticipated genome database growth. To the best of our knowledge, Plinko is the first to use multi-view convolutional neural networks as the core algorithm in a compositional, alignment-free approach to taxonomy prediction. / Master of Science / Taxonomy prediction is the hierarchical classification of DNA fragments down to the rank of species. Given species diversity on Earth, taxonomy prediction becomes challenging with (i) an increasing number of species (labels) to classify and (ii) decreasing input (DNA) size. In this research, we introduce Predicting Linked Organisms, Plinko for short. Plinko is a fully functioning, state-of-the-art predictive system that accurately captures DNA-taxonomy relationships where other state-of-the-art algorithms falter. Three major challenges in taxonomy prediction are (i) large dataset sizes (on the order of 10^9 sequences), (ii) large label spaces (on the order of 10^3 labels) and (iii) low-resolution inputs (100 base pairs or less). Plinko leverages multi-view convolutional neural networks and the pre-defined taxonomy tree structure to improve multi-level taxonomy prediction for hard-to-classify sequences under the three conditions stated above. Plinko has the advantage of a relatively low storage footprint, making the solution portable and scalable with anticipated genome database growth. To the best of our knowledge, Plinko is the first to use multi-view convolutional neural networks as the core algorithm in a compositional, alignment-free approach to taxonomy prediction.
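A "compositional, alignment-free" input typically means k-mer frequency vectors, with each view of a multi-view network built from a different k. A generic sketch of such an encoding, not Plinko's actual feature pipeline:

```python
from itertools import product

def kmer_composition(seq, k=4):
    """Alignment-free k-mer frequency vector for a DNA fragment; varying k
    yields the different 'views' a multi-view CNN could consume."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if km in index:            # skip k-mers with ambiguous bases
            counts[index[km]] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

frag = "ACGTACGTGGCCTTAA" * 6     # a ~100 bp stand-in read
vec = kmer_composition(frag, k=4)
print(len(vec), round(sum(vec), 2))   # 256-dim normalized composition
```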
