71 |
Hand Detection and Pose Estimation using Convolutional Neural Networks / Handdetektering och pose-estimering med användning av faltande neuronnät
Knutsson, Adam, January 2015
This thesis examines how convolutional neural networks can be applied to the problems of hand detection and hand pose estimation. Two families of convolutional neural networks are trained, one for classification and one for regression. The networks are trained on specialized data generated from publicly available datasets, and the algorithms used to generate that data are disclosed in full. The main focus has been to investigate the structural properties of convolutional neural networks and relate them to performance, not to build optimized hand detection or hand pose estimation systems. Experiments revealed that classifier networks featuring a relatively high number of convolutions offer the highest performance on external validation data. Shallow classifier networks featuring a relatively low number of convolutions yield high classification accuracy on training and testing data but very low accuracy on the validation set. This effect uncovers one of the fundamental difficulties in building a hand detection system: the strongly asymmetric classification problem. Further investigation suggests that relatively shallow classifier networks probably become sensitive only to color. Furthermore, regressor networks featuring multiscale inputs typically yielded the lowest error when tasked with computing key-point locations directly from data. Color data, in contrast to depth data, also implicitly contain more information, which makes it easier to compute key-point locations, especially in the two-dimensional image space. However, deriving the color-invariant features requires deeper regressor networks.
|
72 |
Machine Learning on Acoustic Signals Applied to High-Speed Bridge Deck Defect Detection
Chou, Yao, 06 December 2019
Machine learning techniques are being applied to many data-intensive problems because they can accurately provide classification of complex data using appropriate training. Often, the performance of machine learning can exceed the performance of traditional techniques because machine learning can take advantage of higher dimensionality than traditional algorithms. In this work, acoustic data sets taken using a rapid scanning technique on concrete bridge decks provided an opportunity to both apply machine learning algorithms to improve detection performance and also to investigate the ways that training of neural networks can be aided by data augmentation approaches. Early detection and repair can enhance safety and performance as well as reduce long-term maintenance costs of concrete bridges. In order to inspect for non-visible internal cracking (called delaminations) of concrete bridges, a rapid inspection method is needed. A six-channel acoustic impact-echo sounding apparatus is used to generate large acoustic data sets on concrete bridge decks at high speeds. A machine learning data processing architecture is described to accurately detect and map delaminations based on the acoustic responses. The machine learning approach achieves accurate results at speeds between 25 and 45 km/h across a bridge deck and successfully demonstrates the use of neural networks to analyze this type of acoustic data. In order to obtain excellent performance, model training generally requires large data sets. However, in many potentially interesting cases, such as bridge deck defect detection, acquiring enough data for training can be difficult. Data augmentation can be used to increase the effective size of the training data set. Acoustic signal data augmentation is demonstrated in conjunction with a machine learning model for acoustic defect detection on bridge decks. Four different augmentation methods are applied to data using two different augmentation strategies. 
This work demonstrates that a "goldilocks" data augmentation approach can be used to increase machine learning performance when only a limited data set is available. The major technical contributions of this work include application of machine learning to acoustic data sets relevant to bridge deck inspection, solving an important problem in the field of nondestructive evaluation, and a more generalized approach to data augmentation of limited acoustic data sets to expand the classes of acoustic problems that machine learning can successfully address.
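The abstract does not name its four augmentation methods or its two strategies, so the following is only a minimal sketch of what acoustic signal augmentation can look like. The three transforms (noise injection, circular time shift, amplitude scaling), the parameter values, and the `copies` knob are all assumptions chosen for illustration, not the author's method. The `copies` parameter stands in for the "goldilocks" idea that the amount of augmentation should be neither too small nor too large.

```python
import math
import random

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio in dB."""
    power = sum(s * s for s in signal) / len(signal)
    sigma = math.sqrt(power / (10 ** (snr_db / 10)))
    return [s + rng.gauss(0.0, sigma) for s in signal]

def time_shift(signal, max_shift, rng):
    """Circularly shift the signal by a random number of samples."""
    k = rng.randint(-max_shift, max_shift) % len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)

def scale_amplitude(signal, low, high, rng):
    """Multiply the whole signal by one random gain factor."""
    gain = rng.uniform(low, high)
    return [gain * s for s in signal]

def augment(signals, labels, copies, rng):
    """Return the originals plus `copies` augmented variants of each signal."""
    out_x, out_y = [list(s) for s in signals], list(labels)
    for x, y in zip(signals, labels):
        for _ in range(copies):
            aug = time_shift(x, 16, rng)             # hypothetical shift range
            aug = scale_amplitude(aug, 0.8, 1.2, rng)  # hypothetical gain range
            aug = add_noise(aug, 20.0, rng)          # hypothetical 20 dB SNR
            out_x.append(aug)
            out_y.append(y)                          # label is unchanged
    return out_x, out_y

# Synthetic stand-in data: four sine bursts with alternating labels.
rng = random.Random(0)
signals = [[math.sin(0.1 * n + p) for n in range(256)] for p in range(4)]
labels = [0, 1, 0, 1]
aug_x, aug_y = augment(signals, labels, copies=3, rng=rng)
print(len(aug_x), len(aug_y))
```

Each augmented copy keeps the label of the signal it came from, which is what lets a small labelled set be enlarged without new field measurements.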
|
73 |
COMPRESSED MOBILENET V3: AN EFFICIENT CNN FOR RESOURCE CONSTRAINED PLATFORMS
Kavyashree Pras Shalini Pradeep Prasad (10662020), 10 May 2021
Computer vision is a mathematical tool formulated to extend human vision to machines. It can perform various tasks such as object classification, object tracking, motion estimation, and image segmentation. These tasks are used in many applications, including robotics, self-driving cars, augmented reality, and mobile applications. Instead of the traditional technique of incorporating handcrafted features to understand images, convolutional neural networks (CNNs) are now used to perform the same function. Computer vision applications widely use CNNs because of their strong performance in interpreting images. Over the years, there have been numerous advancements in machine learning, particularly in CNNs. However, as accuracy improved, model size and complexity increased, making deployment in restricted environments a challenge. Many researchers have proposed techniques to reduce the size of a CNN while retaining its accuracy. A few of these are network quantization, pruning, low-rank and sparse decomposition, and knowledge distillation. Some methods developed efficient models from scratch. This thesis achieves a similar goal using design space exploration techniques on the latest variant of MobileNets, MobileNet V3. Using Depthwise Pointwise Depthwise (DPD) blocks, an increase in the number of expansion filters in some layers, and the Mish activation function, MobileNet V3 is reduced to 84.96% in size and made 0.2% more accurate. Furthermore, it is deployed on the NXP i.MX RT1060 for image classification on the CIFAR-10 dataset.
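For readers unfamiliar with the Mish activation mentioned above, here is a small sketch of its standard definition, mish(x) = x * tanh(softplus(x)). It is written in plain Python for illustration only and is not taken from the thesis; the helper name `softplus` and the sample inputs are assumptions.

```python
import math

def softplus(x):
    """Numerically stable ln(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))

# Mish is smooth, slightly negative for negative inputs,
# and close to the identity for large positive inputs.
for x in (-2.0, 0.0, 2.0, 20.0):
    print(x, mish(x))
```

Unlike ReLU, Mish lets small negative values through, which is one reason it is sometimes swapped in when tuning compact networks.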
|
74 |
Sentiment Analysis of YouTube Public Videos based on their Comments
Kvedaraite, Indre, January 2021
With the rise of social media and publicly available data, opinion mining is more accessible than ever. It is valuable for content creators, companies and advertisers to gain insights into what users think and feel. This work examines comments on YouTube videos and builds a deep learning classifier to automatically determine their sentiment. Four Long Short-Term Memory (LSTM)-based models are trained and evaluated. Experiments are performed to determine which model performs best in terms of accuracy, recall, precision, F1 score and ROC curve on a labelled YouTube comment dataset. The results indicate that a BiLSTM-based model has the best overall performance, with an accuracy of 89%. Furthermore, the four LSTM-based models are evaluated on an IMDB movie review dataset, achieving an average accuracy of 87%, which shows that the models can predict the sentiment of different kinds of textual data. Finally, a statistical analysis is performed on the YouTube videos, revealing that videos with positive sentiment have a statistically higher number of upvotes and views. However, the number of downvotes is not significantly higher in videos with negative sentiment.
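The evaluation metrics named in this abstract can be computed from a confusion matrix. The sketch below assumes a binary positive/negative labelling (1 = positive) and uses made-up toy predictions; the function name `binary_metrics` is hypothetical and nothing here reproduces the thesis's data or results.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy example: 8 comments, 3 true positives, 3 true negatives,
# 1 false positive and 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when the sentiment classes are imbalanced, since accuracy alone can look good for a model that rarely predicts the minority class.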
|
75 |
Efficient image based localization using machine learning techniques
Elmougi, Ahmed, 23 April 2021
Localization is critical for the self-awareness of any autonomous system and is an important part of the autonomous system stack, which consists of many phases including sensing, perceiving, planning and control. In the sensing phase, data from on-board sensors are collected, preprocessed and passed to the next phase. The perceiving phase is responsible for self-awareness, or localization, and for situational awareness, which includes multi-object detection and scene understanding. Once the autonomous system is aware of where it is and what is around it, it can use this knowledge to plan the path it can take and send control commands to pursue that path. In this proposal, we focus on the localization part of the autonomous stack using camera images. We deal with the localization problem from different perspectives, including single images and videos.
Starting with single-image pose estimation, our approach is to propose systems that not only have good localization accuracy but also have low space and time complexity. Firstly, we propose SurfCNN, a low-cost indoor localization system that uses SURF descriptors instead of the original images to reduce the complexity of training convolutional neural networks (CNNs) for indoor localization. Given a single input image, the descriptors of its strongest SURF features are used as input to 5 convolutional layers to find its absolute position and orientation in an arbitrary reference frame. The proposed system achieves performance comparable to the state of the art using only 300 features, without the need for the full image or complex neural network architectures. Next, we propose SURF-LSTM, an extension of the idea of using SURF descriptors instead of the original images. However, instead of the CNN used in SurfCNN, we use a long short-term memory (LSTM) network, which is one type of recurrent neural network (RNN), to extract the sequential relation between SURF descriptors. Using SURF-LSTM, we need only 50 features to reach comparable or better results compared with SurfCNN, which needs 300 features, and with other works that use full images with large neural networks.
In the following research phase, instead of using SURF descriptors as image features to reduce the training complexity, we study the effect of using features extracted from other CNN models that were pretrained on other image tasks, such as image classification, without further training or fine-tuning. To learn the pose from pretrained features, graph neural networks (GNNs) are adopted to solve the single-image localization problem (Pose-GNN) by using these feature representations either as features of nodes in a graph (image as a node) or converted into a graph (image as a graph). The proposed models outperform the state-of-the-art methods on an indoor localization dataset and have comparable performance for outdoor scenes.
In the final stage of the single-image pose estimation research, we study whether we can achieve good localization results without training a complex neural network. We propose Linear-PoseNet, with which we can achieve results similar to those of other neural-network-based methods by training a single linear regression layer on image features from a pretrained ResNet50 in less than one second on a CPU. Moreover, for outdoor scenes, we propose Dense-PoseNet, which has only 3 fully connected layers, is trained in a few minutes, and reaches performance comparable to other, more complex methods.
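The Linear-PoseNet idea, a single linear regression layer fitted on frozen image features, can be sketched with a closed-form ridge solve. Everything specific below is an assumption for illustration: random vectors stand in for pretrained ResNet50 features, the pose is taken to be a 7-vector (3 position values plus a 4-value orientation), and the ridge term and function names are hypothetical rather than the author's implementation.

```python
import numpy as np

def add_bias(features):
    """Append a constant column so the linear layer has a bias term."""
    return np.hstack([features, np.ones((features.shape[0], 1))])

def fit_linear_pose(features, poses, ridge=1e-3):
    """Closed-form ridge regression from feature vectors to pose vectors."""
    x = add_bias(features)
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ poses)

def predict_pose(weights, features):
    """Apply the fitted linear layer to new feature vectors."""
    return add_bias(features) @ weights

# Synthetic stand-in data: 32-dimensional "features" and 7-dimensional poses.
rng = np.random.default_rng(0)
dim = 32
true_map = rng.normal(size=(dim, 7))
train_x = rng.normal(size=(200, dim))
test_x = rng.normal(size=(50, dim))
train_y = train_x @ true_map + 0.01 * rng.normal(size=(200, 7))
test_y = test_x @ true_map

weights = fit_linear_pose(train_x, train_y)
error = np.abs(predict_pose(weights, test_x) - test_y).mean()
print(f"mean absolute pose error: {error:.4f}")
```

Because the fit is a single linear solve rather than iterative training, it is plausible that such a layer can be fitted very quickly on a CPU, which matches the spirit of the timing claim in the abstract.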
The second localization perspective is to find the relative poses between images in a video instead of absolute poses. We extend the idea used in SurfCNN and SURF-LSTM systems and use SURF descriptors as feature representation of the images in the video. Two systems are proposed to find the relative poses between images in the video using 3D-CNN and 2DCNN-RNN. We show that using 3D-CNN is better than using the combination of CNN-RNN for relative pose estimation. / Graduate
|
76 |
Thor: A Deep Learning Approach for Face Mask Detection to Prevent the COVID-19 Pandemic
Snyder, Shay E., Husari, Ghaith, 10 March 2021
With the rapid worldwide spread of Coronavirus (COVID-19 and COVID-20), wearing face masks in public has become a necessity to mitigate the transmission of this or other pandemics. However, given the lack of on-ground automated prevention measures, depending on humans to enforce face mask-wearing policies in universities and other organizational buildings is a very costly and time-consuming measure. Without addressing this challenge, mitigating highly transmissible airborne diseases will be impractical, and the time to react will continue to increase. Considering the high personnel traffic in buildings and the effectiveness of countermeasures, that is, detecting unmasked personnel and offering them surgical masks, our aim in this paper is to develop automated detection of unmasked personnel in public spaces so that a surgical mask can be provided to them to promptly remedy the situation. Our approach consists of three key components. The first component utilizes a deep learning architecture that integrates deep residual learning (ResNet-50) with a Feature Pyramid Network (FPN) to detect the presence of human subjects in the videos (or video feed). The second component utilizes Multi-Task Convolutional Neural Networks (MT-CNN) to detect and extract human faces from these videos. For the third component, we construct and train a convolutional neural network classifier to detect masked and unmasked human subjects. Our techniques were implemented in a mobile robot, Thor, and evaluated using a dataset of videos collected by the robot from public spaces of an educational institute in the U.S. Our evaluation results show that Thor is very accurate, achieving an F1 score of 87.7% with a recall of 99.2% in a variety of situations, a reasonable accuracy given the challenging dataset and the problem domain.
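The abstract reports an F1 score of 87.7% and a recall of 99.2% but no precision. Assuming the standard definition of F1 as the harmonic mean of precision P and recall R, and assuming both figures refer to the same class and evaluation set, the implied precision can be worked out as a rough check:

```latex
F_1 = \frac{2PR}{P + R}
\quad\Longrightarrow\quad
P = \frac{F_1 R}{2R - F_1}
  = \frac{0.877 \times 0.992}{2 \times 0.992 - 0.877}
  = \frac{0.869984}{1.107}
  \approx 0.786
```

Under those assumptions the precision would be roughly 79%, which suggests the detector rarely misses the target class but produces a noticeable share of false alarms.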
|
77 |
Detekcija bolesti biljaka tehnikama dubokog učenja / Plant disease detections using deep learning techniques
Arsenović Marko, 07 October 2020
The research presented in this thesis aimed to develop a novel method based on deep convolutional neural networks for automated plant disease detection from leaf images. The thesis reviews the approaches to automated plant disease detection available in the current literature and the limitations of the resulting models when they are used under natural conditions. A specialized two-phase deep neural network method is introduced to address the limitations of state-of-the-art plant disease detection methods and to make practical use of the newly developed model possible. In addition, a new dataset is introduced that contains more leaf images than other publicly available datasets, and a GAN-based augmentation approach for leaf images is confirmed experimentally.
|
78 |
Advancing Video Compression With Error Resilience And Content Analysis
Di Chen (9234905), 13 August 2020
In this thesis, two aspects of video coding improvement are discussed, namely error resilience and coding efficiency.
With the increasing amount of video being created and consumed, better video compression tools are needed to provide reliable and fast transmission. Many popular video coding standards, such as VPx and H.26x, achieve compression by exploiting spatial and temporal dependencies in the source video signal. This makes the encoded bitstream vulnerable to errors during transmission. In this thesis, we investigate an error-resilient video coding method for VP9 bitstreams using error resilience packets. An error resilience packet consists of encoded keyframe contents and the prediction signals for each non-keyframe. Experimental results show that our proposed method is effective under typical packet loss conditions.
In the second part of the thesis, we first present an automatic stillness feature detection method for groups of pictures. The encoder adaptively chooses the coding structure for each group of pictures based on its stillness feature to optimize the coding efficiency.
Secondly, a content-based video coding method is proposed. Modern video codecs, including the newly developed AOM/AV1, utilize hybrid coding techniques to remove spatial and temporal redundancy. However, the efficient exploitation of statistical dependencies measured by a mean squared error (MSE) does not always produce the best psychovisual result. One interesting approach is to encode only visually relevant information and use a different coding method for “perceptually insignificant” regions in the frame. In this thesis, we introduce a texture analyzer before encoding the input sequences to identify detail-irrelevant texture regions in the frame using convolutional neural networks. The texture region is then reconstructed based on one set of motion parameters. We show that for many standard test sets, the proposed method achieved significant data rate reductions.
|
79 |
Compressed MobileNet V3: An efficient CNN for resource constrained platforms
Prasad, S. P. Kavyashree, 05 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Computer vision is a mathematical tool formulated to extend human vision to machines. It can perform various tasks such as object classification, object tracking, motion estimation, and image segmentation. These tasks are used in many applications, including robotics, self-driving cars, augmented reality, and mobile applications. Instead of the traditional technique of incorporating handcrafted features to understand images, convolutional neural networks (CNNs) are now used to perform the same function. Computer vision applications widely use CNNs because of their strong performance in interpreting images. Over the years, there have been numerous advancements in machine learning, particularly in CNNs. However, as accuracy improved, model size and complexity increased, making deployment in restricted environments a challenge. Many researchers have proposed techniques to reduce the size of a CNN while retaining its accuracy. A few of these are network quantization, pruning, low-rank and sparse decomposition, and knowledge distillation. Some methods developed efficient models from scratch. This thesis achieves a similar goal using design space exploration techniques on the latest variant of MobileNets, MobileNet V3. Using DPD blocks, an increase in the number of expansion filters in some layers, and the Mish activation function, MobileNet V3 is reduced to 84.96% in size and made 0.2% more accurate. Furthermore, it is deployed on the NXP i.MX RT1060 for image classification on the CIFAR-10 dataset.
|
80 |
Design Space Exploration of Convolutional Neural Networks for Image Classification
Shah, Prasham, 12 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Computer vision is a domain which deals with the goal of making technology as efficient as human vision. To achieve that goal, after decades of research, researchers have developed algorithms that work efficiently on resource-constrained hardware, such as mobile or embedded devices, for computer vision applications. Thanks to these efforts, such devices have become capable of tasks like image classification, object detection, object recognition, semantic segmentation, and many other applications. Autonomous systems such as self-driving cars, drones and UAVs are being successfully developed because of these advances in AI.
Deep Learning, a part of AI, is a specific domain of Machine Learning which focuses on developing algorithms for such applications. Deep Learning deals with tasks like extracting features from raw image data, replacing pipelines of specialized models with single end-to-end models, making models usable for multiple tasks with superior performance. A major focus is on techniques to detect and extract features which provide better context for inference about an image or video stream. A deep hierarchy of rich features can be learned and automatically extracted from images, provided by the multiple deep layers of CNN models.
CNNs are the backbone of computer vision. CNNs are the focus of attention among deep learning models because they were specifically designed for image data. They are complicated but very effective at extracting features from an image or a video stream. After AlexNet won the ILSVRC in 2012, there was a drastic increase in research related to CNNs. Many state-of-the-art architectures such as VGG Net, GoogLeNet, ResNet, Inception-v4, Inception-ResNet-v2, ShuffleNet, Xception, MobileNet, MobileNetV2, SqueezeNet, SqueezeNext and many more were introduced. The trend in this research was to increase the number of layers in CNNs to make them more efficient, but with that, the size of the models increased as well. This problem was addressed with the advent of new algorithms that decreased model size.
As a result, today we have CNN models that run on mobile devices. These mobile models are compact and have low latency, which in turn reduces the computational cost of the embedded system. This thesis follows a similar idea: it proposes two new CNN architectures, A-MnasNet and R-MnasNet, which have been derived from MnasNet by design space exploration. These architectures outperform MnasNet in terms of model size and accuracy. They have been trained and tested on the CIFAR-10 dataset. Furthermore, they were implemented on the NXP BlueBox 2.0, an autonomous driving platform, for image classification.
|