211 |
Improving the Self-Consistent Field Initial Guess Using a 3D Convolutional Neural Network
Zhang, Ziang, 12 April 2021
Most ab initio simulation packages based on Density Functional Theory (DFT) use the Superposition of Atomic Densities (SAD) as the starting point of the self-consistent field (SCF) iteration. However, this trial charge density, which does not model the nonlinear interaction between atomic densities, may lead to relatively slow or even failed convergence.
This thesis proposes a machine learning-based scheme to improve the initial guess. We train a 3-Dimensional Convolutional Neural Network (3D CNN) to map the SAD initial guess to the corresponding converged charge density for simple structures. We show that the 3D CNN-processed charge density reduces the number of required SCF iterations at different levels of unit cell complexity.
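As a rough illustration of this density-to-density mapping, the sketch below shows a minimal 3D CNN over a voxel grid. The grid size, channel counts, depth, and the residual connection are illustrative assumptions, not the architecture from the thesis.

```python
# Minimal sketch: a 3D CNN mapping a SAD trial density on a voxel grid
# to a corrected density. All sizes here are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def build_density_cnn(grid=32):
    inp = layers.Input(shape=(grid, grid, grid, 1))   # SAD charge density
    x = layers.Conv3D(16, 3, padding="same", activation="relu")(inp)
    x = layers.Conv3D(16, 3, padding="same", activation="relu")(x)
    out = layers.Conv3D(1, 3, padding="same")(x)
    # assumption: predicting a residual keeps the output close to the SAD guess
    out = layers.Add()([inp, out])
    return tf.keras.Model(inp, out)

model = build_density_cnn()
model.compile(optimizer="adam", loss="mse")  # trained against converged densities
```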
|
212 |
COMPRESSED MOBILENET V3: AN EFFICIENT CNN FOR RESOURCE-CONSTRAINED PLATFORMS
Kavyashree Pras Shalini Pradeep Prasad (10662020), 10 May 2021
Computer vision is a mathematical tool formulated to extend human vision to machines. It can perform tasks such as object classification, object tracking, motion estimation, and image segmentation, which find use in many applications, including robotics, self-driving cars, augmented reality, and mobile applications. In place of the traditional technique of handcrafting features to understand images, convolutional neural networks (CNNs) now perform the same function, and computer vision applications use them widely due to their stellar performance in interpreting images. Over the years there have been numerous advancements in machine learning, particularly in CNNs; however, as their accuracy improved, their model size and complexity grew, making deployment in restricted environments a challenge. Many researchers have proposed techniques to reduce the size of a CNN while retaining its accuracy, including network quantization, pruning, low-rank and sparse decomposition, and knowledge distillation; other methods build efficient models from scratch. This thesis achieves a similar goal using design space exploration techniques on the latest variant of MobileNets, MobileNet V3. Using Depthwise-Pointwise-Depthwise (DPD) blocks, an increased number of expansion filters in some layers, and the Mish activation function, MobileNet V3 is reduced to 84.96% of its original size and made 0.2% more accurate. Furthermore, it is deployed on an NXP i.MX RT1060 for image classification on the CIFAR-10 dataset.
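The abstract names DPD blocks and the Mish activation without giving their internals; the sketch below is one plausible reading of such a block, purely as an assumption, since the exact layer ordering, kernel sizes, and filter counts are not stated here.

```python
# Hedged sketch of a possible Depthwise-Pointwise-Depthwise (DPD) block
# with Mish activation. Layer ordering and sizes are assumptions, not
# the thesis design.
import tensorflow as tf
from tensorflow.keras import layers

def mish(x):
    # Mish activation: x * tanh(softplus(x))
    return x * tf.math.tanh(tf.math.softplus(x))

def dpd_block(x, pointwise_filters):
    x = layers.DepthwiseConv2D(3, padding="same")(x)
    x = layers.Activation(mish)(x)
    x = layers.Conv2D(pointwise_filters, 1)(x)   # pointwise (1x1) stage
    x = layers.Activation(mish)(x)
    x = layers.DepthwiseConv2D(3, padding="same")(x)
    return layers.Activation(mish)(x)

# usage: inside a functional model, e.g. x = dpd_block(x, 64)
```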
|
213 |
Second-hand goods classification with CNNs: A proposal for a step towards a more sustainable fashion industry
Malmgård, Torsten, January 2021
For some time now, the fashion industry has been a big contributor to humanity's carbon emissions. If we are to become a more sustainable society and cut down on our pollution, this industry needs to be reformed, and the clothes we wear must be reused to a greater extent than today. Unfortunately, a large part of the Swedish population experiences a lack of available items on the second-hand market. This paper presents a proof-of-concept application that could be a possible solution. The application scans online second-hand websites and separates composite ads into new, separate ads, making it easier for potential buyers to find the items they are looking for. The application uses a web scraper written in Java combined with a convolutional neural network for classification. The CNN is a modified version of the ResNet50 model, trained on a dataset collected from a Swedish second-hand site. At the moment the network supports 5 types of clothing with an accuracy of 86%. Tests were performed to investigate the potential of scaling up the model, using a third-party dataset called DeepFashion that consists of over 800,000 images of clothes in different settings. The tests indicate that, given a larger dataset, the model could handle up to 31 classes with an accuracy of at least 57% and possibly as high as 76%. This evolved model did not produce any meaningful results when tested on real second-hand images, since the DeepFashion dataset mostly consists of clothes worn by models. Further research could see this application evolve into one that sorts ads not only by type but also by colour, material, and other properties, providing even more exhaustive labels.
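A minimal transfer-learning sketch of the kind of ResNet50 modification described: ImageNet weights with a new five-class clothing head. The input size, head layout, and training schedule are assumptions.

```python
# Hedged sketch: ResNet50 backbone with a replaced classification head
# for 5 clothing types. Details beyond "modified ResNet50, 5 classes"
# are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                           # train only the new head first
x = layers.GlobalAveragePooling2D()(base.output)
out = layers.Dense(5, activation="softmax")(x)   # 5 clothing classes
model = tf.keras.Model(base.input, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```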
|
214 |
Use of Thermal Imagery for Robust Moving Object Detection
Bergenroth, Hannah, January 2021
This work proposes a system that utilizes both infrared and visual imagery to create a more robust object detection and classification system. The system consists of two main parts: a moving object detector and a target classifier. The first stage detects moving objects in the visible and infrared spectra using background subtraction based on Gaussian Mixture Models, and low-level fusion is performed to combine the foreground regions from the respective domains. For the second stage, a Convolutional Neural Network (CNN), pre-trained on the ImageNet dataset, classifies the detected targets into one of the pre-defined classes: human and vehicle. The performance of the proposed object detector is evaluated using multiple video streams recorded in different areas and under various weather conditions, which form a broad basis for testing the suggested method. The accuracy of the classifier is evaluated on experimentally generated images from the moving object detection stage, supplemented with the publicly available CIFAR-10 and CIFAR-100 datasets. The low-level fusion method is shown to be more effective than using either domain separately in terms of detection results. / The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.
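A minimal sketch of the first stage under stated assumptions: OpenCV's Gaussian-mixture background subtractor applied per spectrum, with a logical OR standing in for the low-level fusion, since the abstract does not give the exact fusion rule.

```python
# Sketch: per-spectrum background subtraction (Gaussian Mixture Models
# via OpenCV's MOG2) followed by a simple mask-level fusion. The OR
# fusion rule is an assumption.
import cv2

bg_visible = cv2.createBackgroundSubtractorMOG2()
bg_thermal = cv2.createBackgroundSubtractorMOG2()

def fused_foreground(frame_visible, frame_thermal):
    mask_v = bg_visible.apply(frame_visible)   # foreground mask, visible spectrum
    mask_t = bg_thermal.apply(frame_thermal)   # foreground mask, infrared spectrum
    return cv2.bitwise_or(mask_v, mask_t)      # union of moving regions
```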
|
215 |
Sentiment Analysis of YouTube Public Videos based on their Comments
Kvedaraite, Indre, January 2021
With the rise of social media and publicly available data, opinion mining is more accessible than ever, and it is valuable for content creators, companies, and advertisers to gain insights into what users think and feel. This work examines comments on YouTube videos and builds a deep learning classifier to automatically determine their sentiment. Four Long Short-Term Memory-based models are trained and evaluated, and experiments are performed to determine which model performs best in terms of accuracy, recall, precision, F1 score, and ROC curve on a labelled YouTube comment dataset. The results indicate that a BiLSTM-based model has the best overall performance, with an accuracy of 89%. Furthermore, the four LSTM-based models are evaluated on an IMDB movie review dataset, achieving an average accuracy of 87%, showing that the models can predict the sentiment of different textual data. Finally, a statistical analysis is performed on the YouTube videos, revealing that videos with positive sentiment have a statistically higher number of upvotes and views. However, the number of downvotes is not significantly higher in videos with negative sentiment.
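For illustration, a minimal BiLSTM sentiment classifier in the spirit of the best-performing model. The vocabulary size, embedding dimension, layer width, and binary positive/negative output are assumptions.

```python
# Hedged sketch of a BiLSTM sentiment classifier. Hyperparameters are
# illustrative, not those from the thesis.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Embedding(input_dim=20000, output_dim=128),  # token embeddings
    layers.Bidirectional(layers.LSTM(64)),              # reads the comment in both directions
    layers.Dense(1, activation="sigmoid"),              # positive vs. negative
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```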
|
216 |
Upscaling of pictures using convolutional neural networks
Norée Palm, Caspar; Granström, Hugo, January 2021
The task of upscaling pictures is very ill-posed, since it requires the creation of novel data: any algorithm or model trying to perform it must interpolate and guess the missing pixels. Classical algorithms usually produce blurred or pixelated interpolations, especially visible around sharp edges. Neural networks are a promising choice for upscaling because they can infer context when upsampling different parts of an image. In this report, a deep learning architecture called U-Net is trained to reconstruct high-resolution images from the Div2k dataset. Multiple loss functions were tested; the best results came from combining a GAN-based loss, a simple pixel loss, and a Sobel-based edge loss. On the validation dataset, the proposed model achieved a PSNR of 33.11 dB, compared to 30.23 dB for Lanczos, one of the best classical algorithms.
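A minimal sketch of the combined objective described above, assuming an L1 pixel term and TensorFlow's built-in Sobel operator; the GAN term would come from a separate discriminator, and the weighting factor is an arbitrary placeholder.

```python
# Hedged sketch: pixel loss plus a Sobel-based edge loss. The edge
# weight is a placeholder assumption; the GAN loss is omitted here.
import tensorflow as tf

def sobel_edge_loss(y_true, y_pred):
    # tf.image.sobel_edges returns per-channel dy/dx edge maps
    e_true = tf.image.sobel_edges(y_true)
    e_pred = tf.image.sobel_edges(y_pred)
    return tf.reduce_mean(tf.abs(e_true - e_pred))

def pixel_plus_edge_loss(y_true, y_pred, edge_weight=0.1):
    pixel = tf.reduce_mean(tf.abs(y_true - y_pred))   # simple L1 pixel loss
    return pixel + edge_weight * sobel_edge_loss(y_true, y_pred)
```

The edge term penalizes mismatched gradients directly, which pushes the network toward sharp edges that a plain pixel loss tends to blur.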
|
217 |
DETECTION AND SEGMENTATION OF DEFECTS IN X-RAY COMPUTED TOMOGRAPHY IMAGE SLICES OF ADDITIVELY MANUFACTURED COMPONENT USING DEEP LEARNING
Acharya, Pradip, 01 June 2021
Additive manufacturing (AM) allows building complex shapes with high accuracy. X-ray Computed Tomography (XCT) is one of the most promising non-destructive evaluation techniques for assessing subsurface defects in an additively manufactured component. Automatic defect detection and segmentation methods can assist part inspection for quality control. However, automatic detection and segmentation of defects in XCT data of AM poses challenges due to the contrast, size, and appearance of defects. In this research, different deep learning techniques were applied to publicly available XCT image datasets of additively manufactured cobalt-chrome samples produced by the National Institute of Standards and Technology (NIST). To assist data labeling, image processing techniques were applied: median filtering, automatic local thresholding using Bernsen's algorithm, and contour detection. A convolutional neural network (CNN) based state-of-the-art object detection algorithm, YOLOv5, was applied for defect detection. Defect segmentation in XCT slices was successfully achieved by applying U-Net, a CNN-based network originally developed for biomedical image segmentation. Three variants of YOLOv5 (YOLOv5s, YOLOv5m, and YOLOv5l) were implemented in this study. YOLOv5s achieved a defect detection mean average precision (mAP) of 88.45% at an intersection over union (IoU) threshold of 0.5, and YOLOv5m achieved an mAP of 57.78% at IoU thresholds of 0.5 to 0.95. Additionally, a defect detection recall of 87.65% was achieved using YOLOv5s, whereas a precision of 71.61% was found using YOLOv5l. YOLOv5 and U-Net show promising results for defect detection and segmentation, respectively. Thus, deep learning techniques can improve automatic defect detection and segmentation in XCT data of AM.
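Bernsen's algorithm thresholds each pixel against the midpoint of its local minimum and maximum, treating low-contrast windows as background. A hedged sketch of the labeling-assistance steps named above (median filtering, Bernsen thresholding, contour detection), with the window size, contrast limit, and file name as placeholder assumptions:

```python
# Sketch of the labeling-assistance pipeline. Bernsen thresholding is
# implemented directly from its definition; parameters are assumptions.
import cv2
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def bernsen_threshold(img, window=15, contrast_min=15):
    local_max = maximum_filter(img.astype(np.int32), size=window)
    local_min = minimum_filter(img.astype(np.int32), size=window)
    midpoint = (local_max + local_min) // 2
    mask = img > midpoint
    # low-contrast windows are treated as background
    mask &= (local_max - local_min) >= contrast_min
    return (mask * 255).astype(np.uint8)

img = cv2.imread("xct_slice.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
img = cv2.medianBlur(img, 5)                             # median filtering
binary = bernsen_threshold(img)                          # Bernsen's algorithm
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)  # candidate defect outlines
```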
|
218 |
A Deep Learning Application for Traffic Sign Recognition
Kondamari, Pramod Sai; Itha, Anudeep, January 2021
Background: Traffic Sign Recognition (TSR) is particularly useful for novice drivers and self-driving cars. Driver Assistance Systems (DAS) involve automatic traffic sign recognition, and efficient classification of traffic signs is required in DAS and unmanned vehicles for safe navigation. Convolutional Neural Networks (CNNs) are known for establishing promising results in the field of image classification, which inspired us to employ this technique in our thesis. Computer vision is a process used to understand images and retrieve data from them; OpenCV is a Python library used to detect traffic sign images in real time.

Objectives: This study deals with an experiment to build a CNN model that can classify traffic signs in real time effectively using OpenCV. The model is built with low computational cost. The study also includes an experiment where various combinations of parameters are tuned to improve the model's performance.

Methods: The experimentation method involves building a CNN model based on a modified LeNet architecture with four convolutional layers, two max-pooling layers, and two dense layers. The model is trained and tested with the German Traffic Sign Recognition Benchmark (GTSRB) dataset. Parameter tuning with different combinations of learning rate and epochs is done to improve the model's performance. Later, this model is used to classify images introduced to the camera in real time.

Results: Graphs depicting the accuracy and loss of the model before and after parameter tuning are presented. An experiment is done to classify the traffic sign image introduced to the camera using the CNN model, and the high probability scores achieved during the process are presented.

Conclusions: The results show that the proposed model achieved 95% model accuracy with an optimum number of epochs (30) and the default optimum value of the learning rate (0.001). High probabilities, i.e., above 75%, were achieved when the model was tested on new real-time data.
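A sketch matching the architecture stated in the Methods section (four convolutional layers, two max-pooling layers, two dense layers) with the reported optimum settings (learning rate 0.001, 30 epochs). The filter counts, 32x32 input size, and choice of Adam optimizer are assumptions; GTSRB has 43 sign classes.

```python
# Hedged sketch of the modified-LeNet classifier for GTSRB. Layer counts
# and training settings follow the abstract; filter sizes are assumed.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Conv2D(32, 5, activation="relu", input_shape=(32, 32, 3)),
    layers.Conv2D(32, 5, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(43, activation="softmax"),          # 43 GTSRB classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=30)             # reported optimum epochs
```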
|
219 |
Adversarial Example Transferability to Quantized Models
Kratzert, Ludvig, January 2021
Deep learning has proven to be a major leap in machine learning, allowing completely new problems to be solved. While flexible and powerful, neural networks have the disadvantage of being large and demanding high performance from the devices on which they are run. In order to deploy neural networks on more, and simpler, devices, techniques such as quantization, sparsification, and tensor decomposition have been developed. These techniques have shown promising results, but their effects on model robustness against attacks remain largely unexplored. In this thesis, Universal Adversarial Perturbations (UAPs) and the Fast Gradient Sign Method (FGSM) are tested against VGG-19 as well as versions of it compressed using 8-bit quantization, TensorFlow's float16 quantization, and the 8-bit and 4-bit single-layer quantization (SLQ) introduced in this thesis. The results show that UAPs transfer well to all quantized models, while the transferability of FGSM is high to the float16-quantized model, lower to the 8-bit models, and high to the 4-bit SLQ model. We suggest that this disparity arises from the universal adversarial perturbations having been trained on multiple examples rather than just one, which has previously been shown to increase transferability. The results also show that quantizing a single layer, the first layer in this case, can have a disproportionate impact on transferability. / The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.
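For reference, FGSM is a one-step attack: it perturbs the input along the sign of the loss gradient with respect to that input. A minimal TensorFlow sketch follows, with epsilon as an illustrative choice; the resulting adversarial examples would then be fed to each quantized model to measure transferability.

```python
# Minimal FGSM sketch: x_adv = clip(x + eps * sign(dL/dx)). Epsilon and
# the [0, 1] input range are assumptions.
import tensorflow as tf

def fgsm(model, image, label, epsilon=0.01):
    image = tf.convert_to_tensor(image)
    with tf.GradientTape() as tape:
        tape.watch(image)
        pred = model(image)
        loss = tf.keras.losses.sparse_categorical_crossentropy(label, pred)
    grad = tape.gradient(loss, image)                 # gradient w.r.t. the input
    return tf.clip_by_value(image + epsilon * tf.sign(grad), 0.0, 1.0)
```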
|
220 |
Efficient image based localization using machine learning techniques
Elmougi, Ahmed, 23 April 2021
Localization is critical for the self-awareness of any autonomous system and is an important part of the autonomous system stack, which consists of many phases including sensing, perceiving, planning, and control. In the sensing phase, data from on-board sensors are collected, preprocessed, and passed to the next phase. The perceiving phase is responsible for self-awareness, or localization, and situational awareness, which includes multi-object detection and scene understanding. Once the autonomous system is aware of where it is and what is around it, it can use this knowledge to plan the path it should take and send control commands to pursue this path. In this proposal, we focus on the localization part of the autonomous stack using camera images, dealing with the localization problem from different perspectives including single images and videos.
Starting with single-image pose estimation, our approach is to propose systems that not only have good localization accuracy but also low space and time complexity. First, we propose SurfCNN, a low-cost indoor localization system that uses SURF descriptors instead of the original images to reduce the complexity of training convolutional neural networks (CNNs) for the indoor localization application. Given a single input image, the strongest SURF feature descriptors are used as input to 5 convolutional layers to find its absolute position and orientation in an arbitrary reference frame. The proposed system achieves performance comparable to the state of the art using only 300 features, without the need for the full image or complex neural network architectures. Next, we propose SURF-LSTM, an extension of the idea of using SURF descriptors instead of the original images. However, instead of the CNN used in SurfCNN, we use a long short-term memory (LSTM) network, one type of recurrent neural network (RNN), to extract the sequential relations between SURF descriptors. Using SURF-LSTM, we need only 50 features to reach comparable or better results than SurfCNN, which needs 300 features, and other works that use full images with large neural networks.
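A hedged sketch of the SurfCNN idea: the 300 strongest SURF descriptors (64 values each) replace the raw image as network input, and a small convolutional stack regresses the pose. The 1D layout, filter counts, and 7-value pose output (position plus orientation quaternion) are assumptions; SURF itself requires the opencv-contrib package.

```python
# Hedged sketch of a SurfCNN-style pose regressor over SURF descriptors.
# The descriptor arrangement and layer widths are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

n_features, desc_dim = 300, 64            # 300 strongest 64-dim SURF descriptors
inp = layers.Input(shape=(n_features, desc_dim))
x = inp
for filters in (64, 64, 128, 128, 256):   # five convolutional layers, as described
    x = layers.Conv1D(filters, 3, padding="same", activation="relu")(x)
x = layers.GlobalAveragePooling1D()(x)
pose = layers.Dense(7)(x)                 # assumed: x, y, z + quaternion
model = tf.keras.Model(inp, pose)
model.compile(optimizer="adam", loss="mse")
```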
In the following research phase, instead of using SURF descriptors as image features to reduce the training complexity, we study the effect of using features extracted from other CNN models that were pretrained on other image tasks, such as image classification, without further training or fine-tuning. To learn the pose from pretrained features, graph neural networks (GNNs) are adopted to solve the single-image localization problem (Pose-GNN) by using these feature representations either as features of nodes in a graph (image as a node) or converted into a graph (image as a graph). The proposed models outperform the state-of-the-art methods on an indoor localization dataset and have comparable performance for outdoor scenes.
In the final stage of the single-image pose estimation research, we study whether we can achieve good localization results without the need to train a complex neural network. We propose Linear-PoseNet, which achieves results similar to the other neural network-based methods by training a single linear regression layer on image features from a pretrained ResNet50 in less than one second on a CPU. Moreover, for outdoor scenes, we propose Dense-PoseNet, which has only 3 fully connected layers, trains in a few minutes, and reaches performance comparable to other complex methods.
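A hedged sketch of the Linear-PoseNet idea: pooled features from a frozen, ImageNet-pretrained ResNet50, with a single linear regression layer fit on the CPU. The 7-value pose target and the placeholder data are assumptions.

```python
# Hedged sketch: linear regression on frozen ResNet50 features. Data
# below is random placeholder input, purely for shape illustration.
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LinearRegression

backbone = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, pooling="avg")  # (N, 2048) features

def features(images):
    # images: (N, 224, 224, 3) float arrays in [0, 255]
    x = tf.keras.applications.resnet50.preprocess_input(images)
    return backbone.predict(x, verbose=0)

X = np.random.rand(8, 224, 224, 3).astype("float32") * 255  # placeholder images
y = np.random.rand(8, 7)                                    # placeholder poses
reg = LinearRegression().fit(features(X), y)   # the single linear layer
pose_pred = reg.predict(features(X))
```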
The second localization perspective is to find the relative poses between images in a video instead of absolute poses. We extend the idea used in the SurfCNN and SURF-LSTM systems and use SURF descriptors as the feature representation of the images in the video. Two systems are proposed to find the relative poses between images in the video, using a 3D CNN and a 2D CNN combined with an RNN. We show that using the 3D CNN is better than using the combination of CNN and RNN for relative pose estimation. / Graduate
|