  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
171

Convolutional Neural Networks for Named Entity Recognition in Images of Documents

van de Kerkhof, Jan January 2016
This work researches named entity recognition (NER) with respect to images of documents with a domain-specific layout, by means of Convolutional Neural Networks (CNNs). Examples of such documents are receipts, invoices, forms and scientific papers, the latter of which are used in this work. An NER task is first performed statically, where a static number of entity classes is extracted per document. Networks based on the deep VGG-16 network are used for this task. Here, experimental evaluation shows that framing the task as a classification task, where the network classifies each bounding box coordinate separately, leads to the best network performance. Also, a multi-headed architecture is introduced, where the network has an independent fully-connected classification head per entity. VGG-16 achieves better performance with the multi-headed architecture than with its default, single-headed architecture. Additionally, it is shown that transfer learning does not improve performance of these networks. Analysis suggests that the networks trained for the static NER task learn to recognise document templates, rather than the entities themselves, and therefore do not generalize well to new, unseen templates. For a dynamic NER task, where the type and number of entity classes vary per document, experimental evaluation shows that, on large entities in the document, the Faster R-CNN object detection framework achieves comparable performance to the networks trained on the static task. Analysis suggests that Faster R-CNN generalizes better to new templates than the networks trained for the static task, as Faster R-CNN is trained on local features rather than the full document template. Finally, analysis shows that Faster R-CNN performs poorly on small entities in the image and suggestions are made to improve its performance.
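As a rough illustration of the classification framing described above, the sketch below assumes that each bounding-box coordinate is discretised into a fixed number of bins, so that a network head would predict a bin index per coordinate instead of regressing a real value. The bin count, the image size, and all function names are hypothetical and are not taken from the thesis.

```python
def coord_to_class(value, image_extent, num_bins):
    """Map a pixel coordinate in [0, image_extent) to a class index (bin)."""
    if not 0 <= value < image_extent:
        raise ValueError("coordinate lies outside the image")
    return int(value * num_bins / image_extent)


def class_to_coord(index, image_extent, num_bins):
    """Map a class index back to the centre of its bin, in pixels."""
    bin_width = image_extent / num_bins
    return (index + 0.5) * bin_width


def encode_box(box, width, height, num_bins):
    """Turn (x_min, y_min, x_max, y_max) into four independent class labels."""
    x_min, y_min, x_max, y_max = box
    return (
        coord_to_class(x_min, width, num_bins),
        coord_to_class(y_min, height, num_bins),
        coord_to_class(x_max, width, num_bins),
        coord_to_class(y_max, height, num_bins),
    )
```

Under this assumption, a coarser bin count trades localisation precision for an easier classification problem, since decoding can only recover the centre of each bin.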
172

Combining RGB and Depth Images for Robust Object Detection using Convolutional Neural Networks / Kombinera RGB- och djupbilder för robust objektdetektering med neurala faltningsnätverk

Thörnberg, Jesper January 2015
We investigated the advantage of combining RGB images with depth data to get more robust object classifications and detections using pre-trained deep convolutional neural networks. We relied upon the raw images from publicly available datasets captured using Microsoft Kinect cameras. The raw images varied in size, and therefore required resizing to fit our network. We designed a resizing method called "bleeding edge" to avoid distorting the objects in the images. We present a novel method of interpolating the missing depth pixel values by comparing to similar RGB values. This method proved superior to the other methods tested. We showed that a simple colormap transformation of the depth image can provide close to state-of-the-art performance. Using our methods, we can present state-of-the-art performance on the Washington Object dataset and we provide some results on the Washington Scenes (V1) dataset. Specifically, for the detection, we used contours at different thresholds to find the likely object locations in the images. For the classification task we can report state-of-the-art results using only RGB and RGB-D images; depth data alone gave close to state-of-the-art results. For the detection task we found the RGB-only detector to be superior to the other detectors.
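A minimal sketch of a colormap transformation for a depth image follows, assuming a simple blue-to-red ramp and assuming that a depth of zero marks a missing reading. The abstract does not say which colormap was used, so the ramp, the missing-value convention, and the function names are illustrative assumptions only.

```python
def depth_to_rgb(depth, d_min, d_max):
    """Map one depth value to an (r, g, b) tuple of 0-255 integers.

    Near pixels come out blue and far pixels red. A depth of 0 is treated
    as a missing reading and rendered black (an assumption of this sketch).
    """
    if depth == 0:
        return (0, 0, 0)
    # Normalise to [0, 1], clamping values outside the expected range.
    t = (depth - d_min) / (d_max - d_min)
    t = min(1.0, max(0.0, t))
    red = round(255 * t)
    blue = round(255 * (1.0 - t))
    # Green peaks in the middle of the depth range.
    green = round(255 * (1.0 - abs(2.0 * t - 1.0)))
    return (red, green, blue)


def colorize(depth_image, d_min, d_max):
    """Apply depth_to_rgb to every pixel of a 2D list of depth values."""
    return [[depth_to_rgb(d, d_min, d_max) for d in row] for row in depth_image]
```

The point of such a transformation is that a single-channel depth map becomes a three-channel image that a network pre-trained on RGB data can consume without architectural changes.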
173

A Deep Learning Approach to Brain Tracking of Sound

Hermansson, Oscar January 2022
Objectives: Development of accurate auditory attention decoding (AAD) algorithms, capable of identifying the attended sound source from speech-evoked electroencephalography (EEG) responses, could lead to new solutions for hearing-impaired listeners: neuro-steered hearing aids. Many of the existing AAD algorithms are either inaccurate or very slow. Therefore, there is a need to develop new EEG-based AAD methods. The first objective of this project was to investigate deep neural network (DNN) models for AAD and compare them to the state-of-the-art linear models. The second objective was to investigate whether generative adversarial networks (GANs) could be used for speech-evoked EEG data augmentation to improve the AAD performance. Design: The proposed methods were tested on a dataset of 34 participants who performed an auditory attention task. They were instructed to attend to one of the two talkers in the front and ignore the talker on the other side and background noise behind them, while high-density EEG was recorded. Main Results: The linear models had an average attended vs ignored speech classification accuracy of 95.87% and 50% for ∼30-second and 8-second time windows, respectively. A DNN model designed for AAD resulted in an average classification accuracy of 82.32% and 58.03% for ∼30-second and 8-second time windows, respectively, when trained only on the real EEG data. The results show that GANs generated relatively realistic speech-evoked EEG signals. A DNN trained with GAN-generated data resulted in an average accuracy of 90.25% for 8-second time windows. On shorter trials, the GAN-generated EEG data were shown to significantly improve classification performance compared to models trained only on real EEG data. Conclusion: The results suggest that DNN models can outperform linear models in AAD tasks, and that GAN-based EEG data augmentation can be used to further improve DNN performance.
These results extend prior work and bring us closer to the use of EEG for decoding auditory attention in next-generation neuro-steered hearing aids.
174

HBONext: An Efficient DNN for Light Edge Embedded Devices

Joshi, Sanket Ramesh 05 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Every year, the most effective deep learning models and CNN architectures are showcased based on their compatibility and performance on embedded edge hardware, especially for applications like image classification. These deep learning models necessitate a significant amount of computation and memory, so they can only be used on high-performance computing systems like CPUs or GPUs. However, they often struggle to fulfill portable specifications due to resource, energy, and real-time constraints. Hardware accelerators have recently been designed to provide the computational resources that AI and machine learning tools need. These edge accelerators have high-performance hardware which helps maintain the precision needed to accomplish this mission. Furthermore, this classification dilemma, which investigates channel interdependencies using either depth-wise or group-wise convolutional features, has benefited from the inclusion of Bottleneck modules. Because of its increasing use in portable applications, the classic inverted residual block, a well-known architecture technique, has gained more recognition. This work takes it a step forward by introducing a design method for porting CNNs to low-resource embedded systems, essentially bridging the gap between deep learning models and embedded edge systems. To achieve these goals, we use closer computing strategies to reduce the computer’s computational load and memory usage while retaining excellent deployment efficiency. This thesis work introduces HBONext, a mutated version of Harmonious Bottlenecks (DHbneck) combined with a Flipped version of Inverted Residual (FIR), which outperforms the current HBONet architecture in terms of accuracy and model size miniaturization. Unlike the current definition of inverted residual, this FIR block performs identity mapping and spatial transformation at its higher dimensions.
The HBO solution, on the other hand, focuses on two orthogonal dimensions: spatial (H/W) contraction-expansion and later channel (C) expansion-contraction, which are both organized in a bilaterally symmetric manner. HBONext is one of those versions that was designed specifically for embedded and mobile applications. In this research work, we also show how to use NXP Bluebox 2.0 to build a real-time HBONext image classifier. The integration of the model into this hardware was successful owing to the small model size of 3 MB. The model was trained and validated using the CIFAR10 dataset, on which it performed exceptionally well, combining a smaller size with higher accuracy. The validation accuracy of the baseline HBONet architecture is 80.97%, and the model is 22 MB in size. The proposed HBONext variants, on the other hand, gave a higher validation accuracy of 89.70% and a model size of 3.00 MB, measured using the number of parameters. The performance metrics of the HBONext architecture and its variants are compared in the following chapters.
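The abstract reports model size as "measured using the number of parameters". The sketch below shows one way such an estimate could be made, assuming 32-bit (4-byte) weights and 1 MB = 2**20 bytes; neither convention is stated in the abstract. It also compares the parameter count of a standard convolution with that of a depth-wise separable one, the kind of layer the abstract alludes to. The layer sizes and function names are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel, bias=True):
    """Parameters of a standard 2D convolution with a square kernel."""
    return in_ch * out_ch * kernel * kernel + (out_ch if bias else 0)


def depthwise_separable_params(in_ch, out_ch, kernel, bias=True):
    """Depth-wise conv (one filter per input channel) plus a 1x1 point-wise conv."""
    depthwise = in_ch * kernel * kernel + (in_ch if bias else 0)
    pointwise = in_ch * out_ch + (out_ch if bias else 0)
    return depthwise + pointwise


def model_size_mb(num_params, bytes_per_param=4):
    """Approximate storage for the weights alone, in MB (2**20 bytes)."""
    return num_params * bytes_per_param / 2**20


# Hypothetical layer: 64 input channels, 128 output channels, 3x3 kernel.
standard = conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
```

For this hypothetical layer the separable form needs roughly an eighth of the parameters, which illustrates why depth-wise designs are attractive when the weight file has to fit on an embedded target.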
175

Second-hand goods classification with CNNs : A proposal for a step towards a more sustainable fashion industry

Malmgård, Torsten January 2021
For some time now, the fashion industry has been a big contributor to humanity's carbon emissions. If we are to become a more sustainable society and cut down on our pollution, this industry needs to be reformed. The clothes we wear must be reused to a greater extent than today. Unfortunately, a big part of the Swedish population experiences a lack of available items on the second-hand market. This paper presents a proof-of-concept application that could be a possible solution. The application scans online second-hand websites and separates composite ads into new, separate ads. This makes it easier for potential buyers to find the items they are looking for. The application uses a web scraper written in Java combined with a convolutional neural network for classification. The CNN is a modified version of the ResNet50 model which is trained on a dataset collected from a Swedish second-hand site. At the moment the network supports 5 types of clothing with an accuracy of 86%. Tests were performed to investigate the potential of scaling up the model. These experiments were made using a third-party dataset called DeepFashion. This dataset consists of over 800,000 images of clothes in different settings. The tests indicate that given a larger dataset the model could handle up to 31 classes with an accuracy of at least 57% and possibly as high as 76%. This evolved model did not produce any meaningful results when tested on real second-hand images, since the DeepFashion dataset mostly consists of clothes worn by models. Further research could see this application evolve into one that could sort ads not only by type, but also by colour, material and other properties to provide even more exhaustive labels.
176

A Deep Learning Application for Traffic Sign Recognition

Kondamari, Pramod Sai, Itha, Anudeep January 2021
Background: Traffic Sign Recognition (TSR) is particularly useful for novice drivers and self-driving cars. Driver Assistance Systems (DAS) involve automatic traffic sign recognition. Efficient classification of the traffic signs is required in DAS and unmanned vehicles for safe navigation. Convolutional Neural Networks (CNNs) are known for establishing promising results in the field of image classification, which inspired us to employ this technique in our thesis. Computer vision is a process that is used to understand the images and retrieve data from them. OpenCV is a Python library used to detect traffic sign images in real-time. Objectives: This study deals with an experiment to build a CNN model which can classify the traffic signs in real-time effectively using OpenCV. The model is built with low computational cost. The study also includes an experiment where various combinations of parameters are tuned to improve the model’s performance. Methods: The experimentation method involves building a CNN model based on a modified LeNet architecture with four convolutional layers, two max-pooling layers and two dense layers. The model is trained and tested with the German Traffic Sign Recognition Benchmark (GTSRB) dataset. Parameter tuning with different combinations of learning rate and epochs is done to improve the model’s performance. Later this model is used to classify the images introduced to the camera in real-time. Results: The graphs depicting the accuracy and loss of the model before and after parameter tuning are presented. An experiment is done to classify the traffic sign image introduced to the camera by using the CNN model. High probability scores are achieved during the process and are presented. Conclusions: The results show that the proposed model achieved 95% accuracy with the optimum number of epochs (30) and the default learning rate (0.001). High probabilities, i.e., above 75%, were achieved when the model was tested using new real-time data.
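To make the stated layer count concrete, the sketch below traces feature-map sizes through four convolutional layers and two max-pooling layers. The 32x32 input, the kernel sizes (5, 5, 3, 3), the unpadded stride-1 convolutions, and the placement of the pooling layers are all assumptions chosen for illustration; the abstract does not specify them.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Spatial output size of non-overlapping max pooling."""
    return size // window


def trace_shapes(input_size, layers):
    """Return the spatial size after each layer in a list of (kind, param)."""
    sizes = [input_size]
    for kind, param in layers:
        if kind == "conv":
            sizes.append(conv_out(sizes[-1], param))
        elif kind == "pool":
            sizes.append(pool_out(sizes[-1], param))
        else:
            raise ValueError(f"unknown layer kind: {kind}")
    return sizes


# Hypothetical layout: conv, conv, pool, conv, conv, pool.
LAYERS = [("conv", 5), ("conv", 5), ("pool", 2),
          ("conv", 3), ("conv", 3), ("pool", 2)]
```

With these assumed settings, `trace_shapes(32, LAYERS)` shrinks a 32x32 image to a 4x4 feature map, which would then be flattened and passed to the two dense layers.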
177

Night Setback Identification of District Heating Substations

Gerima, Kassaye January 2021
Energy efficiency of district heating systems is of great interest to energy stakeholders. However, it is not uncommon that district heating systems fail to achieve the expected performance due to inappropriate operations. Night setback is one control strategy that has been shown to be unsuitable for well-insulated modern buildings in terms of both economic and energy efficiency. Therefore, identification of a night setback control is vital for district heating companies to smoothly manage their heat energy distribution to their customers. This study aims to automate this identification process. The method used in this thesis is a Convolutional Neural Network (CNN) approach using the concept of transfer learning. 133 substations in Oslo are used in this case study to design a machine learning model that can classify a substation as a night setback or non-night setback series. The results show that the proposed method can classify the substations with approximately 97% accuracy and 91% F1-score. This shows that the proposed method has a high potential to be deployed and used in practice to identify a night setback control in district heating substations.
178

CLASSIFYING ANXIETY BASED ON A VOICE RECORDING USING LEARNING ALGORITHMS

Sherlock, Oscar, Rönnbäck, Olle January 2022
Anxiety is becoming more and more common. Seeking help to evaluate your anxiety can, first of all, take a long time; secondly, many of the tests are self-report assessments that could cause incorrect results. It has been shown that there are several voice characteristics that are affected in people with anxiety. Knowing this, we got the idea that an algorithm can be developed to classify the amount of anxiety based on a person's voice. Our goal is that the developed algorithm can be used in collaboration with today's evaluation methods to increase the validity of anxiety evaluation. The algorithm would, in our opinion, give a more objective result than self-report assessments. In this thesis we answer questions such as “Is it possible to classify anxiety based on a speech recording?”, as well as whether deep learning algorithms perform better than machine learning algorithms on such a task. To answer the research questions we compiled a data set containing samples of people speaking with a varying degree of anxiety applied to their voice. We then implemented two algorithms able to classify the samples from our data set. One of the algorithms was a machine learning algorithm (ANN) with manual feature extraction, and the other one was a deep learning model (CNN) with automatic feature extraction. The performance of the two models was compared, and it was concluded that the ANN was the better algorithm. When evaluating the models, 5-fold cross-validation was used with a data split of 80/20. Every fold contains 100 epochs, meaning we train both models for a total of 500 epochs. For every fold the accuracy, precision, and recall are calculated. From these metrics we then calculated other metrics such as sensitivity and specificity to compare the models. The ANN model performed a lot better than the CNN model on every single metric that was measured: accuracy, sensitivity, precision, f1-score, recall and specificity.
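The evaluation metrics listed above can all be derived from the four confusion-matrix counts of a binary classifier. The sketch below shows those standard formulas together with a plain 5-fold index split, in which each fold holds out 20% of the samples and trains on the remaining 80%. It is a generic illustration under the assumption of a two-class problem, not the authors' code, and the function names are hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (binary case).

    Assumes every denominator is non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall and sensitivity are the same quantity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f1": f1,
    }


def k_fold_indices(num_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [num_samples // k + (1 if i < num_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(num_samples) if not start <= i < start + size]
        yield train, test
        start += size
```

In practice the sample order would be shuffled before splitting, and the per-fold metrics would be averaged to give one score per model.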
179

Applications of machine learning

Yuen, Brosnan 01 September 2020
In this thesis, many machine learning algorithms were applied to electrocardiogram (ECG), spectral analysis, and Field Programmable Gate Arrays (FPGAs). In ECG, QRS complexes are useful for measuring the heart rate and for the segmentation of ECG signals. QRS complexes were detected using WaveletCNN Autoencoder filters and ConvLSTM detectors. The WaveletCNN Autoencoders filter the ECG signals using the wavelet filters, while the ConvLSTM detects the spatial temporal patterns of the QRS complexes. For the spectral analysis topic, the detection of chemical compounds using spectral analysis is useful for identifying unknown substances. However, spectral analysis algorithms require vast amounts of data. To solve this problem, B-spline neural networks were developed for the generation of infrared and ultraviolet/visible spectra. This allowed for the generation of large training datasets from a few experimental measurements. Graphical Processing Units (GPUs) are good for training and testing neural networks. However, using multiple GPUs together is hard because the PCIe bus is not suited for scattering operations and reduce operations. FPGAs are more flexible as they can be arranged in a mesh or toroid or hypercube configuration on the PCB. These configurations provide higher data throughput and result in faster computations. A general neural network framework was written in VHDL for Xilinx FPGAs. It allows for any neural network to be trained or tested on FPGAs. / Graduate
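Because each detected QRS complex marks one heartbeat, the heart rate follows directly from the spacing between successive detections. A minimal sketch, assuming the detector outputs R-peak times in seconds (the function name and input format are hypothetical):

```python
def heart_rate_bpm(r_peak_times):
    """Mean heart rate in beats per minute from R-peak times in seconds."""
    if len(r_peak_times) < 2:
        raise ValueError("need at least two R peaks")
    # RR intervals: time between consecutive beats.
    rr = [b - a for a, b in zip(r_peak_times, r_peak_times[1:])]
    mean_rr = sum(rr) / len(rr)
    return 60.0 / mean_rr
```

For example, peaks spaced 0.8 seconds apart correspond to about 75 beats per minute.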
180

Laboratory Monitoring System Based on Face Detection / Monitorovací systém laboratória založený na detekcii tváre

Gvizd, Peter January 2019
In recent decades there has been fundamental development in technologies, including those focusing on face detection and identification supported by computer vision. Algorithm optimization has reached the point where face detection is possible on mobile devices. At the outset, this work analyses commonly used algorithms for face detection and identification, for instance Haar features, LBP, EigenFaces and FisherFaces. Moreover, this work focuses on more up-to-date approaches to this topic, such as convolutional neural networks, or FaceNet from Google. The goal of this work is the design and subsequent implementation of an automated monitoring system for a lab, based on the aforementioned algorithms. Within the design of the monitoring system, the algorithms are compared with each other, and their success rate and possible application in the final solution are evaluated.
