
On-the-fly visual category search in web-scale image collections

Chatfield, Ken January 2014 (has links)
This thesis tackles the problem of large-scale visual search for categories within large collections of images. Given a textual description of a visual category, such as 'car' or 'person', the objective is to retrieve images containing that category from the corpus quickly and accurately, and without the need for auxiliary meta-data or, crucially and in contrast to previous approaches, expensive pre-training. The general approach to identifying different visual categories within a dataset is to train classifiers over features extracted from a set of training images. The performance of such classifiers relies heavily on sufficiently discriminative image representations, and many methods have been proposed which involve the aggregation of local appearance features into rich bag-of-words encodings. We begin by conducting a comprehensive evaluation of the latest such encodings, identifying best-of-breed practices for training powerful visual models using these representations. We also contrast these methods with the latest Convolutional Network (ConvNet) based features, thus developing a state-of-the-art architecture for large-scale image classification. Following this, we explore how a standard classification pipeline can be adapted for use in a real-time setting. One of the major issues, particularly with bag-of-words based methods, is the high dimensionality of the encodings, which causes ranking over large datasets to be prohibitively expensive. We therefore assess different methods for compressing such features, and further propose a novel cascade approach to ranking which both reduces ranking time and improves retrieval performance. Finally, we explore the problem of training visual models on-the-fly, making use of visual data dynamically collected from the web to train classifiers on demand. On this basis, we develop a novel GPU architecture for on-the-fly visual category search which is capable of retrieving previously unknown categories over unannotated datasets of millions of images in just a few seconds.
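The on-demand ranking step described above lends itself to a short sketch: a linear SVM is trained from web-collected positives against a fixed negative pool, and the corpus is ranked by one dot product per image. This is a minimal illustration, not the thesis's GPU architecture; pos_feats, neg_pool, and corpus_feats are assumed, pre-extracted feature matrices with illustrative names.

```python
# Minimal sketch of on-the-fly category ranking over pre-extracted features.
import numpy as np
from sklearn.svm import LinearSVC

def rank_corpus(pos_feats, neg_pool, corpus_feats, top_k=10):
    """Train a linear SVM from web-downloaded positives against a fixed
    negative pool, then rank the corpus by SVM decision value."""
    X = np.vstack([pos_feats, neg_pool])
    y = np.concatenate([np.ones(len(pos_feats)), np.zeros(len(neg_pool))])
    clf = LinearSVC(C=1.0).fit(X, y)
    scores = clf.decision_function(corpus_feats)  # one dot product per image
    return np.argsort(-scores)[:top_k]            # indices of best matches

# Toy demo with random feature vectors in place of real encodings.
rng = np.random.default_rng(0)
print(rank_corpus(rng.standard_normal((20, 64)),
                  rng.standard_normal((200, 64)),
                  rng.standard_normal((1000, 64)), top_k=5))
```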

Supervised and unsupervised learning for plant and crop row detection in precision agriculture

Varshney, Varun January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / William H. Hsu / The goal of this research is to present a comparison between different clustering and segmentation techniques, both supervised and unsupervised, to detect plant and crop rows. Aerial images of a corn field at various stages of growth, taken by an Unmanned Aerial Vehicle (UAV), were acquired in RGB format through the Agronomy Department at Kansas State University. Several segmentation and clustering approaches were applied to these images, namely K-Means clustering, the Excess Green (ExG) index algorithm, Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and a deep learning approach based on Fully Convolutional Networks (FCN), to detect the plants present in the images. A Hough Transform (HT) approach was used to detect the orientation of the crop rows and rotate the images so that the rows became parallel to the x-axis. The result of applying the different segmentation methods was then used to estimate the location of crop rows by means of a template creation method based on Green Pixel Accumulation (GPA), which calculates the intensity profile of green pixels present in the images. Connected component analysis was then applied to find the centroids of the detected plants. Each centroid was associated with a crop row, and centroids lying outside the row templates were discarded as weeds. A comparison between the various segmentation algorithms based on the Dice similarity index and average run-times is presented at the end of the work.
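As a minimal sketch of the segmentation and row-orientation steps just described, the ExG index can be thresholded and a Hough transform used to find and undo the dominant row angle. The input filename, threshold, and Hough parameters below are illustrative assumptions, not the values used in the thesis.

```python
# ExG thresholding followed by Hough-transform row alignment (sketch).
import cv2
import numpy as np

img = cv2.imread("field.png").astype(np.float32) / 255.0  # assumed input image
b, g, r = cv2.split(img)
exg = 2.0 * g - r - b                                     # Excess Green index
mask = (exg > 0.1).astype(np.uint8) * 255                 # plant pixels (assumed threshold)

# Find the dominant line orientation among plant pixels.
lines = cv2.HoughLines(mask, 1, np.pi / 180, 200)
if lines is not None:
    theta = float(np.median(lines[:, 0, 1]))              # median line angle
    angle_deg = np.degrees(theta) - 90.0                  # rotate rows toward the x-axis
    h, w = mask.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    rotated = cv2.warpAffine(mask, M, (w, h))             # row-aligned plant mask
```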

Composing Recommendations Using Computer Screen Images: A Deep Learning Recommender System for PC Users

Shapiro, Daniel January 2017 (has links)
A new way to train a virtual assistant with unsupervised learning is presented in this thesis. Rather than integrating with a particular set of programs and interfaces, this new approach involves shallow integration between the virtual assistant and the computer through machine vision. In effect, the assistant interprets the computer screen in order to produce helpful recommendations for the computer user. In developing this new approach, called AVRA, the following methods are described: an unsupervised learning algorithm which enables the system to watch and learn from user behavior, a method for fast filtering of the text displayed on the computer screen, a deep learning classifier used to recognize key onscreen text in the presence of OCR translation errors, and a recommendation filtering algorithm to triage the many possible action recommendations. AVRA is compared to a similar commercial state-of-the-art system to highlight how this work adds to the state of the art. AVRA is a deep learning image processing and recommender system that can collaborate with the computer user to accomplish various tasks. This document presents a comprehensive overview of the development and possible applications of this novel virtual assistant technology. AVRA detects onscreen tasks based upon the context it perceives by analyzing successive computer screen images with neural networks. It is a recommender system, as it assists the user by producing action recommendations regarding onscreen tasks. To simplify the interaction between the user and AVRA, the system was designed to produce only action recommendations that can be accepted with a single mouse click. These action recommendations are produced without integration into each individual application executing on the computer. Furthermore, the action recommendations are personalized to the user's interests using a history of the user's interactions.
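The fast text-filtering step described above can be illustrated with a small sketch that matches noisy OCR output against a list of trigger phrases. A simple fuzzy string ratio stands in for AVRA's learned OCR-error-tolerant classifier; the trigger list and threshold are hypothetical.

```python
# Fuzzy matching of noisy OCR lines against trigger phrases (stand-in sketch).
from difflib import SequenceMatcher

TRIGGERS = ["stack trace", "compile error", "merge conflict"]  # hypothetical triggers

def filter_onscreen_text(ocr_lines, threshold=0.8):
    """Return (line, trigger) pairs whose fuzzy similarity clears the threshold."""
    hits = []
    for line in ocr_lines:
        for trig in TRIGGERS:
            if SequenceMatcher(None, line.lower(), trig).ratio() >= threshold:
                hits.append((line, trig))
    return hits

# OCR errors like "c0mpile errar" still match at a relaxed threshold.
print(filter_onscreen_text(["c0mpile errar", "hello world"], threshold=0.6))
```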

Real-Time Instance and Semantic Segmentation Using Deep Learning

Kolhatkar, Dhanvin 10 June 2020 (has links)
In this thesis, we explore the use of Convolutional Neural Networks for semantic and instance segmentation, with a focus on studying the application of existing methods with cheaper neural networks. We modify a fast object detection architecture for the instance segmentation task, and study the concepts behind these modifications both in the simpler context of semantic segmentation and in the more difficult context of instance segmentation. Various instance segmentation branch architectures are implemented in parallel with a box prediction branch, using its results to crop each instance's features. We offset the imprecision of the final box predictions, and eliminate the need for bounding box alignment, by using an enlarged bounding box for cropping. We report and study the performance, advantages, and disadvantages of each architecture. We achieve fast speeds with all of our methods.
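The enlarged-box cropping idea described above is simple enough to sketch: each predicted box is grown by a fixed factor before its features are cropped, so small box imprecision no longer clips the instance. The enlargement factor here is an illustrative assumption.

```python
# Crop instance features with an enlarged bounding box (sketch).
import numpy as np

def enlarged_crop(feature_map, box, factor=1.5):
    """feature_map: (H, W, C) array; box: (x1, y1, x2, y2) in feature coordinates."""
    h, w = feature_map.shape[:2]
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0          # box centre
    bw, bh = (x2 - x1) * factor, (y2 - y1) * factor    # grown width/height
    nx1, ny1 = int(max(0, cx - bw / 2)), int(max(0, cy - bh / 2))
    nx2, ny2 = int(min(w, cx + bw / 2)), int(min(h, cy + bh / 2))
    return feature_map[ny1:ny2, nx1:nx2, :]

# Toy demo: crop a 7x7 region grown from a 5x5 predicted box.
feats = np.zeros((32, 32, 8))
print(enlarged_crop(feats, (10, 10, 15, 15)).shape)
```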

Automatic Programming Code Explanation Generation with Structured Translation Models

January 2020 (has links)
abstract: Learning programming involves a variety of complex cognitive activities, from abstract knowledge construction to structural operations, which include program design, modification, debugging, and documentation tasks. The objective of this work was to explore and investigate the barriers and obstacles that novice programming learners encounter and how they overcome them. Several lab and classroom studies were designed and conducted; the results showed that novice students had different behavior patterns compared to experienced learners, which indicates the obstacles they encountered. The studies also showed that proper assistance could help novices find helpful materials to read. However, novices still suffered from a lack of background knowledge and limited cognitive capacity while learning, which resulted in challenges in understanding programming-related materials, especially code examples. Therefore, I further proposed to use a natural language generator (NLG) to produce code explanations for educational purposes. The natural language generator is designed based on Long Short-Term Memory (LSTM), a deep-learning translation model. To establish the model, a data set was collected from Amazon Mechanical Turk (AMT), recording explanations from human experts for lines of programming code. To evaluate the model, a pilot study was conducted; it showed that the readability of the machine-generated (MG) explanations was comparable with human explanations, while their accuracy was still not ideal, especially for complicated code lines. Furthermore, a code-example-based learning platform was developed to utilize the explanation-generating model in programming teaching. To examine the effect of code example explanations on different learners, two lab-class experiments were conducted separately in a programming novices' class and an advanced students' class. The results indicated that when learning programming concepts, the MG code explanations significantly improved learning predictability for novices compared to the control group, and the explanations also extended the novices' learning time by providing more material to read, which potentially leads to a better learning gain. In addition, a correlation model was constructed from the experimental results to illustrate the connections between different factors and the learning effect. / Dissertation/Thesis / Doctoral Dissertation Engineering 2020
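As a rough sketch of the LSTM-based translation model described above, an encoder LSTM can read tokenized code and hand its state to a decoder LSTM that emits explanation tokens. The vocabulary sizes and dimensions below are illustrative assumptions, not the dissertation's configuration.

```python
# Minimal LSTM encoder-decoder for code-to-explanation translation (sketch).
import torch
import torch.nn as nn

class CodeExplainer(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, code_ids, expl_ids):
        _, state = self.encoder(self.src_emb(code_ids))   # encode code tokens
        dec, _ = self.decoder(self.tgt_emb(expl_ids), state)
        return self.out(dec)                              # logits per output position

model = CodeExplainer(src_vocab=5000, tgt_vocab=8000)     # assumed vocab sizes
code = torch.randint(0, 5000, (2, 20))   # batch of tokenized code lines
expl = torch.randint(0, 8000, (2, 15))   # shifted explanation tokens (teacher forcing)
logits = model(code, expl)               # (2, 15, 8000); trained with cross-entropy
```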

Limitations of Classical Tomographic Reconstructions from Restricted Measurements and Enhancing with Physically Constrained Machine Learning

January 2020 (has links)
abstract: This work is concerned with how best to reconstruct images from limited angle tomographic measurements. An introduction to tomography and to limited angle tomography will be provided and a brief overview of the many fields to which this work may contribute is given. The traditional tomographic image reconstruction approach involves Fourier domain representations. The classic Filtered Back Projection algorithm will be discussed and used for comparison throughout the work. Bayesian statistics and information entropy considerations will be described. The Maximum Entropy reconstruction method will be derived and its performance in limited angular measurement scenarios will be examined. Many new approaches become available once the reconstruction problem is placed within an algebraic form of Ax=b in which the measurement geometry and instrument response are defined as the matrix A, the measured object as the column vector x, and the resulting measurements by b. It is straightforward to invert A. However, for the limited angle measurement scenarios of interest in this work, the inversion is highly underconstrained and has an infinite number of possible solutions x consistent with the measurements b in a high dimensional space. The algebraic formulation leads to the need for high performing regularization approaches which add constraints based on prior information of what is being measured. These are constraints beyond the measurement matrix A added with the goal of selecting the best image from this vast uncertainty space. It is well established within this work that developing satisfactory regularization techniques is all but impossible except for the simplest pathological cases. There is a need to capture the "character" of the objects being measured. The novel result of this effort will be in developing a reconstruction approach that will match whatever reconstruction approach has proven best for the types of objects being measured given full angular coverage. However, when confronted with limited angle tomographic situations or early in a series of measurements, the approach will rely on a prior understanding of the "character" of the objects measured. This understanding will be learned by a parallel Deep Neural Network from examples. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2020
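The algebraic formulation described above can be made concrete with a small sketch: given the measurement matrix A and measurements b, a Tikhonov-regularized least-squares solve picks one image out of the underdetermined solution set. This generic regularizer is for illustration only, not the learned prior the work develops; the value of lam is an assumption.

```python
# Regularized least-squares reconstruction for an underconstrained Ax = b (sketch).
import numpy as np

def tikhonov_reconstruct(A, b, lam=0.1):
    """Solve min_x ||Ax - b||^2 + lam * ||x||^2 via an augmented system."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])   # stack penalty rows
    b_aug = np.concatenate([b, np.zeros(n)])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x

# Toy limited-angle system: more unknowns than measurements (underconstrained).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = rng.standard_normal(100)
x_hat = tikhonov_reconstruct(A, A @ x_true)
print(np.linalg.norm(A @ x_hat - A @ x_true))  # residual on the measurements
```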

Physical layer security in emerging wireless transmission systems

Bao, Tingnan 06 July 2020 (has links)
Traditional cryptographic encryption techniques at higher layers require a certain form of information sharing between the transmitter and the legitimate user to achieve security. Moreover, they assume that the eavesdropper has insufficient computational capability to decrypt the ciphertext without the shared information. However, traditional cryptographic encryption techniques may be insufficient or even unsuitable in wireless communication systems. Physical layer security (PLS) can enhance the security of wireless communications by leveraging the physical nature of wireless transmission. Thus, in this thesis, we study PLS performance in emerging wireless transmission systems. The thesis consists of two main parts. We first consider the PLS design and analysis for ground-based networks employing a random unitary beamforming (RUB) scheme at the transmitter. With the RUB technique, the transmitter serves multiple users with pre-designed beamforming vectors, selected using limited channel state information (CSI). We study the multiple-input single-output single-eavesdropper (MISOSE) transmission system, the multi-user multiple-input multiple-output single-eavesdropper (MU-MIMOSE) transmission system, and the massive multiple-input multiple-output multiple-eavesdropper (massive MIMOME) transmission system. Closed-form expressions of the ergodic secrecy rate and the secrecy outage probability (SOP) for these transmission scenarios are derived. The effect of artificial noise (AN) on the secrecy performance of RUB-based transmission is also investigated. Numerical results are presented to illustrate the trade-off between performance and complexity of the resulting PLS design. We then investigate the PLS design and analysis for unmanned aerial vehicle (UAV)-based networks. We first study the secrecy performance of UAV-assisted relaying transmission systems in the presence of a single ground eavesdropper, deriving closed-form expressions of the ergodic secrecy rate and the intercept probability. When multiple aerial and ground eavesdroppers are present in the UAV-assisted relaying transmission system, a directional beamforming technique is applied to enhance the secrecy performance. Assuming the most general κ-μ shadowed fading channel, a closed-form expression for the SOP is obtained. Exploiting the derived expressions, we investigate the impact of different parameters on secrecy performance. Finally, we utilize a deep learning approach in UAV-based network analysis. Numerical results show that our proposed deep learning approach can predict secrecy performance with high accuracy and short running time. / Graduate
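As an illustration of the secrecy metrics analyzed above, a short Monte Carlo sketch can estimate the ergodic secrecy rate and the SOP. A plain Rayleigh fading model and the SNR/rate values are assumptions made purely for illustration; the thesis derives closed forms, including for the κ-μ shadowed channel.

```python
# Monte Carlo estimate of ergodic secrecy rate and secrecy outage probability (sketch).
import numpy as np

rng = np.random.default_rng(1)
n, snr_b, snr_e, rate_s = 1_000_000, 10.0, 2.0, 1.0  # illustrative values

# Rayleigh fading: instantaneous SNR is exponential with the mean SNR as scale.
g_b = rng.exponential(scale=snr_b, size=n)           # legitimate-link SNR
g_e = rng.exponential(scale=snr_e, size=n)           # eavesdropper SNR
c_s = np.maximum(np.log2(1 + g_b) - np.log2(1 + g_e), 0.0)  # secrecy capacity

print("ergodic secrecy rate:", c_s.mean())           # E[Cs]
print("secrecy outage prob.:", np.mean(c_s < rate_s))  # P(Cs < Rs)
```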

Deep learning for image compression

Dumas, Thierry 07 June 2019 (has links)
Over the last twenty years, the amount of transmitted images and videos has increased noticeably, driven mainly by Facebook and Netflix. Even though broadcast capacities improve, this growing amount of transmitted images and videos requires increasingly efficient compression methods. This thesis aims at improving, via learning, two critical components of the modern image compression standards: the transform and the intra prediction. More precisely, deep neural networks are used for this task as they exhibit a high power of approximation, which is needed for learning a reliable approximation of an optimal transform (or an optimal intra prediction filter) applied to image pixels. Regarding the learning of a transform for image compression via neural networks, a challenge is to learn a unique transform that is efficient in terms of rate-distortion while keeping this efficiency when compressing at different rates. Two approaches are therefore proposed to take on this challenge. In the first approach, the neural network architecture imposes a sparsity constraint on the transform coefficients. The level of sparsity gives direct control over the compression rate. To force the transform to adapt to different compression rates, the level of sparsity is driven stochastically during the training phase. In the second approach, the rate-distortion efficiency is obtained by minimizing a rate-distortion objective function during the training phase. During the test phase, the quantization step sizes are gradually increased according to a schedule to compress at different rates using the single learned transform. Regarding the learning of an intra prediction filter for image compression via neural networks, the issue is to obtain a learned filter that is adaptive with respect to the size of the image block to be predicted, the missing information in the context of prediction, and the variable quantization noise in this context. A set of neural networks is designed and trained so that the learned prediction filter has this adaptability.
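The rate-distortion objective mentioned in the second approach can be sketched as a distortion term plus a lambda-weighted rate term. The L1 penalty below is a stand-in rate surrogate, an assumption for illustration rather than the entropy model used in the thesis.

```python
# Rate-distortion training objective for a learned transform (sketch).
import torch

def rd_loss(x, x_hat, latent, lam=0.01):
    """D + lambda * R with MSE distortion and an L1 rate surrogate."""
    distortion = torch.mean((x - x_hat) ** 2)
    rate_proxy = torch.mean(torch.abs(latent))  # sparser coefficients ~ fewer bits
    return distortion + lam * rate_proxy

# Toy demo with dummy tensors in place of a real encoder/decoder pair.
x = torch.rand(1, 3, 64, 64)           # dummy image batch
latent = torch.randn(1, 32, 16, 16)    # dummy transform coefficients
print(rd_loss(x, x.clone(), latent))
```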

Towards an Accurate ECG Biometric Authentication System with Low Acquisition Time

Arteaga Falconi, Juan Sebastian 31 January 2020 (has links)
Biometrics is the study of physical or behavioral traits that establish the identity of a person. Forensics, physical security, and cyber security are some of the main fields that use biometrics. Unlike traditional authentication systems, such as password-based ones, biometrics cannot be lost, forgotten, or shared. This is possible because biometrics establishes the identity of a person based on a physiological/behavioural characteristic rather than on what the person possesses or remembers. Biometrics has two modes of operation: identification and authentication. Identification finds the identity of a person among a group of persons. Authentication determines if the claimed identity of a person is truthful. Biometric person authentication is an alternative to passwords or graphical patterns. It prevents shoulder-surfing attacks, i.e., people watching from a short distance. Nevertheless, the biometric traits of conventional authentication techniques like fingerprints, face, and to some extent iris are easy to capture and duplicate. This poses a security risk for modern and future applications such as digital twins, where an attacker can copy and duplicate a biometric trait in order to spoof a biometric system. Researchers have proposed ECG as a biometric authentication modality to solve this problem. ECG authentication conceals the biometric traits and reduces the risk of an attack by duplication of the biometric trait. However, current ECG authentication solutions require 10 or more seconds of an ECG signal in order to produce accurate results, and the accuracy is directly proportional to the length of the ECG signal used for authentication. This makes ECG authentication inconvenient to implement in an end-user product, because a user cannot wait 10 or more seconds to gain access to their device in a secure manner. This thesis addresses the problem of spoofing by proposing an accurate and secure ECG biometric authentication system that uses a relatively short ECG signal. The system consists of ECG acquisition from lead I (two electrodes), signal-processing approaches for filtering and R-peak detection, a feature extractor, and an authentication process. To evaluate this system, we developed a method to calculate the Equal Error Rate (EER) with non-normally distributed data. In the authentication process, we propose an approach based on a Support Vector Machine (SVM) and achieve 4.5% EER with 4 seconds of ECG signal. This approach opens the door to a deeper understanding of the signal, and hence we enhanced it by applying a hybrid approach of Convolutional Neural Networks (CNN) combined with SVM. The purpose of this hybrid approach is to improve accuracy by automatically detecting and extracting features with deep learning, in this case a CNN, and feeding the output into a one-class SVM classifier for authentication, which proved to outperform the previous approach for one-class ECG classification. This hybrid approach reduces the EER to 2.84% with 4 seconds of ECG signal. Furthermore, we investigated the combination of two different biometric techniques, fusing fingerprint with ECG at the decision level, and improved the accuracy to 0.46% EER while maintaining a short ECG signal length of 4 seconds. Decision-level fusion requires only information that is available from any biometric technique, whereas fusion at other levels, such as feature-level fusion, requires information about features that may be incompatible or hidden. Fingerprint minutiae are composed of information that differs from ECG peaks and valleys, so fusion at the feature level is not possible unless the fusion algorithm provides a compatible conversion scheme. Moreover, proprietary biometric hardware does not expose information about its features or algorithms; the features are therefore hidden and not accessible for feature-level fusion, while the final decision is always available for decision-level fusion.
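The hybrid authentication step described above can be sketched as a one-class SVM fitted to the enrolled user's deep features, with an AND rule standing in for the decision-level fusion with fingerprint (the thesis does not prescribe this exact rule here; it is an assumption). CNN feature extraction is assumed to have happened upstream.

```python
# One-class SVM authentication on deep features, with decision-level fusion (sketch).
import numpy as np
from sklearn.svm import OneClassSVM

def train_ecg_authenticator(enroll_feats, nu=0.05):
    """Fit a one-class SVM on the enrolled user's CNN feature vectors."""
    return OneClassSVM(kernel="rbf", nu=nu).fit(enroll_feats)

def authenticate(model, probe_feat, fingerprint_ok):
    ecg_ok = model.predict(probe_feat.reshape(1, -1))[0] == 1  # +1 = genuine
    return ecg_ok and fingerprint_ok    # AND-rule decision-level fusion (assumed)

# Toy demo with random vectors in place of real CNN features.
rng = np.random.default_rng(2)
enrolled = rng.standard_normal((40, 128))
auth = train_ecg_authenticator(enrolled)
print(authenticate(auth, enrolled[0], fingerprint_ok=True))
```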

Classification of Heart Views in Ultrasound Images

Pop, David January 2020 (has links)
In today's society, we experience an increasing challenge to provide healthcare to everyone in need due to the increasing number of patients and the shortage of medical staff. Computers have contributed to mitigating this challenge by offloading some tasks from the medical staff. With the rise of deep learning, countless new possibilities have opened up to help the medical staff even further. One domain where deep learning can be applied is the analysis of ultrasound images. In this thesis we investigate the problem of classifying standard views of the heart in ultrasound images with the help of deep learning. We conduct three main experiments. First, we take NASNet Mobile, InceptionV3, VGG16, and MobileNet, pre-trained on ImageNet, and fine-tune them on ultrasound heart images. We compare the accuracy of these networks to each other and to the baseline model, a CNN that was proposed in [23]. Then we assess a neural network's capability to generalize to images from ultrasound machines that the network was not trained on. Lastly, we test how the performance of the networks degrades with a decreasing amount of training data. Our first experiment shows that all networks considered in this study have very similar accuracy, with InceptionV3 being slightly better than the rest. The best performance is achieved when the whole network is fine-tuned on our problem, gradually unlocking more layers for training, instead of fine-tuning only a part of it. The generalization experiment shows that neural networks have the potential to generalize to images from ultrasound machines that they were not trained on. It also shows that having a mix of multiple ultrasound machines in the training data increases generalization performance. In our last experiment we compare the performance of the CNN proposed in [23] with MobileNet pre-trained on ImageNet and MobileNet randomly initialized. This shows that the performance of the baseline model suffers the least with a decreasing amount of training data, and that pre-training helps performance drastically on smaller training datasets.
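The fine-tuning setup compared in the first experiment can be sketched with tf.keras: load an ImageNet-pretrained MobileNet, attach a new classification head, train the head first, then unfreeze the whole network. The number of heart-view classes and the two-stage learning-rate schedule are assumptions for illustration, not the thesis's settings.

```python
# Two-stage fine-tuning of an ImageNet-pretrained MobileNet (sketch).
import tensorflow as tf

base = tf.keras.applications.MobileNet(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                               # stage 1: train the head only
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(8, activation="softmax"),  # assumed number of heart views
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# ...fit the head on ultrasound images, then unfreeze for full fine-tuning:
base.trainable = True                                # stage 2: whole network
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),  # lower LR after unfreezing
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```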
