51 |
Robust Visual Recognition Using Multilayer Generative Neural Networks
Tang, Yichuan (January 2010)
Deep generative neural networks such as the Deep Belief Network (DBN) and Deep Boltzmann Machines have been used successfully to model high-dimensional visual data. However, they are not robust to common variations such as occlusion and random noise. In this thesis, we explore two strategies for improving the robustness of DBNs. First, we show that a DBN with sparse connections in the first layer is more robust to variations that are not in the training set. Second, we develop a probabilistic denoising algorithm to determine a subset of the hidden layer nodes to unclamp. We show that this can be applied to any feedforward network classifier with localized first-layer connections. By utilizing the already available generative model for denoising prior to recognition, we show significantly better performance than the standard DBN implementations for various sources of noise on the standard and Variations MNIST databases.
|
53 |
Encouraging deep learning in a blended environment: A study of instructional design approaches
Guay, Carol (23 August 2013)
This qualitative research study seeks to answer the question: Which instructional design approaches for blended learning encourage deep learning? This grounded theory research captures the lived experiences of instructional designers and faculty members in converting courses at the post-secondary level from traditional, face-to-face delivery to blended delivery using educational technology. Study results provide insight into the complexities involved in the design and development of blended delivery courses and shed light on the complications that can arise with course conversion. The study also opens a window into design approaches to foster deep learning, clarifying the importance of targeting high levels of learning in the course syllabus / outline, and then aligning every part of the course to the specific learning outcomes identified. Study results culminate in a set of recommended instructional design approaches that foster deep learning in a blended learning environment. / 2013-08
|
54 |
Face recognition enhancement through the use of depth maps and deep learning
Saleh, Yaser (January 2017)
Face recognition, although a popular area of research for over a decade, still has many open research challenges. These include the recognition of poorly illuminated faces, recognition under pose variations, and the capture of sufficient training data to enable recognition under pose/viewpoint changes. The appearance of cheap and effective multimodal image capture hardware, such as the Microsoft Kinect device, has opened new research possibilities. One opportunity is to explore the depth maps generated by the Kinect as an additional data source for recognizing human faces under low levels of scene illumination, and to generate new images by building a 3D model from the depth maps and visible-spectrum/RGB images; these images can then enhance face recognition accuracy by improving the training phase of a classification task. With the goal of enhancing face recognition, this research first investigated how depth maps, which are not affected by illumination, can improve face recognition when used with algorithms traditionally applied to face recognition. To this end, a number of popular benchmark face recognition algorithms are tested. It is shown that algorithms based on LBP and Eigenfaces provide a high level of accuracy in face recognition, owing to the high resolution of the depth map images generated by the latest version of the Kinect device. To complement this work, a novel algorithm named the Dense Feature Detector is presented and shown to be effective in face recognition using depth map images, in particular under well-illuminated conditions. A second technique presented for enhancing face recognition reconstructs face images at different angles from a single frontal RGB image and the corresponding depth map captured by the Kinect, using a fast and effective 3D object reconstruction technique.
Using the OverFeat network, based on Convolutional Neural Networks, for feature extraction and an SVM for classification, it is shown that a technically unlimited number of views can be created from the proposed 3D model, each containing the facial features that would be present if the face were really captured at a similar angle. These images can therefore be used as real training images, removing the need to capture many examples of a facial image from different viewpoints to train the image classifier. The proposed 3D model thus saves a significant amount of time and effort in capturing the training data that is essential for recognizing the human face under variations of pose/viewpoint. The thesis argues that the same approach can also serve as a novel approach to face recognition, one that promises high levels of accuracy based on depth images. Finally, following the recent trend of replacing traditional face recognition algorithms with deep learning networks, the thesis investigates the use of four popular networks, VGG-16, VGG-19, VGG-S and GoogLeNet, in depth-map-based face recognition and proposes the use of Transfer Learning to enhance the performance of such Deep Learning networks.
|
55 |
Structured deep neural networks for speech recognition
Wu, Chunyang (January 2018)
Deep neural networks (DNNs) and deep learning approaches yield state-of-the-art performance in a range of machine learning tasks, including automatic speech recognition. The multi-layer transformations and activation functions in DNNs, or related network variations, allow complex and difficult data to be well modelled. However, the highly distributed representations associated with these models make it hard to interpret the parameters. The whole neural network is commonly treated as a "black box". The behaviours of activation functions and the meanings of network parameters are rarely controlled in standard DNN training. Though sensible performance can be achieved, the lack of interpretation of network structures and parameters makes better regularisation and adaptation of DNN models challenging. In regularisation, parameters have to be regularised universally and indiscriminately. For instance, the widely used L2 regularisation pushes all parameters towards zero. In adaptation, a large number of independent parameters must be re-estimated. Adaptation schemes in this framework cannot be performed effectively when adaptation data are limited. This thesis investigates structured deep neural networks. Special structures are explicitly designed and given a desired interpretation in order to improve DNN regularisation and adaptation. For regularisation, parameters can be separately regularised based on their functions. For adaptation, parameters can be adapted in groups or partially adapted according to their roles in the network topology. Three forms of structured DNNs are proposed in this thesis. The contributions of these models are presented as follows. The first contribution of this thesis is the multi-basis adaptive neural network. This form of structured DNN introduces a set of parallel sub-networks with restricted connections. The design of restricted connectivity allows different aspects of data to be explicitly learned.
Sub-network outputs are then combined, and this combination module is used as the speaker-dependent structure that can be robustly estimated for adaptation. The second contribution of this thesis is the stimulated deep neural network. This form of structured DNN relates and smooths activation functions in regions of the network. It aids the visualisation and interpretation of DNN models and also has the potential to reduce over-fitting. Novel adaptation schemes can be performed on it, taking advantage of the smoothness property that the stimulated DNN offers. The third contribution of this thesis is the deep activation mixture model. This form of structured DNN also encourages the outputs of activation functions to form a smooth surface. The output of one hidden layer is explicitly modelled as the sum of a mixture model and a residual model. The mixture model forms an activation contour, and the residual model depicts fluctuations around this contour. The smoothness yielded by the mixture model helps to regularise the overall model and allows novel adaptation schemes.
|
56 |
Connectionist multivariate density-estimation and its application to speech synthesis
Uria, Benigno (January 2016)
Autoregressive models factorize a multivariate joint probability distribution into a product of one-dimensional conditional distributions. The variables are assigned an ordering, and the conditional distribution of each variable is modelled using all variables preceding it in that ordering as predictors. Calculating normalized probabilities and sampling have polynomial computational complexity under autoregressive models. Moreover, binary autoregressive models based on neural networks obtain statistical performance similar to that of some intractable models, like restricted Boltzmann machines, on several datasets. The use of autoregressive probability density estimators based on neural networks to model real-valued data, while proposed before, has never been properly investigated and reported. In this thesis we extend the formulation of neural autoregressive distribution estimators (NADE) to real-valued data; a model we call the real-valued neural autoregressive density estimator (RNADE). Its statistical performance on several datasets, including visual and auditory data, is reported and compared to that of other models. RNADE obtained higher test likelihoods than other tractable models, while retaining all the attractive computational properties of autoregressive models. However, autoregressive models are limited by the ordering of the variables inherent to their formulation. Marginalization and imputation tasks can only be solved analytically if the missing variables are at the end of the ordering. We present a new training technique that obtains a set of parameters that can be used for any ordering of the variables. By choosing a model with a convenient ordering of the dimensions at test time, it is possible to solve any marginalization and imputation tasks analytically. The same training procedure also makes it practical to train NADEs and RNADEs with several hidden layers.
The resulting deep and tractable models display higher test likelihoods than the equivalent one-hidden-layer models for all the datasets tested. Ensembles of NADEs or RNADEs can be created inexpensively by combining models that share their parameters but differ in the ordering of the variables. These ensembles of autoregressive models obtain state-of-the-art statistical performances for several datasets. Finally, we demonstrate the application of RNADE to speech synthesis, and confirm that capturing the phone-conditional dependencies of acoustic features improves the quality of synthetic speech. Our model generates synthetic speech that was judged by naive listeners as being of higher quality than that generated by mixture density networks, which are considered a state-of-the-art synthesis technique.
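The autoregressive factorization described in this abstract can be illustrated with a minimal sketch for binary data, written in the general style of a NADE. It is not code from the thesis: the parameter names `W`, `V`, `b`, `c`, the helper `make_toy_params`, and the toy sizes are all assumptions made for illustration. Each conditional p(x_d | x_<d) is a logistic output computed from hidden units that see only the preceding variables, so the log-probability of a vector is a sum of D terms.

```python
import itertools
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def nade_log_prob(x, W, V, b, c):
    """Log-probability of a binary vector x under a toy NADE-style model.

    Hypothetical shapes: W is (H, D), V is (D, H), b is (D,), c is (H,).
    The joint is the product over d of p(x_d | x_<d), in the fixed ordering 0..D-1.
    """
    a = c.astype(float).copy()            # hidden pre-activation; depends only on x[:d]
    log_p = 0.0
    for d in range(x.shape[0]):
        h = sigmoid(a)                    # hidden units given the preceding variables
        p_one = sigmoid(b[d] + V[d] @ h)  # p(x_d = 1 | x_<d)
        log_p += np.log(p_one if x[d] == 1 else 1.0 - p_one)
        a += W[:, d] * x[d]               # fold x_d in before the next conditional
    return float(log_p)


def make_toy_params(D=4, H=3, seed=0):
    """Random parameters for illustration only (untrained)."""
    rng = np.random.default_rng(seed)
    return (rng.normal(size=(H, D)), rng.normal(size=(D, H)),
            rng.normal(size=D), rng.normal(size=H))
```

Because every conditional is itself normalized, the probabilities of all 2^D binary vectors sum to one without computing a partition function, which is the tractability property the abstract contrasts with restricted Boltzmann machines. Reusing the running pre-activation `a` keeps the cost of one evaluation at O(DH).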
|
57 |
Accessible Retail Shopping For The Visually Impaired Using Deep Learning
January 2020
Over the past decade, advancements in neural networks have been instrumental in achieving remarkable breakthroughs in the field of computer vision. One of the applications is in creating assistive technology to improve the lives of visually impaired people by making the world around them more accessible. Extensive research in convolutional neural networks has led to human-level performance in different vision tasks including image classification, object detection, instance segmentation, semantic segmentation, panoptic segmentation and scene text recognition. All of the aforementioned tasks, individually or in combination, have been used to create assistive technologies to improve accessibility for the blind.
This dissertation outlines various applications to improve accessibility and independence for visually impaired people during shopping by helping them identify products in retail stores. The dissertation includes the following contributions: (i) a dataset containing images of breakfast-cereal products and a classifier using a deep neural network (ResNet); (ii) a dataset for training a text detection and scene-text recognition model; (iii) a model for text detection and scene-text recognition to identify product images using a user-controlled camera; (iv) a dataset of twenty thousand products with product information and related images that can be used to train and test a system designed to identify products. / Dissertation/Thesis / Masters Thesis Computer Science 2020
|
58 |
THE APPLICATION OF CONVOLUTIONAL NEURAL NETWORKS TO CLASSIFY PAINT DEFECTS
Houmadi, Sherri F (01 May 2020)
AN ABSTRACT OF THE DISSERTATION OF Sherri Houmadi, for the Doctor of Philosophy degree in Engineering Science, presented on March 27, 2020, at Southern Illinois University Carbondale. TITLE: THE APPLICATION OF CONVOLUTIONAL NEURAL NETWORKS TO CLASSIFY PAINT DEFECTS. MAJOR PROFESSOR: Dr. Julie Dunston. Despite all of the technological advancements in computer vision, many companies still utilize human visual inspection to determine whether parts are good or bad. It is particularly challenging for humans to inspect parts in a fast-moving manufacturing environment. Such is the case at Aisin Manufacturing Illinois, where this study tests the use of convolutional neural networks (CNNs) to classify paint defects on painted outside door handles and caps for automobiles. Widespread implementation of vision systems has resulted in advancements in machine learning. As the field of artificial intelligence (AI) evolves and improvements are made, diverse industries are adopting AI models for use in their applications. Medical imaging classification using neural networks has exploded in recent years. Convolutional neural networks have proven to scale very well for image classification models by extracting various features from the images. A goal of this study is to create a low-cost machine learning model that is able to quickly classify paint defects in order to identify rework parts that can be repaired and shipped. The central thesis of this doctoral work is to test a machine learning model that can classify the paint defects based on a very small dataset of images, where the images are taken with a smartphone camera in a manufacturing setting. The end goal is to train the model for an overall accuracy rate of at least 80%. By using transfer learning and balancing the class datasets, the model was trained to achieve an overall accuracy rate of 82%.
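The abstract does not say how the class datasets were balanced. As one common option, the sketch below uses random oversampling: minority classes are padded with randomly repeated samples until every class matches the largest one. The function name `oversample_to_balance` and the defect labels in the usage note are hypothetical, and this is an illustration of the general idea rather than the dissertation's procedure.

```python
import random
from collections import Counter, defaultdict


def oversample_to_balance(samples, labels, seed=0):
    """Return (samples, labels) in which every class has as many items as the largest class.

    Assumption: balancing by random oversampling with replacement; the original
    samples are all kept and only the minority classes gain repeated items.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing items at random from this class's own samples.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels
```

For example, with five images labelled `"ok"`, two labelled `"scratch"` and one labelled `"dust"` (made-up class names), the result holds five items per class. On a very small image dataset, repeated samples are usually combined with augmentation so the model does not see exact copies; oversampling should also be applied only to the training split.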
|
59 |
Automatic Firearm Detection by Deep Learning
Kambhatla, Akhila (01 May 2020)
Surveillance cameras are a great support in crime investigation and proximity alarms, and they play a vital role in public safety. However, current surveillance systems require continuous human supervision for monitoring. The primary goal of the thesis is to prevent firearm-related violence and injuries. Automatic firearm detection enhances security and safety. The main motivation is therefore to introduce a Deep Learning object detection model that detects firearms and alerts the corresponding police department. Visual object detection is a fundamental recognition problem in computer vision that aims to find objects of certain target classes in an input image, localize them precisely, and assign each the corresponding label. However, challenges arise from the wide variations in shape, size and appearance, and from occlusion by the weapon carrier. Selecting the best object detection model is a further difficulty. Three deep learning models are therefore selected and explained, and the differences between them in detecting firearms are shown. The dataset in this thesis is a custom selection of different firearm categories, such as pistol, revolver, handgun, bullet and rifle, along with human detection. The entire dataset is annotated manually in PASCAL VOC format. Data augmentation has been used to enlarge the dataset and to help detect firearms that are deformed or occluded. To detect firearms, this thesis developed and applied unified networks such as SSD and two-stage object detectors such as Faster R-CNN. SSD is easy to understand and detects objects, but it fails to detect smaller objects. Faster R-CNN is efficient and able to detect smaller firearms in the scene. Each class attained a confidence score of more than 90%.
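The abstract mentions data augmentation on a dataset with bounding-box annotations but does not list the transforms used. As a minimal sketch of why detection augmentation needs care, the example below applies a horizontal flip, where the boxes must be mirrored together with the pixels. The function name `hflip_with_boxes` is hypothetical, and the boxes are assumed to be `(xmin, ymin, xmax, ymax)` edge coordinates in the range 0 to image width, which is a simplification of the pixel-index convention used in PASCAL VOC files.

```python
import numpy as np


def hflip_with_boxes(image, boxes):
    """Flip an image left-to-right and mirror its bounding boxes.

    image: array of shape (height, width) or (height, width, channels).
    boxes: array-like of shape (N, 4) holding (xmin, ymin, xmax, ymax),
           assumed to be edge coordinates in [0, width] (an assumption,
           not the thesis's stated convention).
    """
    width = image.shape[1]
    flipped = image[:, ::-1].copy()       # reverse the column order

    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    mirrored = boxes.copy()
    mirrored[:, 0] = width - boxes[:, 2]  # new xmin comes from the old xmax
    mirrored[:, 2] = width - boxes[:, 0]  # new xmax comes from the old xmin
    return flipped, mirrored
```

The y coordinates are unchanged, and swapping the roles of `xmin` and `xmax` keeps `xmin <= xmax` after the flip. Applying the function twice returns the original image and boxes, which is a convenient sanity check when adding further transforms.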
|
60 |
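New method of all-sky searches for continuous gravitational waves
Yamamoto, Takahiro S. (24 May 2021)
Kyoto University / New-system course doctorate / Doctor of Science / Degree No. Kō 23361 / Science Doctorate No. 4732 / 新制||理||1679 (University Library) / Division of Physics and Astronomy, Graduate School of Science, Kyoto University / (Chief examiner) Professor Takahiro Tanaka, Associate Professor Koutarou Kyutoku, Professor Kouichi Hagino / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Science / Kyoto University / DFAM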
|