Global ETD Search

31	Towards robust conversational speech recognition and understanding Weng, Chao 12 January 2015 While significant progress has been made in automatic speech recognition (ASR) during the last few decades, recognizing and understanding unconstrained conversational speech remains a challenging problem. In this dissertation, five methods/systems are proposed towards a robust conversational speech recognition and understanding system. I. A non-uniform minimum classification error (MCE) approach is proposed which can achieve consistent and significant keyword spotting performance gains on both English and Mandarin large-scale spontaneous conversational speech tasks (Switchboard and HKUST Mandarin CTS). II. A hybrid recurrent DNN-HMM system is proposed for robust acoustic modeling and a new way of backpropagation through time (BPTT) is introduced. The proposed system achieves state-of-the-art performance on two benchmark datasets, the 2nd CHiME challenge (track 2) and Aurora-4, without front-end preprocessing, speaker adaptive training or multiple decoding passes. III. To study the specific case of conversational speech recognition in the presence of competing talkers, several multi-style training setups of DNNs are investigated and a joint decoder operating on multi-talker speech is introduced. The proposed combined system improves upon the previous state-of-the-art IBM superhuman system by 2.8% absolute on the 2006 speech separation challenge dataset. IV. Latent semantic rational kernels (LSRKs) are proposed for spotting semantic notions in conversational speech. The proposed framework is generalized using tf-idf weighting, latent semantic analysis, WordNet, probabilistic topic models and neural network learned representations and is shown to achieve substantial topic spotting performance gains on two conversational speech tasks, Switchboard and AT&T HMIHY initial collection. V. Non-uniform sequential discriminative training (DT) of DNNs with LSRKs is proposed, which directly links the information of the proposed LSRK framework to the objective function of the DT. The experimental results on a subset of Switchboard show that the proposed method can lead the acoustic modeling to a more robust system with respect to the semantic decoder. ASR WFSTs Robust speech recognition Conversational speech Speech understanding Topic spotting Deep neural networks
32	Approximate Neural Networks for Speech Applications in Resource-Constrained Environments January 2016 Speech recognition and keyword detection are becoming increasingly popular applications for mobile systems. While deep neural network (DNN) implementations of these systems perform very well, they have large memory and compute resource requirements, making their implementation on a mobile device quite challenging. In this thesis, techniques to reduce the memory and computation cost of keyword detection and speech recognition networks (or DNNs) are presented. The first technique is based on representing all weights and biases by a small number of bits and mapping all nodal computations into fixed-point ones with minimal degradation in accuracy. Experiments conducted on the Resource Management (RM) database show that for the keyword detection neural network, representing the weights by 5 bits results in a 6-fold reduction in memory compared to a floating-point implementation with very little loss in performance. Similarly, for the speech recognition neural network, representing the weights by 6 bits results in a 5-fold reduction in memory while maintaining an error rate similar to a floating-point implementation. Additional reduction in memory is achieved by a technique called weight pruning, where the weights are classified as sensitive and insensitive and the sensitive weights are represented with higher precision. A combination of these two techniques helps reduce the memory footprint by 81% and 84% for speech recognition and keyword detection networks, respectively. Further reduction in memory size is achieved by judiciously dropping connections for large blocks of weights. The corresponding technique, termed coarse-grain sparsification, introduces hardware-aware sparsity during DNN training, which leads to efficient weight memory compression and a significant reduction in the number of computations during classification without loss of accuracy. Keyword detection and speech recognition DNNs trained with 75% of the weights dropped and classified with 5-6 bit weight precision effectively reduced the weight memory requirement by ~95% compared to a fully-connected network with double precision, while showing similar performance in keyword detection accuracy and word error rate. / Dissertation/Thesis / Masters Thesis Computer Science 2016 Artificial intelligence Deep Neural Networks Keyword Detection Memory Compression Speech Recognition
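As a rough illustration of the fixed-point weight representation described in entry 32, the following Python sketch quantizes weights uniformly to a given bit width and computes the resulting memory ratio. The function names, the symmetric uniform quantizer, and the 32-bit floating-point baseline are assumptions made for illustration; the abstract does not specify the thesis's actual quantization scheme.

```python
def quantize_weights(weights, bits):
    """Uniformly quantize weights to signed integer codes that fit in `bits` bits.

    Hypothetical helper: a symmetric quantizer with one shared scale factor.
    """
    levels = 2 ** (bits - 1) - 1                    # largest positive integer code
    max_abs = max(abs(w) for w in weights) or 1.0   # avoid a zero scale
    scale = max_abs / levels                        # real value of one code step
    codes = [max(-levels, min(levels, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real-valued weights."""
    return [c * scale for c in codes]


def memory_reduction(bits, baseline_bits=32):
    """Ratio of baseline weight storage to quantized weight storage.

    Assumes a 32-bit floating-point baseline unless stated otherwise.
    """
    return baseline_bits / bits


weights = [0.82, -0.31, 0.05, -0.77, 0.40]
codes, scale = quantize_weights(weights, bits=5)
restored = dequantize(codes, scale)
```

Under the assumed 32-bit baseline, 5-bit weights give a 6.4-fold reduction and 6-bit weights about 5.3-fold, which is in line with the roughly 6-fold and 5-fold figures quoted in the abstract.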
33	Study of Knowledge Transfer Techniques For Deep Learning on Edge Devices January 2018 With the emergence of the edge computing paradigm, many applications such as image recognition and augmented reality need to perform machine learning (ML) and artificial intelligence (AI) tasks on edge devices. Most AI and ML models are large and computationally heavy, whereas edge devices are usually equipped with limited computational and storage resources. Such models can be compressed and reduced in order to be placed on edge devices, but they may lose their capability and may not generalize or perform as well as large models. Recent works used knowledge transfer techniques to transfer information from a large network (termed teacher) to a small one (termed student) in order to improve the performance of the latter. This approach seems promising for learning on edge devices, but a thorough investigation of its effectiveness is lacking. The purpose of this work is to provide an extensive study of the performance (both in terms of accuracy and convergence speed) of knowledge transfer, considering different student-teacher architectures, datasets and techniques for transferring knowledge from teacher to student. A good performance improvement is obtained by transferring knowledge from both the intermediate layers and the last layer of the teacher to a shallower student. But other architectures and transfer techniques do not fare so well, and some of them even have a negative impact on performance. For example, a smaller and shorter network trained with knowledge transfer on Caltech 101 achieved a significant improvement of 7.36% in accuracy and converged 16 times faster compared to the same network trained without knowledge transfer. On the other hand, a smaller network that is thinner than the teacher network performed worse, with an accuracy drop of 9.48% on Caltech 101, even with knowledge transfer. / Dissertation/Thesis / Masters Thesis Computer Science 2018 Artificial intelligence Computer science Cloud Computing Deep Learning Deep neural networks Edge Computing Knowledge Transfer
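To make the idea of transferring knowledge from a teacher's last layer and intermediate layers in entry 33 more concrete, here is a minimal Python sketch of two commonly used loss terms. It is an assumption that the thesis uses losses of this kind; the temperature-softened KL term and the mean-squared "hint" term below, along with all function names, are illustrative choices and not taken from the abstract.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs: last-layer transfer (assumed form)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hint_loss(student_features, teacher_features):
    """Mean squared error between intermediate activations of equal length (assumed form)."""
    n = len(teacher_features)
    return sum((s - t) ** 2 for s, t in zip(student_features, teacher_features)) / n


teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = soft_target_loss(student, teacher) + hint_loss([0.1, 0.4], [0.2, 0.3])
```

In a sketch like this, the student would be trained to minimize the combined loss together with its ordinary classification loss; how the terms are weighted is a design choice the abstract does not describe.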
34	Hluboké neuronové sítě pro předpovídání prodejů / Deep Neural Networks for Sales Forecasting Tyrpáková, Natália January 2016 Sales forecasting is an essential part of supply chain management. In retail business, accurate sales forecasts lead to significant cost reductions. Statistical methods that are commonly used for sales forecasting often overlook important aspects unique to sales time series, which lowers the forecast accuracy. In this thesis we explore whether it is possible to improve short-term sales forecasting by employing deep neural networks. This thesis analyzes the performance of various traditional deep neural network designs and proposes a novel architecture. It also explores several data preprocessing methods, both traditional and non-traditional, which turn out to be a crucial part of sales forecasting using deep neural networks. The best deep neural network methods that we found are then compared to other forecasting methods such as traditional neural networks or exponential smoothing.
35	Applications of Tropical Geometry in Deep Neural Networks Alfarra, Motasem 04 1900 This thesis tackles the problem of understanding deep neural networks with piecewise linear activation functions. We leverage tropical geometry, a relatively new field in algebraic geometry, to characterize the decision boundaries of a single hidden layer neural network. This characterization is leveraged to understand and reformulate three interesting applications related to deep neural networks. First, we give a geometrical demonstration of the behaviour of the lottery ticket hypothesis. Moreover, we deploy the geometrical characterization of the decision boundaries to reformulate the network pruning problem. This new formulation aims to prune network parameters that do not contribute to the geometrical representation of the decision boundaries. In addition, we propose a dual view of adversarial attacks that tackles both designing perturbations to the input image and the equivalent perturbation to the decision boundaries. Deep Learning Deep Neural Networks Tropical Geometry Network Pruning Lottery Ticket Hypothesis Adversarial Attacks
36	Informatics Approaches for Understanding Human Facial Attractiveness Perception and Visual Attention / 人間の顔の魅力知覚と視覚的注意の情報学的アプローチによる解明 Tong, Song 24 May 2021 Kyoto University / Doctorate by coursework (new system) / Doctor of Informatics / 甲第23398号 / 情博第767号 / 新制||情||131 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor 熊田孝恒, Professor 西田眞也, Professor 齋木潤, Associate Professor 延原章平 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / DFAM Facial attractiveness perception Visual attention Deep neural networks Social media platforms 007
37	Rozpoznávání pojmenovaných entit v biomedicínské doméně / Named entity recognition in the biomedical domain Williams, Shadasha January 2021 Named entity recognition (NER) is an information extraction task that attempts to recognize and extract particular entities in a text. One issue with NER is that its models are domain specific. The goal of the thesis is to focus on entities strictly from the biomedical domain. The other issue with NER comes from the synonymous terms that may be linked to one entity, which leads to the problem of entity disambiguation. Due to the popularity of neural networks and their success in NLP tasks, the work should use a neural network architecture for the task of named entity disambiguation, as described in the paper by Eshel et al. [1]. One of the subtasks of the thesis is to map the words and entities to a vector space using word embeddings, which attempt to capture textual context similarity and coherence [2]. The main output of the thesis will be a model that attempts to disambiguate entities of the biomedical domain, using scientific journals (PubMed and Embase) as the documents of interest.
38	Human Understandable Interpretation of Deep Neural Networks Decisions Using Generative Models Alabdallah, Abdallah January 2019 Deep Neural Networks have long been considered black box systems, and their interpretability is a concern when they are applied in safety critical systems. In this work, a novel approach to interpreting the decisions of DNNs is proposed. The approach depends on exploiting generative models and the interpretability of their latent space. Three methods for ranking features are explored, two of which depend on sensitivity analysis, while the third depends on a Random Forest model. The Random Forest model was the most successful at ranking the features, given its accuracy and inherent interpretability. Explainable AI Deep Neural Networks Interpretability Disentangled Representation Representation Learning Engineering and Technology Teknik och teknologier
39	Inferential GANs and Deep Feature Selection with Applications Yao Chen (8892395) 15 June 2020 Deep neural networks (DNNs) have become popular due to their predictive power and flexibility in model fitting. In unsupervised learning, variational autoencoders (VAEs) and generative adversarial networks (GANs) are the two most popular and successful generative models. How to provide a unifying framework combining the best of VAEs and GANs in a principled way is a challenging task. In supervised learning, the demand for high-dimensional data analysis has grown significantly, especially in the applications of social networking, bioinformatics, and neuroscience. How to simultaneously approximate the true underlying nonlinear system and identify relevant features based on high-dimensional data (typically with the sample size smaller than the dimension, a.k.a. small-n-large-p) is another challenging task.
In this dissertation, we have provided satisfactory answers for these two challenges. In addition, we have illustrated some promising applications using modern machine learning methods.
In the first chapter, we introduce a novel inferential Wasserstein GAN (iWGAN) model, which is a principled framework to fuse auto-encoders and WGANs. GANs have been impactful on many problems and applications but suffer from unstable training. The Wasserstein GAN (WGAN) leverages the Wasserstein distance to avoid the caveats in the minimax two-player training of GANs but has other defects such as mode collapse and the lack of a metric to detect convergence. The iWGAN model jointly learns an encoder network and a generator network motivated by the iterative primal dual optimization process. The encoder network maps the observed samples to the latent space and the generator network maps the samples from the latent space to the data space. We establish the generalization error bound of iWGANs to theoretically justify the performance of iWGANs. We further provide a rigorous probabilistic interpretation of our model under the framework of maximum likelihood estimation. The iWGAN, with a clear stopping criterion, has many advantages over other autoencoder GANs. The empirical experiments show that the iWGAN greatly mitigates the symptom of mode collapse, speeds up convergence, and is able to provide a quality-check measurement for each individual sample. We illustrate the ability of iWGANs by obtaining performance on benchmark datasets that is stable and competitive with the state of the art.
In the second chapter, we present a general framework for high-dimensional nonlinear variable selection using deep neural networks under the framework of supervised learning. The network architecture includes both a selection layer and approximation layers. The problem can be cast as a sparsity-constrained optimization with a sparse parameter in the selection layer and other parameters in the approximation layers. This problem is challenging due to the sparse constraint and the nonconvex optimization. We propose a novel algorithm, called Deep Feature Selection, to estimate both the sparse parameter and the other parameters. Theoretically, we establish the algorithm convergence and the selection consistency when the objective function has a Generalized Stable Restricted Hessian. This result provides theoretical justification for our method and generalizes known results for high-dimensional linear variable selection. Simulations and real data analysis are conducted to demonstrate the superior performance of our method.
In the third chapter, we develop a novel methodology to classify electrocardiograms (ECGs) as normal, atrial fibrillation, or other cardiac dysrhythmias as defined by the PhysioNet Challenge 2017. More specifically, we use piecewise linear splines for the feature selection and a gradient boosting algorithm for the classifier. In the algorithm, the ECG waveform is fitted by a piecewise linear spline, and morphological features related to the piecewise linear spline coefficients are extracted. XGBoost is used to classify the morphological coefficients and heart rate variability features. The performance of the algorithm was evaluated on the PhysioNet Challenge database (3658 ECGs classified by experts). Our algorithm achieves an average F1 score of 81% in 10-fold cross validation and also achieves an F1 score of 81% on the independent testing set. This score is similar to the ninth-best score (81%) in the official phase of the PhysioNet Challenge 2017.
In the fourth chapter, we introduce a novel region-selection penalty in the framework of image-on-scalar regression to impose sparsity of pixel values and extract active regions simultaneously. This method helps identify regions of interest (ROI) associated with certain diseases, which has a great impact on public health. Our penalty combines the Smoothly Clipped Absolute Deviation (SCAD) regularization, enforcing sparsity, and the SCAD of total variation (TV) regularization, enforcing spatial contiguity, into one group, which segments contiguous spatial regions against a zero-valued background. An efficient algorithm is based on the alternating direction method of multipliers (ADMM), which decomposes the non-convex problem into two iterative optimization problems with explicit solutions. Another virtue of the proposed method is that a divide and conquer learning algorithm is developed, thereby allowing scaling to large images. Several examples are presented and the experimental results are compared with other state-of-the-art approaches. Statistics Generative Adversarial Network Deep neural networks high dimensional statistics machine learning-based Sparse learning
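The region-selection penalty in the fourth chapter of entry 39 builds on the SCAD penalty. The Python sketch below shows the standard SCAD function (with the conventional default a = 3.7) and one simple way a sparsity term and a total-variation term could be summed for a one-dimensional signal. The 1-D setting, the shared tuning parameter, and the plain sum are assumptions for illustration; the abstract does not give the dissertation's exact penalty or its grouping.

```python
def scad(theta, lam, a=3.7):
    """Standard SCAD penalty value for a single coefficient (requires a > 2)."""
    t = abs(theta)
    if t <= lam:
        return lam * t                                         # L1-like near zero
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2                             # constant for large values


def region_penalty(signal, lam, a=3.7):
    """Hypothetical 1-D combination: SCAD on values plus SCAD on adjacent differences."""
    sparsity = sum(scad(x, lam, a) for x in signal)
    variation = sum(scad(signal[i + 1] - signal[i], lam, a)
                    for i in range(len(signal) - 1))
    return sparsity + variation


# A contiguous active block on a zero background is penalized less than
# the same values scattered across the signal.
block = [0.0, 0.0, 5.0, 5.0, 5.0, 0.0, 0.0]
scattered = [5.0, 0.0, 5.0, 0.0, 5.0, 0.0, 0.0]
```

Because SCAD levels off for large magnitudes, large active values are not shrunk further, while the difference term favors contiguous regions over isolated pixels.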
40	Porovnání hlubokých neuronových sítí a standardních metod pro detekci dopravního značení / Comparison of deep learning and classical methods for traffic signs detection Geiger, Petr January 2019 The goal of this thesis is to explore and evaluate classic and deep neural network computer vision methods for the task of detecting the position of a level crossing barrier. This thesis is based on an initial detection algorithm using a Stable Wave Detector. The initial algorithm is optimized both in performance and in the quality of the results. Both are crucial, because the best method should be suitable as a component of a real-time level crossing safety system. Then another approach is implemented using deep neural networks and optimized in the same manner. Throughout the work, several datasets are created for both training and testing of the algorithms. Both approaches are finally evaluated on the same test datasets and the results are compared.
