  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Deep neural networks for music tagging

Choi, Keunwoo January 2018 (has links)
In this thesis, I present my hypothesis, experiment results, and discussion related to various aspects of deep neural networks for music tagging. Music tagging is the task of automatically predicting suitable semantic labels for a given piece of music. Generally speaking, the input of a music tagging system can be any entity that constitutes music, e.g., audio content, lyrics, or metadata, but only the audio content is considered in this thesis. My hypothesis is that we can find effective deep learning practices for the music tagging task that improve classification performance. As a computational model to realise a music tagging system, I use deep neural networks. Combined with the research problem, the scope of this thesis is the understanding, interpretation, optimisation, and application of deep neural networks in the context of music tagging systems. The ultimate goal of this thesis is to provide insight that can help to improve deep learning-based music tagging systems. There are many smaller goals in this regard. Since using deep neural networks is a data-driven approach, it is crucial to understand the dataset. Selecting and designing a better architecture is the next topic to discuss. Since the tagging is done with audio input, preprocessing the audio signal becomes one of the important research topics. After building (or training) a music tagging system, finding a suitable way to re-use it for other music information retrieval tasks is a compelling topic, in addition to interpreting the trained system. The evidence presented in the thesis supports that deep neural networks are powerful and credible methods for building a music tagging system.
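
A minimal sketch of the kind of audio-based tagger discussed above is given below: a small convolutional network maps a log-mel spectrogram to independent per-tag probabilities trained with binary cross-entropy. The layer sizes, 50-tag vocabulary, and spectrogram dimensions are illustrative assumptions, not the architectures evaluated in the thesis.

```python
import torch
import torch.nn as nn

class MusicTagger(nn.Module):
    """Minimal CNN music tagger: log-mel spectrogram in, tag probabilities out.
    Layer sizes and the 50-tag vocabulary are illustrative assumptions."""
    def __init__(self, n_mels=96, n_tags=50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # pool over time and frequency
        )
        self.classifier = nn.Linear(64, n_tags)

    def forward(self, x):                     # x: (batch, 1, n_mels, n_frames)
        h = self.features(x).flatten(1)
        return torch.sigmoid(self.classifier(h))   # independent per-tag probabilities

# Multi-label training uses binary cross-entropy per tag:
model = MusicTagger()
spec = torch.randn(8, 1, 96, 1366)            # batch of log-mel spectrograms
tags = torch.randint(0, 2, (8, 50)).float()   # ground-truth tag indicators
loss = nn.functional.binary_cross_entropy(model(spec), tags)
```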
22

Chromosome 3D Structure Modeling and New Approaches For General Statistical Inference

Rongrong Zhang (5930474) 03 January 2019 (has links)
This thesis consists of two separate topics: the use of piecewise helical models for the inference of 3D spatial organizations of chromosomes, and new approaches for general statistical inference. The recently developed Hi-C technology enables a genome-wide view of chromosome spatial organizations and has shed deep insights into genome structure and genome function. However, multiple sources of uncertainty make downstream data analysis and interpretation challenging. Specifically, statistical models for inferring three-dimensional (3D) chromosomal structure from Hi-C data are far from mature. Most existing methods are highly over-parameterized, lack clear interpretations, and are sensitive to outliers. We propose a parsimonious, easy-to-interpret, and robust piecewise helical curve model for the inference of 3D chromosomal structures from Hi-C data, for both individual topologically associated domains and whole chromosomes. When applied to a real Hi-C dataset, the piecewise helical model not only achieves much better model fitting than existing models, but also reveals that geometric properties of chromatin spatial organization are closely related to genome function.

For potential applications in big data analytics and machine learning, we propose to use deep neural networks to automate Bayesian model selection and parameter estimation procedures. Two such frameworks are developed under different scenarios. First, we construct a deep neural network-based Bayes estimator for the parameters of a given model. The neural Bayes estimator mitigates the computational challenges faced by traditional approaches for computing Bayes estimators. When applied to generalized linear mixed models, the neural Bayes estimator outperforms existing methods implemented in R packages and SAS procedures. Second, we construct a deep convolutional neural network-based framework to perform simultaneous Bayesian model selection and parameter estimation. We refer to the neural networks for model selection and parameter estimation in this framework as the neural model selector and parameter estimator, respectively; both can be trained using labeled data systematically generated from candidate models. Simulation studies show that both the neural selector and the estimator demonstrate excellent performance.

The theory of Conditional Inferential Models (CIMs) has been introduced to combine information for efficient inference in the Inferential Models framework for prior-free and yet valid probabilistic inference. While the general theory is subject to further development, the so-called regular CIMs are simple. We establish and prove a necessary and sufficient condition for the existence and identification of regular CIMs. More specifically, it is shown that for inference based on a sample from continuous distributions with unknown parameters, the corresponding CIM is regular if and only if the unknown parameters are generalized location and scale parameters indexing the transformations of an affine group.
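
The neural Bayes estimator described in the second part can be sketched as follows: parameters are drawn from the prior, datasets are simulated from the model, and a network is trained with squared-error loss (so that it approximates the posterior mean) to map simulated data back to the parameters that generated them. The toy normal model, priors, and network size below are illustrative assumptions, not the generalized linear mixed models treated in the thesis.

```python
import torch
import torch.nn as nn

# Sketch of a neural Bayes estimator: train a network on (simulated data -> true
# parameters) pairs so that it approximates the Bayes (posterior-mean) estimator.
# Illustrative toy model: y_i ~ Normal(mu, sigma^2), i = 1..n, with simple priors
# on mu and sigma; this is an assumption for the sketch, not the thesis model.

def simulate(batch, n=50):
    mu = torch.randn(batch, 1)                     # prior: mu ~ N(0, 1)
    sigma = torch.rand(batch, 1) + 0.5             # prior: sigma ~ U(0.5, 1.5)
    y = mu + sigma * torch.randn(batch, n)         # simulated datasets
    return y, torch.cat([mu, sigma], dim=1)        # data, true parameters

estimator = nn.Sequential(nn.Linear(50, 64), nn.ReLU(),
                          nn.Linear(64, 64), nn.ReLU(),
                          nn.Linear(64, 2))        # outputs (mu_hat, sigma_hat)
opt = torch.optim.Adam(estimator.parameters(), lr=1e-3)

for step in range(2000):
    y, theta = simulate(256)
    loss = nn.functional.mse_loss(estimator(y), theta)  # squared error -> posterior mean
    opt.zero_grad(); loss.backward(); opt.step()

# Once trained, the estimator maps a new dataset to parameter estimates in a single
# forward pass, avoiding per-dataset computation of traditional Bayes estimators.
y_new, _ = simulate(1)
print(estimator(y_new))
```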
23

Object Recognition in Videos Utilizing Hierarchical and Temporal Objectness with Deep Neural Networks

Peng, Liang 01 May 2017 (has links)
This dissertation develops a novel system for object recognition in videos. The input of the system is a set of unconstrained videos containing a known set of objects. The output is the locations and categories of each object in each frame across all videos. Initially, a shot boundary detection algorithm is applied to the videos to divide them into multiple sequences separated by the identified shot boundaries. Since each of these sequences still contains moderate content variation, we further use a cost optimization-based key frame extraction method to select key frames in each sequence and use these key frames to divide the videos into shorter sub-sequences with little content variation. Next, we learn object proposals on the first frame of each sub-sequence. Building upon state-of-the-art object detection algorithms, we develop a tree-based hierarchical model to improve object detection. Using the learned object proposals as the initial object positions in the first frame of each sub-sequence, we apply the SPOT tracker to track the object proposals and re-rank them using the proposed temporal objectness measure, removing unlikely objects to obtain object proposal tubes. Finally, we employ a deep Convolutional Neural Network (CNN) to classify these tubes. Experiments show that the proposed system significantly improves the object detection rate of the learned proposals compared with several state-of-the-art object detectors. Due to this improvement in object detection, the proposed system also achieves higher mean average precision at the proposal classification stage than state-of-the-art methods.
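
The first stage of the pipeline above, shot boundary detection, can be illustrated with a simple color-histogram difference detector: a large jump in histogram distance between consecutive frames is flagged as a cut. The bin count and threshold are illustrative assumptions; the dissertation's own detector and the later stages (key frame extraction, proposal tracking, CNN classification) are not reproduced here.

```python
import numpy as np

def shot_boundaries(frames, threshold=0.5, bins=16):
    """Detect shot boundaries by thresholding the color-histogram distance
    between consecutive frames. `frames` is a list of HxWx3 uint8 arrays.
    The threshold and bin count are illustrative assumptions."""
    boundaries = []
    prev_hist = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogramdd(frame.reshape(-1, 3),
                                 bins=(bins, bins, bins),
                                 range=((0, 256),) * 3)
        hist = hist.ravel() / hist.sum()          # normalized color histogram
        if prev_hist is not None:
            # L1 distance in [0, 2]; a large jump suggests a cut between shots
            if np.abs(hist - prev_hist).sum() > threshold:
                boundaries.append(i)
        prev_hist = hist
    return boundaries

# Example: 100 synthetic frames with an artificial "cut" at frame 50
frames = [np.random.randint(0, 80, (120, 160, 3), dtype=np.uint8) for _ in range(50)]
frames += [np.random.randint(150, 255, (120, 160, 3), dtype=np.uint8) for _ in range(50)]
print(shot_boundaries(frames))                     # expected to include 50
```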
24

Contribution au développement de l’apprentissage profond dans les systèmes distribués / Contribution to the development of deep learning in distributed systems

Hardy, Corentin 08 April 2019 (has links)
Deep learning enables the development of a growing number of services, but it requires large training databases and a lot of computing power. In order to reduce the cost of this training, we propose a collaborative learning setup: future users of the services enabled by deep learning can take part in the training by contributing their machines and their data, without moving that data to the cloud. We propose several methods for training deep neural networks in this distributed-system context.
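
The collaborative setup described above, in which users contribute their machines and data without uploading the data, can be sketched as a federated-averaging-style loop: each user trains locally and only model weights are sent for aggregation. The tiny model, number of users, and plain weight averaging below are illustrative assumptions rather than the specific methods proposed in the thesis.

```python
import copy
import torch
import torch.nn as nn

def local_update(model, data, targets, lr=0.1, epochs=1):
    """One user's local training pass; the data never leaves the device."""
    local = copy.deepcopy(model)
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(local(data), targets).backward()
        opt.step()
    return local.state_dict()

def average_states(states):
    """Server-side aggregation: plain average of the users' model weights."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in states]).mean(dim=0)
    return avg

global_model = nn.Linear(10, 1)                    # illustrative tiny model
users = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(5)]

for rnd in range(20):                              # communication rounds
    states = [local_update(global_model, x, y) for x, y in users]
    global_model.load_state_dict(average_states(states))
```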
25

Towards robust conversational speech recognition and understanding

Weng, Chao 12 January 2015 (has links)
While significant progress has been made in automatic speech recognition (ASR) during the last few decades, recognizing and understanding unconstrained conversational speech remains a challenging problem. In this dissertation, five methods/systems are proposed towards a robust conversational speech recognition and understanding system. I. A non-uniform minimum classification error (MCE) approach is proposed which achieves consistent and significant keyword spotting performance gains on both English and Mandarin large-scale spontaneous conversational speech tasks (Switchboard and HKUST Mandarin CTS). II. A hybrid recurrent DNN-HMM system is proposed for robust acoustic modeling, and a new way of performing backpropagation through time (BPTT) is introduced. The proposed system achieves state-of-the-art performance on two benchmark datasets, the 2nd CHiME challenge (track 2) and Aurora-4, without front-end preprocessing, speaker-adaptive training or multiple decoding passes. III. To study the specific case of conversational speech recognition in the presence of competing talkers, several multi-style training setups of DNNs are investigated and a joint decoder operating on multi-talker speech is introduced. The proposed combined system improves upon the previous state-of-the-art IBM superhuman system by 2.8% absolute on the 2006 speech separation challenge dataset. IV. Latent semantic rational kernels (LSRKs) are proposed for spotting semantic notions in conversational speech. The proposed framework is generalized using tf-idf weighting, latent semantic analysis, WordNet, probabilistic topic models and neural network learned representations, and is shown to achieve substantial topic spotting performance gains on two conversational speech tasks, Switchboard and the AT&T HMIHY initial collection. V. Non-uniform sequential discriminative training (DT) of DNNs with LSRKs is proposed, which directly links the information of the proposed LSRK framework to the objective function of the DT. The experimental results on a subset of Switchboard show that the proposed method leads to acoustic models that are more robust with respect to the semantic decoder.
26

Approximate Neural Networks for Speech Applications in Resource-Constrained Environments

January 2016 (has links)
abstract: Speech recognition and keyword detection are becoming increasingly popular applications for mobile systems. While deep neural network (DNN) implementations of these systems have very good performance, they have large memory and compute requirements, making their implementation on a mobile device quite challenging. In this thesis, techniques to reduce the memory and computation cost of keyword detection and speech recognition networks (or DNNs) are presented. The first technique is based on representing all weights and biases with a small number of bits and mapping all nodal computations into fixed-point ones with minimal degradation in accuracy. Experiments conducted on the Resource Management (RM) database show that for the keyword detection neural network, representing the weights with 5 bits results in a 6-fold reduction in memory compared to a floating-point implementation with very little loss in performance. Similarly, for the speech recognition neural network, representing the weights with 6 bits results in a 5-fold reduction in memory while maintaining an error rate similar to a floating-point implementation. Additional reduction in memory is achieved by a technique called weight pruning, where the weights are classified as sensitive and insensitive and the sensitive weights are represented with higher precision. A combination of these two techniques reduces the memory footprint by 81-84% for the speech recognition and keyword detection networks, respectively. Further reduction in memory size is achieved by judiciously dropping connections for large blocks of weights. The corresponding technique, termed coarse-grain sparsification, introduces hardware-aware sparsity during DNN training, which leads to efficient weight memory compression and a significant reduction in the number of computations during classification without loss of accuracy. Keyword detection and speech recognition DNNs trained with 75% of the weights dropped and classified with 5-6 bit weight precision effectively reduced the weight memory requirement by ~95% compared to a fully-connected network with double precision, while showing similar keyword detection accuracy and word error rate. / Dissertation/Thesis / Masters Thesis Computer Science 2016
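
The first technique above, representing weights with a small number of bits, amounts to uniform quantization of each weight matrix. A minimal sketch follows; the symmetric per-tensor scaling and the 5-bit setting are illustrative assumptions rather than the exact fixed-point mapping used in the thesis.

```python
import numpy as np

def quantize_weights(w, n_bits=5):
    """Uniformly quantize a weight array to n_bits signed levels and return both
    the integer codes and the reconstructed (de-quantized) weights.
    Symmetric per-tensor scaling is an illustrative assumption."""
    q_max = 2 ** (n_bits - 1) - 1                  # e.g. 15 for 5 bits
    scale = np.max(np.abs(w)) / q_max if np.max(np.abs(w)) > 0 else 1.0
    codes = np.clip(np.round(w / scale), -q_max - 1, q_max).astype(np.int8)
    return codes, codes.astype(np.float32) * scale

w = np.random.randn(512, 256).astype(np.float32)   # one layer's weight matrix
codes, w_hat = quantize_weights(w, n_bits=5)

# Memory: 5-bit codes vs 32-bit floats is roughly a 6.4x reduction, in line with
# the ~6x figure quoted above (ignoring the single per-tensor scale factor).
print("max quantization error:", np.max(np.abs(w - w_hat)))
```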
27

Study of Knowledge Transfer Techniques For Deep Learning on Edge Devices

January 2018 (has links)
abstract: With the emergence of the edge computing paradigm, many applications such as image recognition and augmented reality need to perform machine learning (ML) and artificial intelligence (AI) tasks on edge devices. Most AI and ML models are large and computationally heavy, whereas edge devices are usually equipped with limited computational and storage resources. Such models can be compressed and reduced in order to be placed on edge devices, but they may lose their capability and may not generalize and perform as well as large models. Recent works have used knowledge transfer techniques to transfer information from a large network (termed the teacher) to a small one (termed the student) in order to improve the performance of the latter. This approach seems promising for learning on edge devices, but a thorough investigation of its effectiveness is lacking. The purpose of this work is to provide an extensive study of the performance (both in terms of accuracy and convergence speed) of knowledge transfer, considering different student-teacher architectures, datasets, and different techniques for transferring knowledge from teacher to student. A good performance improvement is obtained by transferring knowledge from both the intermediate layers and the last layer of the teacher to a shallower student. But other architectures and transfer techniques do not fare so well, and some of them even lead to a negative performance impact. For example, a smaller and shorter network trained with knowledge transfer on Caltech 101 achieved a significant improvement of 7.36% in accuracy and converged 16 times faster compared to the same network trained without knowledge transfer. On the other hand, a smaller network that is thinner than the teacher network performed worse, with an accuracy drop of 9.48% on Caltech 101, even with knowledge transfer. / Dissertation/Thesis / Masters Thesis Computer Science 2018
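
The teacher-student transfer studied above is commonly implemented as knowledge distillation: the student is trained against a blend of the hard labels and the teacher's temperature-softened outputs. The sketch below covers only the last-layer variant; the temperature, mixing weight, and toy networks are illustrative assumptions, and transfer from intermediate layers (also studied in the thesis) would add feature-matching terms.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend of soft-target loss (teacher's softened outputs) and hard-label loss.
    T (temperature) and alpha (mixing weight) are illustrative assumptions."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

teacher = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10))  # larger net
student = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))    # smaller net
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(128, 64)                      # stand-in for image features
y = torch.randint(0, 10, (128,))
with torch.no_grad():
    t_logits = teacher(x)                     # teacher is fixed during transfer
loss = distillation_loss(student(x), t_logits, y)
opt.zero_grad(); loss.backward(); opt.step()
```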
28

Applications of Tropical Geometry in Deep Neural Networks

Alfarra, Motasem 04 1900 (has links)
This thesis tackles the problem of understanding deep neural networks with piecewise linear activation functions. We leverage tropical geometry, a relatively new field in algebraic geometry, to characterize the decision boundaries of a single hidden layer neural network. This characterization is leveraged to understand and reformulate three interesting applications related to deep neural networks. First, we give a geometrical demonstration of the behaviour of the lottery ticket hypothesis. Moreover, we deploy the geometrical characterization of the decision boundaries to reformulate the network pruning problem. This new formulation aims to prune network parameters that are not contributing to the geometrical representation of the decision boundaries. In addition, we propose a dual view of adversarial attacks that tackles both designing perturbations to the input image and the equivalent perturbations to the decision boundaries.
29

Informatics Approaches for Understanding Human Facial Attractiveness Perception and Visual Attention / 人間の顔の魅力知覚と視覚的注意の情報学的アプローチによる解明

Tong, Song 24 May 2021 (has links)
Kyoto University / Doctorate by coursework, new system / Doctor of Informatics / 甲第23398号 / 情博第767号 / 新制||情||131 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Examination committee) Professor 熊田 孝恒, Professor 西田 眞也, Professor 齋木 潤, Associate Professor 延原 章平 / Eligible under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
30

Rozpoznávání pojmenovaných entit v biomedicínské doméně / Named entity recognition in the biomedical domain

Williams, Shadasha January 2021 (has links)
Named entity recognition (NER) is an information extraction task that attempts to recognize and extract particular entities from text. One issue with NER is that its models are domain-specific; the goal of this thesis is to focus on entities strictly from the biomedical domain. Another issue with NER comes from synonymous terms that may be linked to one entity, which lead to the problem of entity disambiguation. Given the popularity of neural networks and their success in NLP tasks, the work uses a neural network architecture for the task of named entity disambiguation, as described in the paper by Eshel et al. [1]. One of the subtasks of the thesis is to map words and entities to a vector space using word embeddings, which attempt to capture textual context similarity and coherence [2]. The main output of the thesis is a model that attempts to disambiguate entities of the biomedical domain, using scientific journals (PubMed and Embase) as the documents of interest.
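
The disambiguation step outlined above (scoring candidate entities against a mention's textual context via embeddings) can be illustrated with cosine similarity between an averaged context vector and candidate entity vectors. The toy vocabulary and randomly initialized embeddings below are illustrative assumptions; in practice the embeddings would be trained on biomedical text such as PubMed.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Toy embedding tables; in practice these would be trained on PubMed/Embase text.
word_vecs = {w: rng.normal(size=dim) for w in
             ["patient", "blood", "glucose", "insulin", "tumor", "cell"]}
entity_vecs = {e: rng.normal(size=dim) for e in
               ["MESH:D003920_diabetes", "MESH:D009369_neoplasm"]}

def embed_context(tokens):
    """Average the word vectors of the mention's surrounding tokens."""
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    return np.mean(vecs, axis=0)

def disambiguate(context_tokens, candidates):
    """Rank candidate entities by cosine similarity to the context vector."""
    c = embed_context(context_tokens)
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(candidates, key=lambda e: cosine(entity_vecs[e], c), reverse=True)

ranked = disambiguate(["patient", "blood", "glucose", "insulin"],
                      list(entity_vecs))
print(ranked[0])   # best-scoring candidate for this context
```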
