  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

基於圖像資訊之音樂資訊檢索研究 / A study of image-based music information retrieval

夏致群 Unknown Date (has links)
Previous music information retrieval methods mostly use lyrics, genre, the instruments played, or a segment of audio signal as the query medium. In some situations, however, users cannot clearly describe the song they are looking for, as in context-based music retrieval. This thesis proposes an image-based, context-aware music information retrieval method that finds corresponding music from an input image. In this method we use a convolutional neural network (CNN) to process images and convert them into low-dimensional representations. To map heterogeneous multimedia information into the same vector space, network embedding is also applied, so that multimedia items related to the input image can be retrieved by distance computation. We believe this approach can narrow the heterogeneous gap, that is, the inability of different types of multimedia files to be converted into or interpreted by one another. For the experiments and evaluation, keywords obtained from lyrics and song titles are first used to collect a large set of images as the training data; the proposed retrieval method is then implemented and the results are evaluated. Besides testing the effectiveness of the method, user feedback also shows that the proposed retrieval method is effective compared with other approaches. We also implemented a web prototype in which users can upload an image and receive the retrieved songs; practical use cases are demonstrated and introduced in this thesis. / Listening to music is indispensable to everyone. Music information retrieval systems help users find their favorite music. A common scenario of music information retrieval systems is to search songs based on a user's query. Most existing methods use descriptions (e.g., genre, instrument, and lyrics) or the audio signal of music as the query; the songs related to the query are then retrieved. The limitation of this scenario is that users might find it difficult to describe what they really want to search for. In this thesis, we propose a novel method, called "image2song", which allows users to input an image to retrieve related songs. The proposed method consists of three modules: a convolutional neural network (CNN) module, a network embedding module, and a similarity calculation module. For the processing of the images, the CNN is adopted to learn representations for images. To map each entity (e.g., image, song, and keyword) into the same embedding space, a heterogeneous representation is learned by a network embedding algorithm from the information graph. This method is flexible because it is easy to add other types of multimedia data to the information graph. In the similarity calculation module, Euclidean distance and cosine distance are used as our criteria to compare similarity. We can then retrieve the most relevant songs according to the similarity calculation. The experimental results show that the proposed method performs well. Furthermore, we also build an online image-based music information retrieval prototype system, which showcases some examples from our experiments.
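The similarity-calculation step described above can be illustrated with the short sketch below (an illustration under assumed names and dimensions, not code from the thesis): songs are ranked for a query image by cosine similarity in a shared embedding space, with 128-dimensional random vectors standing in for the CNN image embeddings and the network-embedding song vectors.

```python
# Illustrative sketch (not code from the thesis): ranking songs for a query image
# by cosine similarity in a shared embedding space. The 128-dimensional random
# vectors stand in for CNN image embeddings and network-embedding song vectors.
from typing import Dict, List, Tuple
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def retrieve_songs(image_emb: np.ndarray,
                   song_embs: Dict[str, np.ndarray],
                   top_k: int = 5) -> List[Tuple[str, float]]:
    """Rank songs by similarity to the query image in the shared embedding space."""
    scores = [(song_id, cosine_similarity(image_emb, emb))
              for song_id, emb in song_embs.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_k]

# Hypothetical usage: in the proposed system these embeddings would come from the
# CNN module (images) and the network embedding module (songs and keywords).
rng = np.random.default_rng(0)
songs = {f"song_{i}": rng.normal(size=128) for i in range(100)}
query_image_emb = rng.normal(size=128)
print(retrieve_songs(query_image_emb, songs, top_k=3))
```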
32

Analyse de la réduction du chatoiement sur les images radar polarimétrique à l'aide des réseaux neuronaux à convolutions / Analysis of speckle reduction in polarimetric radar images using convolutional neural networks

Beaulieu, Mario 04 1900 (has links)
Because of the coherent nature of the synthetic aperture radar (SAR) signal, polarimetric SAR (POLSAR) images are affected by speckle noise. The effect of speckle can be severe enough to make the POLSAR data unusable. This is particularly true for single-look data, which suffer from very intense speckle. Noise filtering is necessary to improve the estimation of the polarimetric parameters that can be computed from this type of data, and it constitutes an important step in the processing and analysis of POLSAR images. Recently, a new data-processing approach has emerged that targets a wide range of problems, including filtering, image restoration, speech recognition, and image classification or segmentation: deep learning and convolutional neural networks (CNNs). Recent work shows that CNNs are a promising alternative for filtering SAR images: through their ability to learn an optimal filtering model, they tend to outperform classical SAR filtering approaches. The objective of this study is to analyze and evaluate the effectiveness of CNN filtering on simulated POLSAR data and on RADARSAT-2, ALOS/PalSAR and GaoFen-3 satellite images acquired over the urban area of San Francisco (California). Models inspired by a CNN architecture used notably in super-resolution were adapted to filter the polarimetric coherency matrix. The effect of various structural parameters of the CNN architecture on filtering was analyzed, including the depth of the network (the number of stacked layers), its width (the number of filters per convolutional layer), and the size of the filters in the first convolutional layer. The models were trained by backpropagation of the error gradient using three datasets simulating single-look polarimetry of scatterers according to the Cloude-Pottier classes. The first dataset contains only homogeneous areas. The other two datasets consist of patchwork simulations whose local intensity is simulated by texture images, with point targets added to the patchwork in the case of the last dataset. The performance of the different CNN filters was measured with indicators including the relative error on the estimation of polarimetric signatures and decomposition parameters, as well as distortion measures on the recovery of important details and the preservation of point targets. The results show that CNN filtering of polarimetric data is either equivalent or clearly superior to the filters conventionally used in polarimetry. The deepest models obtain the best performance for all indicators on the simulated homogeneous dataset. For the patchwork data, the results for detail restoration clearly favour the deepest CNN filters. Applying CNN filtering to the RADARSAT-2, ALOS/PalSAR and GaoFen-3 satellite images gives results comparable or superior to conventional filters. The best results were obtained by the model with 5 hidden layers (not counting the input and output layers) and 8 filters of size 3×3 per convolutional layer, except for the input layer, where the filter size was 9×9. However, the training data must be well matched to the range of statistics of real polarimetric images to obtain good results. This is especially true for the modelling of point targets, whose restoration appears more difficult. / Due to the coherent nature of the Synthetic Aperture Radar (SAR) signal, polarimetric SAR (POLSAR) images are affected by speckle noise. The effect of speckle can be so severe as to render the POLSAR data unusable. This is especially true for single-look data, which suffer from very intense speckle. Noise filtering is necessary to improve the estimation of polarimetric parameters that can be computed from this type of data. This is an important step in the processing and analysis of POLSAR images. Recently, a new approach has emerged in data processing aimed at solving a multitude of problems including filtering, image restoration, speech recognition, classification, and image segmentation. This approach is deep learning and convolutional neural networks (CONVNETs). Recent work shows that CONVNETs are a promising alternative for filtering SAR images. Indeed, by their ability to learn an optimal filtering model directly from the data, they tend to outperform classical approaches to filtering SAR images. The objective of this study is to analyze and evaluate the effectiveness of CONVNET filtering on simulated POLSAR data and on RADARSAT-2, ALOS/PalSAR and GaoFen-3 satellite images acquired over the San Francisco urban area (California). Models inspired by the architecture of a CONVNET used in particular in super-resolution have been adapted for filtering the polarimetric coherency matrix. The effects of different structural parameters of the CONVNET architecture on filtering were analyzed, among which are the depth of the network (the number of stacked layers), the width of the network (the number of filters per convolutional layer), and the size of the filters of the first convolutional layer. The models were trained by backpropagation of the error gradient using 3 datasets that simulate single-look polarimetry of the scatterers according to the Cloude-Pottier classes. The first dataset contains only homogeneous areas. The last two datasets consist of patchwork simulations where local intensity is simulated by texture images, with point targets added to the patchwork in the case of the last dataset. The performance of the different CONVNET filters was measured by indicators including the relative error on the estimation of polarimetric signatures and decomposition parameters, as well as distortion measurements on the recovery of major details and on the preservation of point targets. The results show that CONVNET filtering of polarimetric data is either equivalent or significantly superior to conventional polarimetric filters. The deepest models obtain the best performance for all indicators on the simulated homogeneous dataset. In the case of the patchwork datasets, the results for detail restoration clearly favour the deepest CONVNET filters. The application of CONVNET filtering on RADARSAT-2, ALOS/PalSAR and GaoFen-3 satellite images shows results comparable or superior to conventional filters. The best results were obtained by the model with 5 hidden layers (not counting the input and output layers), with 8 filters of size 3×3 per convolutional layer, except for the input layer, where the filter size was 9×9. On the other hand, the training data must be well adjusted to the statistical range of the real polarimetric images to obtain good results. This is especially true when modelling point targets, which appear to be more difficult to restore.
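As an illustration of the architecture described above (a first convolutional layer with 9×9 filters, five hidden layers with 8 filters of 3×3 each, and an output layer reconstructing the coherency-matrix channels), the sketch below shows one possible PyTorch formulation. The 9-channel real-valued encoding of the 3×3 polarimetric coherency matrix and the MSE placeholder loss are assumptions for illustration, not the thesis implementation.

```python
# A minimal sketch (assumptions, not the thesis code) of a CNN speckle filter of
# the kind described: a 9x9 input convolution, five hidden 3x3 convolutions with
# 8 filters each, and an output convolution reconstructing the coherency channels.
# The 9-channel representation of the 3x3 coherency matrix is assumed for illustration.
import torch
import torch.nn as nn

class SpeckleFilterCNN(nn.Module):
    def __init__(self, channels: int = 9, width: int = 8):
        super().__init__()
        layers = [nn.Conv2d(channels, width, kernel_size=9, padding=4), nn.ReLU()]
        for _ in range(5):  # five hidden layers, 8 filters of 3x3 each
            layers += [nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(width, channels, kernel_size=3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # filtered coherency-matrix channels

# Hypothetical usage: a batch of 9-channel coherency-matrix patches.
model = SpeckleFilterCNN()
noisy = torch.randn(4, 9, 64, 64)
filtered = model(noisy)
loss = nn.functional.mse_loss(filtered, noisy)  # placeholder target for illustration
loss.backward()  # training by backpropagation of the error gradient
```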
33

Word2vec modely s přidanou kontextovou informací / Word2vec Models with Added Context Information

Šůstek, Martin January 2017 (has links)
This thesis is concerned with the explanation of word2vec models. Even though word2vec was introduced recently (2013), many researchers have already tried to extend, understand, or at least use the model because it provides surprisingly rich semantic information. This information is encoded in an N-dimensional vector representation and can be recalled by performing algebraic operations on these vectors. In addition, I suggest model modifications in order to obtain different word representations. To achieve that, I use public picture datasets. This thesis also includes parts dedicated to a word2vec extension based on convolutional neural networks.
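The algebraic recall mentioned above can be illustrated with the classic analogy vec("king") - vec("man") + vec("woman") ≈ vec("queen"). The toy vectors in the sketch below are made up for illustration; real ones would come from a trained word2vec model.

```python
# A small illustrative sketch (not from the thesis) of how semantic information in
# word2vec-style vectors can be recalled with vector algebra. The toy vectors are
# invented for illustration; a trained model would supply real embeddings.
import numpy as np

toy_vectors = {
    "king":  np.array([0.8, 0.7, 0.1]),
    "queen": np.array([0.8, 0.1, 0.7]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.2, 0.8]),
    "apple": np.array([0.9, 0.05, 0.05]),
}

def nearest(query: np.ndarray, vocab: dict, exclude: set) -> str:
    """Return the vocabulary word whose vector is closest (cosine) to the query."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max((w for w in vocab if w not in exclude), key=lambda w: cos(query, vocab[w]))

analogy = toy_vectors["king"] - toy_vectors["man"] + toy_vectors["woman"]
print(nearest(analogy, toy_vectors, exclude={"king", "man", "woman"}))  # -> "queen"
```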
34

Enhancing failure prediction from time-series histogram data: through fine-tuned lower-dimensional representations

Jayaraman, Vijay January 2023 (has links)
Histogram data are widely used for compressing high-frequency time-series signals due to their ability to capture distributional information. However, this compression comes at the cost of increased dimensionality and loss of contextual details from the original features. This study addresses the challenge of effectively capturing changes in distributions over time and their contribution to failure prediction. Specifically, we focus on the task of predicting Time to Event (TTE) for turbocharger failures. In this thesis, we propose a novel approach to improve failure prediction by fine-tuning lower-dimensional representations of bi-variate histograms. The goal is to optimize these representations in a way that enhances their ability to predict component failure. Moreover, we compare the performance of our learned representations with hand-crafted histogram features to assess the efficacy of both approaches. We evaluate the different representations using the Weibull Time To Event - Recurrent Neural Network (WTTE-RNN) framework, which is a popular choice for TTE prediction tasks. By conducting extensive experiments, we demonstrate that the fine-tuning approach yields superior results compared to general lower-dimensional learned features. Notably, our approach achieves performance levels close to state-of-the-art results. This research contributes to the understanding of effective failure prediction from time-series histogram data. The findings highlight the significance of fine-tuning lower-dimensional representations for improving predictive capabilities in real-world applications. The insights gained from this study can potentially impact various industries where failure prediction is crucial for proactive maintenance and reliability enhancement.
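For context on the WTTE-RNN framework mentioned above, the sketch below shows a commonly used form of its Weibull negative log-likelihood, which handles both observed failures and right-censored records. The tensor shapes and values are placeholders and this is not the thesis code.

```python
# A hedged sketch (assumptions, not the thesis code) of a Weibull negative
# log-likelihood of the kind used in WTTE-RNN: a network outputs Weibull
# parameters (alpha, beta), and the loss covers observed (u = 1) and censored
# (u = 0) time-to-event records. Terms constant in (alpha, beta) are dropped.
import torch

def weibull_nll(t: torch.Tensor, u: torch.Tensor,
                alpha: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of time-to-event t under Weibull(alpha, beta)."""
    hazard = (t / alpha).clamp(min=1e-9) ** beta          # cumulative hazard (t/alpha)^beta
    loglik = u * (torch.log(beta) + beta * torch.log(t / alpha)) - hazard
    return -loglik.mean()

# Hypothetical usage with a tiny batch: two observed failures, one censored record.
t = torch.tensor([12.0, 30.0, 45.0])
u = torch.tensor([1.0, 1.0, 0.0])
alpha = torch.tensor([15.0, 28.0, 60.0], requires_grad=True)  # would come from the RNN head
beta = torch.tensor([1.5, 2.0, 1.2], requires_grad=True)
loss = weibull_nll(t, u, alpha, beta)
loss.backward()
```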
35

Hybridní hluboké metody pro automatické odpovídání na otázky / Hybrid Deep Question Answering

Aghaebrahimian, Ahmad January 2019 (has links)
Title: Hybrid Deep Question Answering Author: Ahmad Aghaebrahimian Institute: Institute of Formal and Applied Linguistics Supervisor: RNDr. Martin Holub, Ph.D., Institute of Formal and Applied Linguistics Abstract: As one of the oldest tasks of Natural Language Processing, Question Answering is one of the most exciting and challenging research areas with lots of scientific and commercial applications. Question Answering as a discipline in the conjunction of computer science, statistics, linguistics, and cognitive science is concerned with building systems that automatically retrieve answers to questions posed by humans in a natural language. This doctoral dissertation presents the author's research carried out in this discipline. It highlights his studies and research toward a hybrid Question Answering system consisting of two engines for Question Answering over structured and unstructured data. The structured engine comprises a state-of-the-art Question Answering system based on knowledge graphs. The unstructured engine consists of a state-of-the-art sentence-level Question Answering system and a word-level Question Answering system with results near to human performance. This work introduces a new Question Answering dataset for answering word- and sentence-level questions as well. Starting from a...
36

Reconnaissance de postures humaines par fusion de la silhouette et de l'ombre dans l'infrarouge / Human posture recognition by fusing silhouette and shadow in the infrared

Gouiaa, Rafik 01 1900 (has links)
Multi-camera video surveillance systems are complex, cumbersome, and expensive. For monitoring a room, could they be replaced by a much simpler system using a single camera and one or more light sources, relying on cast shadows to obtain 3D information? Despite the interesting results offered by multi-camera systems, the amount of information to process and their complexity greatly limit their use. In this context, we propose to simplify these systems by replacing a camera with a light source. Indeed, a light source can be seen as a camera that generates a shadow image revealing the object blocking the light. Our system consists of a single camera and one or more infrared light sources (invisible to the eye). Despite the anticipated difficulties in extracting the shadow and in dealing with its deformation and occlusion by obstacles (walls, furniture, etc.), the gains from our system are numerous: we avoid camera synchronization and calibration problems and reduce cost by replacing cameras with simple infrared sources. We propose two different approaches to automate human posture recognition. The first approach reconstructs the 3D shape of a person and recognizes the posture using shape descriptors. The second approach directly combines the 2D information (shadow + silhouette) to recognize postures. Scientifically, we seek to prove that the information provided by a silhouette and the shadow generated by a light source is sufficient to recognize elementary human postures (e.g., standing, sitting, lying down, bending, etc.). The proposed system can be used for video surveillance of uncluttered places such as a corridor in a seniors' residence (e.g., for fall detection) or in a company (for security). Its low cost would allow wider use of video surveillance for the benefit of society. Scientifically, the theoretical and practical demonstration of such a system is original and offers great potential for video surveillance. / Human posture recognition (HPR) from video sequences is one of the major active research areas of computer vision. It is one step of the global process of human activity recognition (HAR) for behaviour analysis. Many HPR application systems have been developed, including video surveillance, human-machine interaction, and video retrieval. Generally, applications related to HPR can be achieved using two main approaches: single camera or multi-camera. Despite the interesting performance achieved by multi-camera systems, their complexity and the huge amount of information to be processed greatly limit their widespread use for HPR. The main goal of this thesis is to simplify the multi-camera system by replacing a camera with a light source. In fact, a light source can be seen as a virtual camera that generates a cast shadow image representing the silhouette of the person blocking the light. Our system consists of a single camera and one or more infrared light sources. Despite some technical difficulties in cast shadow segmentation and cast shadow deformation caused by walls and furniture, several advantages can be achieved by using our system. Indeed, we can avoid the synchronization and calibration problems of multiple cameras, reducing the cost of the system and the amount of processed data by replacing a camera with one light source. We introduce two different approaches to automatically recognize human postures. The first approach directly combines the person's silhouette and cast shadow information and uses a 2D silhouette descriptor to extract discriminative features useful for HPR. The second approach is inspired by the shape-from-silhouette technique: it reconstructs the visual hull of the posture from a set of cast shadow silhouettes and extracts informative features through a 3D shape descriptor. Using these approaches, our goal is to prove the utility of combining the person's silhouette and cast shadow information for recognizing elementary human postures (stand, bend, crouch, fall, ...). The proposed system can be used for video surveillance of uncluttered areas such as a corridor in a seniors' residence (for example, for the detection of falls) or in a company (for security). Its low cost may allow greater use of video surveillance for the benefit of society.
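As a rough illustration of the 2D approach (silhouette and cast shadow combined into a shape descriptor), the sketch below extracts Hu moments from the two binary masks. The masks, image sizes, and downstream classifier are assumptions for illustration, not the thesis implementation.

```python
# An illustrative sketch (not the thesis implementation) of combining a person's
# silhouette mask with the cast-shadow mask and extracting a simple 2D shape
# descriptor (Hu moments) to feed a posture classifier.
import cv2
import numpy as np

def posture_features(silhouette: np.ndarray, shadow: np.ndarray) -> np.ndarray:
    """Concatenate Hu-moment descriptors of the silhouette and the cast shadow."""
    feats = []
    for mask in (silhouette, shadow):
        m = cv2.moments(mask.astype(np.uint8), binaryImage=True)
        hu = cv2.HuMoments(m).flatten()
        # log-scale the moments for numerical stability, preserving sign
        feats.append(-np.sign(hu) * np.log10(np.abs(hu) + 1e-30))
    return np.concatenate(feats)  # 14-dimensional descriptor

# Hypothetical usage: binary masks from background subtraction on the infrared image.
silhouette = np.zeros((240, 320), dtype=np.uint8); silhouette[60:200, 140:180] = 1
shadow = np.zeros((240, 320), dtype=np.uint8); shadow[200:230, 100:260] = 1
features = posture_features(silhouette, shadow)
# `features` would then go to a classifier (e.g., an SVM) trained on labelled
# postures such as stand, bend, crouch, and fall.
```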
37

Principy a aplikace neuroevoluce / Neuroevolution Principles and Applications

Herec, Jan January 2018 (has links)
The theoretical part of this work deals with evolutionary algorithms (EA), neural networks (NN) and their synthesis in the form of neuroevolution. From a practical point of view, the aim of the work is to show the application of neuroevolution to two different tasks. The first task is the evolutionary design of a convolutional neural network (CNN) architecture able to classify handwritten digits (from the MNIST dataset) with high accuracy. The second task is the evolutionary optimization of a neurocontroller for a simulated Falcon 9 rocket landing. Both tasks are computationally demanding and were therefore solved on a supercomputer. As part of the first task, it was possible to design architectures which, when properly trained, achieve an accuracy of 99.49%. It turned out that the design of high-quality architectures can be automated with the use of neuroevolution. Within the second task, the neurocontroller weights were optimized so that, for the defined initial conditions, the model of the Falcon booster can successfully land. Neuroevolution succeeded in both tasks.
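The following minimal sketch illustrates the general shape of such a neuroevolution loop for the second kind of task (evolving neurocontroller weights). The toy controller, fitness function, and hyperparameters are placeholders, not the rocket-landing setup used in the thesis.

```python
# A minimal sketch (not the thesis code) of evolving the weights of a small neural
# controller with a simple evolutionary loop: evaluate, select elites, mutate.
import numpy as np

rng = np.random.default_rng(42)
N_WEIGHTS = 4 * 8 + 8 * 2   # toy controller: 4 inputs -> 8 hidden -> 2 outputs

def controller(weights: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Tiny feed-forward neurocontroller mapping a state vector to control outputs."""
    w1 = weights[:32].reshape(4, 8)
    w2 = weights[32:].reshape(8, 2)
    return np.tanh(np.tanh(state @ w1) @ w2)

def fitness(weights: np.ndarray) -> float:
    """Placeholder fitness: reward controllers that drive part of a random state toward zero."""
    state = rng.normal(size=4)
    action = controller(weights, state)
    return -float(np.sum((state[:2] + action) ** 2))  # stand-in for landing quality

population = rng.normal(size=(50, N_WEIGHTS))
for generation in range(100):
    scores = np.array([fitness(ind) for ind in population])
    elite = population[np.argsort(scores)[-10:]]               # keep the 10 best
    offspring = elite[rng.integers(0, 10, size=40)] \
                + 0.1 * rng.normal(size=(40, N_WEIGHTS))        # mutate copies of elites
    population = np.vstack([elite, offspring])
best = population[np.argmax([fitness(ind) for ind in population])]
```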
38

SkeMo: A Web Application for Real-time Sketch-based Software Modeling

Sharma Chapai, Alisha 19 July 2023 (has links)
No description available.
39

ACCELERATING SPARSE MACHINE LEARNING INFERENCE

Ashish Gondimalla (14214179) 17 May 2024 (has links)
Convolutional neural networks (CNNs) have become important workloads due to their impressive accuracy in tasks like image classification and recognition. Convolution operations are compute intensive, and this cost profoundly increases with newer and better CNN models. However, convolutions come with characteristics such as sparsity which can be exploited. In this dissertation, we propose three different works to capture sparsity for faster performance and reduced energy.

The first work is an accelerator design called SparTen for improving two-sided sparsity (i.e., sparsity in both filters and feature maps) convolutions with fine-grained sparsity. SparTen identifies an efficient inner join as the key primitive for hardware acceleration of sparse convolution. In addition, SparTen proposes load balancing schemes for higher compute unit utilization. SparTen performs 4.7x, 1.8x and 3x better than a dense architecture, a one-sided architecture and SCNN, the previous state-of-the-art accelerator. The second work, BARISTA, scales up SparTen (and SparTen-like proposals) to a large-scale implementation with as many compute units as recent dense accelerators (e.g., Google's Tensor Processing Unit) to achieve the full speedups afforded by sparsity. However, at such large scales, buffering, on-chip bandwidth, and compute utilization are highly intertwined, where optimizing for one factor strains another and may invalidate some optimizations proposed in small-scale implementations. BARISTA proposes novel techniques to balance the three factors in large-scale accelerators. BARISTA performs 5.4x, 2.2x, 1.7x and 2.5x better than dense, one-sided, naively scaled two-sided and an iso-area two-sided architecture, respectively. The last work, EUREKA, builds an efficient tensor core to execute dense, structured and unstructured sparsity without losing efficiency. EUREKA achieves this by proposing novel techniques to improve compute utilization by slightly tweaking operand stationarity. EUREKA achieves speedups of 5x and 2.5x, along with energy reductions of 3.2x and 1.7x, over dense and structured sparse execution, respectively. EUREKA only incurs area and power overheads of 6% and 11.5%, respectively, over Ampere.
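As a software analogue of the inner-join primitive described above, the sketch below multiplies and accumulates only where the nonzero index sets of a filter window and a feature-map window intersect. The data layout and values are illustrative, not the SparTen hardware design.

```python
# A simplified sketch (not the SparTen hardware) of the "efficient inner join"
# primitive for two-sided sparse convolution: both operands are stored as
# (index, value) pairs of their nonzeros, and work is done only on matching indices.
from typing import Dict

def sparse_inner_join(filter_nz: Dict[int, float], fmap_nz: Dict[int, float]) -> float:
    """Multiply-accumulate only over positions where both operands are nonzero."""
    # Iterate over the smaller operand and probe the larger one, as a software
    # analogue of intersecting two compressed index streams.
    small, large = sorted((filter_nz, fmap_nz), key=len)
    return sum(value * large[idx] for idx, value in small.items() if idx in large)

# Hypothetical usage: nonzeros of a flattened 3x3 filter window and the matching
# feature-map window; dense positions holding zeros never enter the computation.
filter_nz = {0: 0.5, 4: -1.0, 8: 0.25}
fmap_nz = {2: 3.0, 4: 2.0, 7: 1.5}
print(sparse_inner_join(filter_nz, fmap_nz))  # only index 4 matches: -2.0
```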
