About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Ljudklassificering med Tensorflow och IOT-enheter : En teknisk studie

Karlsson, David January 2020
Artificial intelligence and machine learning have started to become established as recognizable terms to the general public in daily life. Applications such as voice recognition and image recognition are widely used in mobile phones and in autonomous systems such as self-driving cars. This study examines how this technique can be used to classify sound as a complement to video surveillance in different settings, for example a bus station or other areas that might need monitoring. To do this, a technique called Convolutional Neural Network has been used, since this is a popular architecture for image classification. In this model, every sound is given a visual representation in the form of a spectrogram that shows frequencies over time. One of the main goals of this study has been to apply this technique to so-called IoT devices in order to classify sounds in real time, because these units are relatively affordable and require few resources. A Raspberry Pi was used to run a prototype version with TensorFlow and Keras as base APIs. The study's results show which parts are important to consider in order to build a good and reliable system, for example which hardware and software are needed to get started. The results also show which factors matter for streaming live sound with reliable results; the architecture of the classification model is very important, as different layers and parameters can have a large impact on the end result.
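The abstract above hinges on turning each sound into a spectrogram image before a CNN classifies it. A minimal numpy-only sketch of that preprocessing step might look as follows; the frame size, hop length and the 440 Hz test tone are illustrative assumptions, not values from the thesis:

```python
import numpy as np

def spectrogram(signal, frame_size=256, hop=128):
    """Convert a 1-D audio signal into a 2-D magnitude spectrogram.

    Each column is the FFT magnitude of one Hann-windowed frame, so the
    result can be treated as an image by a CNN classifier.
    """
    window = np.hanning(frame_size)
    frames = [
        signal[start:start + frame_size] * window
        for start in range(0, len(signal) - frame_size + 1, hop)
    ]
    # rfft keeps only the non-negative frequencies
    mags = np.abs(np.fft.rfft(np.array(frames), axis=1))
    return mags.T  # shape: (frequencies, time)

# one second of a 440 Hz tone sampled at 8 kHz
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
```

The spectrogram's two axes (frequency bins by time frames) are what lets an image-classification architecture be reused for audio, as the thesis describes.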

Compressed Convolutional Neural Network for Autonomous Systems

Durvesh Pathak (5931110) 17 January 2019
The word "Perception" seems intuitive and is perhaps the most straightforward problem for the human brain: as children we are trained to classify images and detect objects, but for computers it can be a daunting task. Giving intuition and reasoning to a computer that has the mere capability to accept and process commands is a big challenge. However, recent leaps in hardware development, sophisticated software frameworks, and mathematical techniques have made it a little less daunting, if not easy. Various applications are built around the concept of "Perception". These applications require substantial computational resources, expensive hardware, and sophisticated software frameworks. Building a perception application for an embedded system is an entirely different ballgame. An embedded system is a culmination of hardware, software and peripherals developed for specific tasks, with imposed constraints on memory and power. Therefore, applications should be developed with these memory and power constraints in mind. Before 2012, problems related to "Perception", such as classification and object detection, were solved using algorithms with manually engineered features. In recent years, instead of manually engineering the features, these features are learned through learning algorithms. The game-changing Convolutional Neural Network architecture proposed in 2012 by Alex Krizhevsky provided tremendous momentum in the direction of pushing neural networks for perception. This thesis is an attempt to develop a convolutional neural network architecture for embedded systems, i.e. an architecture that has a small model size and competitive accuracy. State-of-the-art architectures are recreated using the fire module concept to reduce model size. The proposed compact models are feasible for deployment on embedded devices such as the Bluebox 2.0. Furthermore, attempts are made to integrate the compact convolutional neural network with object detection pipelines.
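The fire module mentioned above (introduced by SqueezeNet) shrinks a network by squeezing the channel count with 1×1 convolutions before expanding again. A rough parameter-count sketch, with hypothetical channel counts chosen only for illustration:

```python
def conv_params(in_ch, out_ch, k):
    """Weight count of a k x k convolution, ignoring biases."""
    return in_ch * out_ch * k * k

def fire_params(in_ch, squeeze, expand):
    """Fire module: a 1x1 squeeze layer feeding parallel 1x1 and 3x3
    expand layers (output channels = 2 * expand)."""
    s = conv_params(in_ch, squeeze, 1)
    e1 = conv_params(squeeze, expand, 1)
    e3 = conv_params(squeeze, expand, 3)
    return s + e1 + e3

# replacing a plain 3x3 conv (128 -> 128 channels) with a fire module
# that also produces 128 output channels (64 + 64)
plain = conv_params(128, 128, 3)
fire = fire_params(128, 16, 64)
```

With these example numbers the fire module needs roughly a tenth of the weights of the plain convolution, which is the kind of saving that makes deployment on constrained embedded hardware feasible.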

基於圖像資訊之音樂資訊檢索研究 / A study of image-based music information retrieval

夏致群 Unknown Date
Listening to music is indispensable to everyone, and music information retrieval systems help users find their favorite music. A common scenario is to search for songs based on a user's query. Most existing methods use descriptions (e.g., genre, instrument, or lyrics) or the audio signal of music as the query, and then retrieve the songs related to it. The limitation of this scenario is that it might be difficult for users to describe what they really want to search for, as in context-based music retrieval. In this paper, we propose a novel method, called "image2song", which allows users to input an image to retrieve related songs. The proposed method consists of three modules: a convolutional neural network (CNN) module, a network embedding module, and a similarity calculation module. For the processing of images, the CNN is adopted to learn representations of the images. To map each entity (e.g., image, song, and keyword) into the same embedding space, a heterogeneous representation is learned by a network embedding algorithm from the information graph; we believe this can narrow the heterogeneous gap, i.e., the inability to translate or interpret between different kinds of multimedia data. This method is flexible because it is easy to add other types of multimedia data to the information graph. In the similarity calculation module, the Euclidean distance and the cosine distance are used as criteria to compare similarity, and the most relevant songs are retrieved accordingly. For training data, a large set of images was collected using keywords taken from lyrics and song titles. The experimental results show that the proposed method performs well, and user feedback indicates that it is effective compared with other methods. Furthermore, we also built an online image-based music information retrieval prototype system, where users can upload an image and receive retrieved songs; practical use cases are showcased in the paper.
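The similarity calculation module described above ranks songs by distance to the query image in the shared embedding space. A toy numpy sketch of cosine-similarity retrieval; the 2-D embeddings and song names are invented for illustration (real network embeddings are learned and much higher-dimensional):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(image_vec, song_vecs, top_k=2):
    """Rank songs by cosine similarity to the query image embedding."""
    scores = [(name, cosine_sim(image_vec, v)) for name, v in song_vecs.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scores[:top_k]]

# hypothetical song embeddings already mapped into the shared space
songs = {
    "rainy_day": np.array([0.9, 0.1]),
    "sunshine":  np.array([0.1, 0.9]),
    "storm":     np.array([0.8, 0.3]),
}
ranked = retrieve(np.array([1.0, 0.2]), songs)
```

Because every entity type lives in the same space, the same ranking routine works whether the query is an image, a keyword, or another song.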

Analyse de la réduction du chatoiement sur les images radar polarimétrique à l'aide des réseaux neuronaux à convolutions

Beaulieu, Mario 04 1900
Due to the coherent nature of the Synthetic Aperture Radar (SAR) signal, polarimetric SAR (POLSAR) images are affected by speckle noise. The effect of speckle can be so severe as to render the POLSAR data unusable. This is especially true for single-look data, which suffer from very intense speckle. Noise filtering is necessary to improve the estimation of the polarimetric parameters that can be computed from this type of data. This is an important step in the processing and analysis of POLSAR images. Recently, a new approach has emerged in data processing aimed at solving a multitude of problems, including filtering, image restoration, speech recognition, classification and image segmentation. This approach is deep learning and convolutional neural networks (ConvNets). Recent work shows that ConvNets are a promising alternative for filtering SAR images. Indeed, by their ability to learn an optimal filtering model directly from the data, they tend to outperform classical approaches to SAR image filtering. The objective of this study is to analyze and evaluate the effectiveness of ConvNet filtering on simulated POLSAR data and on RADARSAT-2, ALOS/PalSAR and GaoFen-3 satellite images acquired over the San Francisco urban area (California). Models inspired by the architecture of a ConvNet used notably in super-resolution were adapted for filtering the polarimetric coherency matrix. The effect of different structural parameters of the ConvNet architecture on filtering was analyzed, among them the depth of the network (the number of stacked layers), the width of the network (the number of filters per convolutional layer) and the size of the filters of the first convolutional layer. The models were trained by backpropagation of the error gradient using 3 datasets that simulate single-look polarimetry of the scatterers according to Cloude-Pottier classes. The first dataset contains only homogeneous areas. The last two datasets consist of patchwork simulations in which the local intensity is simulated by texture images; point targets are added to the patchwork in the case of the last dataset. The performance of the different ConvNet filters was measured by indicators including the relative error on the estimation of polarimetric signatures and decomposition parameters, as well as distortion measurements on the recovery of important details and on the conservation of point targets. The results show that ConvNet filtering of polarimetric data is either equivalent or significantly superior to the filters conventionally used in polarimetry. The deepest models obtain the best performance on all indicators over the simulated homogeneous dataset. In the case of the patchwork datasets, the results for detail restoration clearly favour the deepest ConvNet filters. The application of ConvNet filtering to the RADARSAT-2, ALOS/PalSAR and GaoFen-3 satellite images shows results comparable or superior to conventional filters. The best results were obtained by the model with 5 hidden layers (not counting the input and output layers), with 8 filters of size 3×3 per convolutional layer, except for the input layer, where the filter size was 9×9. On the other hand, the training data must be well adjusted to the statistical range of the real polarimetric images to obtain good results. This is especially true for the modeling of point targets, whose restoration appears to be more difficult.
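Single-look intensity speckle is conventionally modeled as multiplicative, unit-mean gamma noise (exponential for one look). A minimal sketch of simulating it over a homogeneous area and checking that even a crude boxcar mean filter, used here only as a stand-in for a learned despeckling model, reduces the relative estimation error; the image size and values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_speckle(intensity, looks=1):
    """Multiplicative speckle: gamma-distributed noise with unit mean."""
    noise = rng.gamma(shape=looks, scale=1.0 / looks, size=intensity.shape)
    return intensity * noise

def boxcar(img, size=5):
    """Crude mean filter, a stand-in for a learned filtering model."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + size, j:j + size].mean()
    return out

def rel_error(est, ref):
    """Mean relative error, one of the kinds of indicators mentioned above."""
    return float(np.mean(np.abs(est - ref) / ref))

clean = np.full((32, 32), 5.0)   # a homogeneous area, as in the first dataset
noisy = add_speckle(clean)
filtered = boxcar(noisy)

err_noisy = rel_error(noisy, clean)
err_filtered = rel_error(filtered, clean)
```

On real patchwork scenes the mean filter would blur details and point targets, which is exactly the trade-off the study's distortion indicators are designed to expose.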

Word2vec modely s přidanou kontextovou informací / Word2vec Models with Added Context Information

Šůstek, Martin January 2017
This thesis is concerned with the explanation of word2vec models. Even though word2vec was introduced recently (2013), many researchers have already tried to extend, understand, or at least use the model, because it provides surprisingly rich semantic information. This information is encoded in an N-dimensional vector representation and can be recalled by performing algebraic operations on the vectors. In addition, I propose model modifications in order to obtain different word representations. To achieve that, I use public picture datasets. This thesis also includes parts dedicated to a word2vec extension based on convolutional neural networks.
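The algebraic operations alluded to above are vector arithmetic on the embeddings, e.g. the classic king - man + woman ≈ queen analogy. A toy sketch with invented 3-dimensional vectors (real word2vec embeddings are learned from corpora and have hundreds of dimensions):

```python
import numpy as np

# toy embeddings, invented for illustration only
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def analogy(a, b, c, emb):
    """Return the word whose vector is closest (by cosine similarity)
    to vec(a) - vec(b) + vec(c), excluding the query words."""
    target = emb[a] - emb[b] + emb[c]
    best, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in (a, b, c):
            continue
        sim = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best
```

Excluding the query words from the candidates is the standard convention; without it, the nearest vector to the target is often one of the inputs themselves.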

Enhancing failure prediction from timeseries histogram data : through fine-tuned lower-dimensional representations

Jayaraman, Vijay January 2023
Histogram data are widely used for compressing high-frequency time-series signals due to their ability to capture distributional information. However, this compression comes at the cost of increased dimensionality and loss of contextual details from the original features. This study addresses the challenge of effectively capturing changes in distributions over time and their contribution to failure prediction. Specifically, we focus on the task of predicting Time to Event (TTE) for turbocharger failures. In this thesis, we propose a novel approach to improve failure prediction by fine-tuning lower-dimensional representations of bi-variate histograms. The goal is to optimize these representations in a way that enhances their ability to predict component failure. Moreover, we compare the performance of our learned representations with hand-crafted histogram features to assess the efficacy of both approaches. We evaluate the different representations using the Weibull Time To Event - Recurrent Neural Network (WTTE-RNN) framework, which is a popular choice for TTE prediction tasks. By conducting extensive experiments, we demonstrate that the fine-tuning approach yields superior results compared to general lower-dimensional learned features. Notably, our approach achieves performance levels close to state-of-the-art results. This research contributes to the understanding of effective failure prediction from time-series histogram data. The findings highlight the significance of fine-tuning lower-dimensional representations for improving predictive capabilities in real-world applications. The insights gained from this study can potentially impact various industries where failure prediction is crucial for proactive maintenance and reliability enhancement.
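Compressing a time-series window into a histogram, plus the kind of hand-crafted summary features the thesis compares against, might be sketched as follows; the bin count, value range, and the uniform test signal are assumptions for illustration:

```python
import numpy as np

def histogram_features(window, bins=10, value_range=(0.0, 1.0)):
    """Compress one time-series window into a normalized histogram
    plus simple hand-crafted summary features (mean, spread)."""
    counts, edges = np.histogram(window, bins=bins, range=value_range)
    hist = counts / counts.sum()            # distributional information
    centers = (edges[:-1] + edges[1:]) / 2
    mean = float(np.sum(hist * centers))    # summary recovered from bins
    spread = float(np.sqrt(np.sum(hist * (centers - mean) ** 2)))
    return hist, mean, spread

rng = np.random.default_rng(1)
window = rng.uniform(0.0, 1.0, size=10_000)  # stand-in high-frequency signal
hist, mean, spread = histogram_features(window)
```

The normalized histogram is the compressed, higher-dimensional representation the thesis starts from; the mean and spread illustrate what a hand-crafted baseline can still recover from it after the raw signal is gone.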

Hybridní hluboké metody pro automatické odpovídání na otázky / Hybrid Deep Question Answering

Aghaebrahimian, Ahmad January 2019
Title: Hybrid Deep Question Answering Author: Ahmad Aghaebrahimian Institute: Institute of Formal and Applied Linguistics Supervisor: RNDr. Martin Holub, Ph.D., Institute of Formal and Applied Linguistics Abstract: As one of the oldest tasks of Natural Language Processing, Question Answering is one of the most exciting and challenging research areas, with many scientific and commercial applications. Question Answering, as a discipline at the conjunction of computer science, statistics, linguistics, and cognitive science, is concerned with building systems that automatically retrieve answers to questions posed by humans in a natural language. This doctoral dissertation presents the author's research carried out in this discipline. It highlights his studies and research toward a hybrid Question Answering system consisting of two engines for Question Answering over structured and unstructured data. The structured engine comprises a state-of-the-art Question Answering system based on knowledge graphs. The unstructured engine consists of a state-of-the-art sentence-level Question Answering system and a word-level Question Answering system with results near human performance. This work also introduces a new Question Answering dataset for answering word- and sentence-level questions. Starting from a...

Reconnaissance de postures humaines par fusion de la silhouette et de l'ombre dans l'infrarouge

Gouiaa, Rafik 01 1900
Human posture recognition (HPR) from video sequences is one of the major active research areas of computer vision. It is one step of the global process of human activity recognition (HAR) for behavior analysis. Many HPR application systems have been developed, including video surveillance, human-machine interaction, and video retrieval. Generally, applications related to HPR can be achieved using two main approaches: single camera or multi-camera. Despite the interesting performance achieved by multi-camera systems, their complexity and the huge amount of information to be processed greatly limit their widespread use for HPR. The main goal of this thesis is to simplify the multi-camera system by replacing a camera with a light source. In fact, a light source can be seen as a virtual camera that generates a cast shadow image representing the silhouette of the person who blocks the light. Our system consists of a single camera and one or more infrared light sources (invisible to the eye). Despite some technical difficulties in cast shadow segmentation and in cast shadow deformation and occlusion caused by obstacles such as walls and furniture, several advantages can be achieved by using our system. Indeed, we can avoid the synchronization and calibration problems of multiple cameras and reduce the cost of the system and the amount of processed data by replacing cameras with simple infrared light sources. We introduce two different approaches to automatically recognize human postures. The first approach directly combines the person's silhouette and cast shadow information, and uses a 2D silhouette descriptor to extract discriminative features useful for HPR. The second approach is inspired by the shape-from-silhouette technique: it reconstructs the visual hull of the posture using a set of cast shadow silhouettes and extracts informative features through a 3D shape descriptor. Using these approaches, our goal is to prove that the information offered by a silhouette and the shadow generated by a light source is sufficient for recognizing elementary human postures (stand, bend, crouch, fall, ...). The proposed system can be used for video surveillance of uncluttered areas such as a corridor in a seniors' residence (for example, for the detection of falls) or in a company (for security). Its low cost may allow greater use of video surveillance for the benefit of society.
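The 2D fusion approach above concatenates features extracted from the silhouette mask and the cast-shadow mask. A minimal sketch with invented toy masks and deliberately simple shape descriptors (the thesis's actual descriptors are richer than these):

```python
import numpy as np

def shape_features(mask):
    """Simple 2-D descriptors of a binary mask: area fraction,
    bounding-box aspect ratio, and normalized vertical centroid."""
    ys, xs = np.nonzero(mask)
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    area = mask.sum() / mask.size
    return np.array([area, height / width, ys.mean() / mask.shape[0]])

def combined_features(silhouette, shadow):
    """Concatenate silhouette and shadow descriptors into one vector,
    as in the direct 2-D fusion approach."""
    return np.concatenate([shape_features(silhouette), shape_features(shadow)])

# a tall 'standing' silhouette and its elongated floor shadow (toy masks)
sil = np.zeros((40, 40), dtype=int)
sil[5:35, 18:22] = 1
sha = np.zeros((40, 40), dtype=int)
sha[30:34, 5:35] = 1
features = combined_features(sil, sha)
```

The shadow's descriptor carries viewpoint information the silhouette alone lacks, which is the intuition behind treating the light source as a second, virtual camera.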

Principy a aplikace neuroevoluce / Neuroevolution Principles and Applications

Herec, Jan January 2018
The theoretical part of this work deals with evolutionary algorithms (EA), neural networks (NN) and their synthesis in the form of neuroevolution. From a practical point of view, the aim of the work is to show the application of neuroevolution to two different tasks. The first task is the evolutionary design of a convolutional neural network (CNN) architecture able to classify handwritten digits (from the MNIST dataset) with high accuracy. The second task is the evolutionary optimization of a neurocontroller for a simulated Falcon 9 rocket landing. Both tasks are computationally demanding and were therefore solved on a supercomputer. As part of the first task, it was possible to design architectures which, when properly trained, achieve an accuracy of 99.49%. It turned out that the design of high-quality architectures can be automated with the use of neuroevolution. Within the second task, the neurocontroller weights were optimized so that, for defined initial conditions, the model of the Falcon booster can successfully land. Neuroevolution succeeded in both tasks.
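The second task evolves a neurocontroller's weights rather than training them by gradient descent. A toy mutation-plus-truncation-selection loop on an invented 1-D objective, standing in for the Falcon 9 landing simulator, which is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(42)

# a few hypothetical sensor states; the thesis used a landing simulator
states = [np.array([x, 1.0]) for x in np.linspace(-0.5, 0.5, 9)]

def controller(weights, state):
    """Tiny neurocontroller: a single linear unit with tanh activation."""
    return np.tanh(weights @ state)

def fitness(weights):
    """Toy stand-in reward: the controller should learn to output -x,
    i.e. counteract the observed deviation."""
    return -sum((controller(weights, s) - (-s[0])) ** 2 for s in states)

def evolve(pop_size=30, generations=60, sigma=0.1):
    """Mutation-only evolution with truncation selection."""
    pop = [rng.normal(size=2) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 4]           # keep the fittest quarter
        children = [
            parents[rng.integers(len(parents))] + rng.normal(scale=sigma, size=2)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```

No gradients of the simulator are needed, which is why this family of methods suits control tasks where the reward is only available by running a simulation.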

SkeMo: A Web Application for Real-time Sketch-based Software Modeling

Sharma Chapai, Alisha 19 July 2023
No description available.
