81 |
Learning to Map the Visual and Auditory World
Salem, Tawfiq, 01 January 2019
The appearance of the world varies dramatically not only from place to place but also from hour to hour and month to month. Billions of images that capture this complex relationship are uploaded to social-media websites every day and are often associated with precise time and location metadata. This rich source of data can help improve our understanding of the globe. In this work, we propose a general framework that uses these publicly available images for constructing dense maps of different ground-level attributes from overhead imagery. In particular, we use well-defined probabilistic models and a weakly-supervised, multi-task training strategy to provide an estimate of the expected visual and auditory ground-level attributes, consisting of the types of scenes, objects, and sounds a person can experience at a location. Through a large-scale evaluation on real data, we show that our learned models can be used for applications including mapping, image localization, image retrieval, and metadata verification.
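To make the framework concrete, the sketch below shows one plausible shape for such a weakly-supervised, multi-task model: a shared encoder over an overhead image tile with one head per ground-level attribute, each predicting a categorical distribution and trained against label distributions aggregated from geotagged ground-level media. This is a minimal illustration, not the authors' exact architecture; the encoder and the category counts are assumptions.

```python
# Minimal sketch (not the authors' exact model): a shared CNN encoder over an
# overhead image with one head per ground-level attribute, each predicting a
# categorical distribution. Category counts below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroundLevelAttributeNet(nn.Module):
    def __init__(self, n_scenes=365, n_objects=80, n_sounds=50):
        super().__init__()
        self.encoder = nn.Sequential(            # shared overhead-image encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.heads = nn.ModuleDict({             # one task head per attribute
            "scenes": nn.Linear(64, n_scenes),
            "objects": nn.Linear(64, n_objects),
            "sounds": nn.Linear(64, n_sounds)})

    def forward(self, overhead):
        z = self.encoder(overhead)
        return {k: F.log_softmax(h(z), dim=1) for k, h in self.heads.items()}

model = GroundLevelAttributeNet()
overhead = torch.randn(8, 3, 128, 128)           # batch of overhead image tiles
# Weak labels: per-location attribute distributions aggregated from geotagged
# ground-level photos/audio (here random stand-ins).
targets = {k: F.softmax(torch.randn(8, n), dim=1)
           for k, n in [("scenes", 365), ("objects", 80), ("sounds", 50)]}
preds = model(overhead)
# Multi-task loss: sum of KL divergences between predicted and weak label
# distributions across the attribute heads.
loss = sum(F.kl_div(preds[k], targets[k], reduction="batchmean") for k in preds)
loss.backward()
```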
|
82 |
[en] ENHANCEMENT AND CONTINUOUS SPEECH RECOGNITION IN ADVERSE ENVIRONMENTS / [pt] REALCE E RECONHECIMENTO DE VOZ CONTÍNUA EM AMBIENTES ADVERSOS
CHRISTIAN DAYAN ARCOS GORDILLO, 13 June 2018
[pt] Esta tese apresenta e examina contribuições inovadoras no front-end dos sistemas de reconhecimento automático de voz (RAV) para o realce e reconhecimento de voz em ambientes adversos. A primeira proposta consiste em aplicar um filtro de mediana sobre a função de distribuição de probabilidade de cada coeficiente cepstral antes de utilizar uma transformação para um domínio invariante às distorções, com o objetivo de adaptar a voz ruidosa ao ambiente limpo de referência através da modificação de histogramas. Fundamentadas nos resultados de estudos psicofísicos do sistema auditivo humano, que utiliza como princípio o fato de que o som que atinge o ouvido é sujeito a um processo chamado Análise de Cena Auditiva (ASA), o qual examina como o sistema auditivo separa as fontes de som que compõem a entrada acústica, três novas abordagens aplicadas independentemente foram propostas para realce e reconhecimento de voz. A primeira aplica a estimativa de uma nova máscara no domínio espectral usando o conceito da transformada de Fourier de tempo curto (STFT). A máscara proposta aplica a técnica Local Binary Pattern (LBP) à relação sinal ruído (SNR) de cada unidade de tempo-frequência (T-F) para estimar
uma máscara de vizinhança ideal (INM). Continuando com essa abordagem, propõe-se em seguida nesta tese o mascaramento usando as transformadas wavelet com base nos LBP para realçar os espectros temporais dos coeficientes wavelet nas altas frequências. Finalmente, é proposto um novo método de estimação da máscara INM, utilizando um algoritmo de aprendizagem supervisionado das Deep Neural Networks (DNN) com o objetivo de realizar a classificação de unidades T-F obtidas da saída dos bancos de
filtros pertencentes a uma mesma fonte de som (ou predominantemente voz ou predominantemente ruído). O desempenho é comparado com as técnicas de máscara tradicionais IBM e IRM, tanto em termos de qualidade objetiva da voz, como através de taxas de erro de palavra. Os resultados das técnicas
propostas evidenciam as melhoras obtidas em ambientes ruidosos, com diferenças significativamente superiores às abordagens convencionais. / [en] This thesis presents and examines innovative contributions to the front-end of automatic speech recognition (ASR) systems for speech enhancement and recognition in adverse environments. The first proposal applies a median filter to the probability distribution function of each cepstral coefficient before using a transformation to a distortion-invariant domain, in order to adapt the corrupted speech to the clean reference environment by modifying histograms. Motivated by psychophysical studies of the human auditory system, in which the sound reaching the ear is subjected to a process called Auditory Scene Analysis (ASA) that examines how the auditory system separates the sound sources making up the acoustic input, three new approaches, applied independently, are proposed for speech enhancement and recognition. The first estimates a new mask in the spectral domain using the short-time Fourier transform (STFT). The proposed mask applies the Local Binary Pattern (LBP) technique to the signal-to-noise ratio (SNR) of each time-frequency (T-F) unit to estimate an Ideal Neighborhood Mask (INM). Continuing with this approach, the thesis then proposes masking with LBP-based wavelet transforms to highlight the temporal spectra of the wavelet coefficients at high frequencies. Finally, a new method for estimating the INM is proposed, using a supervised Deep Neural Network (DNN) learning algorithm to classify the T-F units at the output of the filter banks as belonging to the same sound source (predominantly speech or predominantly noise). The performance is compared with the traditional IBM and IRM mask techniques, in terms of both objective speech quality and word error rate. The results show the improvements obtained in noisy environments, with the proposed methods significantly outperforming the conventional approaches.
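The INM and LBP details are specific to the thesis, but the ideal binary mask (IBM) baseline it is compared against can be sketched in a few lines: compute a local SNR per STFT time-frequency unit and keep only the units where speech dominates. The sketch below assumes access to the clean and noise components, as is usual when constructing mask training targets, and uses synthetic signals as stand-ins.

```python
# Sketch of the classic ideal-binary-mask (IBM) baseline over STFT time-frequency
# units, which the proposed LBP/INM masks build on. Assumes access to the clean
# and noise signals, as in mask-training setups; thesis-specific details omitted.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(fs) / fs
clean = 0.5 * np.sin(2 * np.pi * 440 * t)            # stand-in for clean speech
noise = 0.1 * np.random.randn(fs)                    # additive noise
noisy = clean + noise

_, _, S_clean = stft(clean, fs=fs, nperseg=512)
_, _, S_noise = stft(noise, fs=fs, nperseg=512)
_, _, S_noisy = stft(noisy, fs=fs, nperseg=512)

# Local SNR per T-F unit, then a binary mask: keep units where speech dominates.
eps = 1e-12
snr_db = 10 * np.log10((np.abs(S_clean) ** 2 + eps) / (np.abs(S_noise) ** 2 + eps))
ibm = (snr_db > 0.0).astype(float)                   # 0 dB local criterion

# Apply the mask to the noisy spectrogram and resynthesize an enhanced signal.
_, enhanced = istft(ibm * S_noisy, fs=fs, nperseg=512)
print("enhanced length:", enhanced.shape[0])
```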
|
83 |
Modèle joint pour le traitement automatique de la langue : perspectives au travers des réseaux de neurones / Joint model for NLP : a DNN framework
Tafforeau, Jérémie, 20 November 2017
Les recherches en Traitement Automatique des Langues (TAL) ont identifié différents niveaux d'analyse lexicale, syntaxique et sémantique. Il en découle un découpage hiérarchique des différentes tâches à réaliser afin d'analyser un énoncé. Les systèmes classiques du TAL reposent sur des analyseurs indépendants disposés en cascade au sein de chaînes de traitement (pipelines). Cette approche présente un certain nombre de limitations : la dépendance des modèles à la sélection empirique des traits, le cumul des erreurs dans le pipeline et la sensibilité au changement de domaine. Ces limitations peuvent conduire à des pertes de performances particulièrement importantes lorsqu'il existe un décalage entre les conditions d'apprentissage des modèles et celles d'utilisation. Un tel décalage existe lors de l'analyse de transcriptions automatiques de parole spontanée comme par exemple les conversations téléphoniques enregistrées dans des centres d'appels. En effet l'analyse d'une langue non-canonique pour laquelle il existe peu de données d'apprentissage, la présence de disfluences et de constructions syntaxiques spécifiques à l'oral ainsi que la présence d'erreurs de reconnaissance dans les transcriptions automatiques mènent à une détérioration importante des performances des systèmes d'analyse. C'est dans ce cadre que se déroule cette thèse, en visant à mettre au point des systèmes d'analyse à la fois robustes et flexibles permettant de dépasser les limitations des systèmes actuels à l'aide de modèles issus de l'apprentissage par réseaux de neurones profonds. / NLP researchers have identified different levels of linguistic analysis. This leads to a hierarchical division of the various tasks performed in order to analyze an utterance. The traditional approach considers task-specific models which are subsequently arranged in cascade within processing chains (pipelines). This approach has a number of limitations: the empirical selection of model features, the accumulation of errors along the pipeline and the lack of robustness to domain changes. These limitations lead to particularly high performance losses in the case of non-canonical language with limited training data available, such as transcriptions of telephone conversations. Disfluencies and speech-specific syntactic constructions, as well as recognition errors in automatic transcriptions, lead to a significant drop in performance. It is therefore necessary to develop robust and flexible systems. We intend to perform syntactic and semantic analysis using a multi-task deep neural network model, while taking into account the variations of domain and/or language register within the data.
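A joint model of this kind can be pictured as a shared sequence encoder with one tagging head per level of analysis, trained with a summed loss instead of a cascade of independent analyzers. The sketch below is a minimal illustration under assumed vocabulary sizes, label sets and layer widths, not the configuration used in the thesis.

```python
# Minimal sketch of a joint (multi-task) sequence model: a shared recurrent
# encoder with one tagging head per level of analysis (e.g. POS, chunks,
# semantic frames). Sizes and label sets are illustrative assumptions.
import torch
import torch.nn as nn

class JointTagger(nn.Module):
    def __init__(self, vocab=10000, emb=100, hidden=128,
                 n_pos=20, n_chunk=10, n_frame=30):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True, bidirectional=True)
        self.heads = nn.ModuleDict({      # one classifier per task, sharing the encoder
            "pos": nn.Linear(2 * hidden, n_pos),
            "chunk": nn.Linear(2 * hidden, n_chunk),
            "frame": nn.Linear(2 * hidden, n_frame)})

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))      # (batch, seq, 2*hidden)
        return {task: head(h) for task, head in self.heads.items()}

model = JointTagger()
tokens = torch.randint(0, 10000, (4, 25))            # a batch of token-id sequences
gold = {"pos": torch.randint(0, 20, (4, 25)),
        "chunk": torch.randint(0, 10, (4, 25)),
        "frame": torch.randint(0, 30, (4, 25))}
loss_fn = nn.CrossEntropyLoss()
out = model(tokens)
# Joint training: sum the per-task losses, so errors are not accumulated along a pipeline.
loss = sum(loss_fn(out[t].reshape(-1, out[t].size(-1)), gold[t].reshape(-1)) for t in out)
loss.backward()
```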
|
84 |
Joint Optimization of Quantization and Structured Sparsity for Compressed Deep Neural Networks
January 2018
abstract: Deep neural networks (DNNs) have shown tremendous success in various cognitive tasks, such as image classification and speech recognition. However, their usage on resource-constrained edge devices has been limited due to their high computation and large memory requirements.
To overcome these challenges, recent works have extensively investigated model compression techniques such as element-wise sparsity, structured sparsity and quantization. While most of these works have applied these compression techniques in isolation, there have been very few studies on applying quantization and structured sparsity together to a DNN model.
This thesis co-optimizes structured sparsity and quantization constraints on DNN models during training. Specifically, it obtains an optimal setting of 2-bit weights and 2-bit activations coupled with 4X structured compression by performing a combined exploration of quantization and structured compression settings. The optimal DNN model achieves 50X weight memory reduction compared to a floating-point uncompressed DNN. This memory saving is significant since applying only structured sparsity constraints achieves 2X memory savings and applying only quantization constraints achieves 16X memory savings. The algorithm has been validated on both high- and low-capacity DNNs and on wide-sparse and deep-sparse DNN models. Experiments demonstrated that the deep-sparse DNN outperforms the shallow-dense DNN, with varying levels of memory savings depending on DNN precision and sparsity levels. This work further proposes a Pareto-optimal approach to systematically extract optimal DNN models from a huge set of sparse and dense DNN models. The resulting 11 optimal designs were further evaluated by considering overall DNN memory, which includes activation memory and weight memory. It was found that there is only a small change in the memory footprint of the optimal designs corresponding to the low-sparsity DNNs. However, activation memory cannot be ignored for high-sparsity DNNs. / Dissertation/Thesis / Masters Thesis Computer Engineering 2018
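The way the two compressions multiply can be illustrated with back-of-the-envelope arithmetic: 2-bit weights give roughly 16X over 32-bit floats and 4X structured (channel) pruning another factor of four, with metadata overheads pulling the ideal product back toward the reported 50X. The layer shape, pruning rule and quantizer below are illustrative assumptions, not the thesis's training procedure.

```python
# Back-of-the-envelope sketch of combining 2-bit quantization with 4X structured
# (channel) sparsity on one convolution layer's weights, to see how the two
# compressions multiply. Layer shape and pruning rule are illustrative assumptions.
import numpy as np

out_ch, in_ch, k = 128, 64, 3
w = np.random.randn(out_ch, in_ch, k, k).astype(np.float32)

# Structured sparsity: keep only the 25% of output channels with largest L2 norm (4X).
norms = np.sqrt((w ** 2).sum(axis=(1, 2, 3)))
keep = norms >= np.sort(norms)[::-1][out_ch // 4 - 1]
w_sparse = w[keep]                                   # surviving channels only

# Uniform 2-bit quantization of the surviving weights (4 levels).
levels = 2 ** 2
w_max = np.abs(w_sparse).max()
codes = np.clip(np.round((w_sparse / w_max + 1) / 2 * (levels - 1)), 0, levels - 1)

dense_fp32_bits = w.size * 32
compressed_bits = codes.size * 2 + keep.size         # 2 bits/weight + 1-bit channel mask
print("ideal compression: %.1fx" % (dense_fp32_bits / compressed_bits))
# In practice, scale factors and index/metadata overheads bring the ideal
# 16x (quantization) * 4x (structured sparsity) = 64x down toward ~50x.
```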
|
85 |
Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data / Architectures neuronales profondes pour l'apprentissage de représentation multimodales de données multimédias
Vukotic, Verdran, 26 September 2017
La thèse porte sur le développement d'architectures neuronales profondes permettant d'analyser des contenus textuels ou visuels, ou la combinaison des deux. De manière générale, le travail tire parti de la capacité des réseaux de neurones à apprendre des représentations abstraites. Les principales contributions de la thèse sont les suivantes: 1) Réseaux récurrents pour la compréhension de la parole: différentes architectures de réseaux sont comparées pour cette tâche sur leurs facultés à modéliser les observations ainsi que les dépendances sur les étiquettes à prédire. 2) Prédiction d’image et de mouvement : nous proposons une architecture permettant d'apprendre une représentation d'une image représentant une action humaine afin de prédire l'évolution du mouvement dans une vidéo ; l'originalité du modèle proposé réside dans sa capacité à prédire des images à une distance arbitraire dans une vidéo. 3) Encodeurs bidirectionnels multimodaux : le résultat majeur de la thèse concerne la proposition d'un réseau bidirectionnel permettant de traduire une modalité en une autre, offrant ainsi la possibilité de représenter conjointement plusieurs modalités. L'approche été étudiée principalement en structuration de collections de vidéos, dons le cadre d'évaluations internationales où l'approche proposée s'est imposée comme l'état de l'art. 4) Réseaux adverses pour la fusion multimodale: la thèse propose d'utiliser les architectures génératives adverses pour apprendre des représentations multimodales en offrant la possibilité de visualiser les représentations dans l'espace des images. / In this dissertation, the thesis that deep neural networks are suited for the analysis of visual, textual and fused visual and textual content is discussed. This work evaluates the ability of deep neural networks to learn automatic multimodal representations in either unsupervised or supervised manners and brings the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image. The architecture is evaluated on videos, by utilizing solely one frame as input. 3) Bidirectional multimodal encoders: the main contribution of this thesis is a neural architecture that translates from one modality to the other and conversely, and offers an improved multimodal representation space where the initially disjoint representations can be translated and fused. This enables improved multimodal fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks within the task of video hyperlinking, where it defined the state of the art. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations; in addition to providing multimodal representations, generative adversarial networks make it possible to visualize the learned model directly in the image domain.
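The bidirectional crossmodal idea in contribution 3 can be sketched as two translation branches, one from each modality to the other, whose central hidden layers serve as a joint embedding space. The feature dimensions, losses and training recipe below are assumptions for illustration only, not the architecture evaluated in the thesis.

```python
# Minimal sketch inspired by the bidirectional multimodal encoder idea: two
# branches translate modality A (e.g. visual features) to modality B (e.g. text
# features) and vice versa; their central hidden layers give a joint embedding.
# Feature sizes and the training recipe are illustrative assumptions.
import torch
import torch.nn as nn

class BidirectionalCrossmodal(nn.Module):
    def __init__(self, dim_a=2048, dim_b=300, dim_h=512):
        super().__init__()
        self.a_to_h = nn.Sequential(nn.Linear(dim_a, dim_h), nn.Tanh())
        self.h_to_b = nn.Linear(dim_h, dim_b)
        self.b_to_h = nn.Sequential(nn.Linear(dim_b, dim_h), nn.Tanh())
        self.h_to_a = nn.Linear(dim_h, dim_a)

    def forward(self, a, b):
        b_from_a = self.h_to_b(self.a_to_h(a))     # translate A -> B
        a_from_b = self.h_to_a(self.b_to_h(b))     # translate B -> A
        return b_from_a, a_from_b

    def joint_embedding(self, a, b):
        # Concatenated central representations act as the fused multimodal space.
        return torch.cat([self.a_to_h(a), self.b_to_h(b)], dim=1)

model = BidirectionalCrossmodal()
vis = torch.randn(16, 2048)            # e.g. CNN features of video keyframes
txt = torch.randn(16, 300)             # e.g. averaged word embeddings of transcripts
b_hat, a_hat = model(vis, txt)
loss = nn.functional.mse_loss(b_hat, txt) + nn.functional.mse_loss(a_hat, vis)
loss.backward()
print("joint embedding shape:", model.joint_embedding(vis, txt).shape)
```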
|
86 |
Reduction of Temperature Forecast Errors with Deep Neural Networks / Reducering av temperaturprognosfel med djupa neuronnätverk
Isaksson, Robin, January 2018
Deep artificial neural networks are a type of machine learning model that can be used to find and utilize patterns in data. One of their many applications is regression analysis. In this thesis, deep artificial neural networks were implemented to estimate the error of surface temperature forecasts produced by a numerical weather prediction model. The ability to estimate the error of a forecast is synonymous with the ability to reduce forecast errors, as the estimated error can be offset from the actual forecast. Six years of forecast data from the period 2010-2015, produced by the European Centre for Medium-Range Weather Forecasts' (ECMWF) numerical weather prediction model, together with data from fourteen meteorological observation stations, were used to train and evaluate error-predicting deep neural networks. The neural networks were able to reduce the forecast errors, to a varying extent, for all the locations that were tested. The largest reduction was 83.0% of the original error, corresponding to a 16.7 °C² decrease in the mean-square error. The error-reduction performance of the neural networks was compared with that of a contemporary Kalman filter as implemented by the Swedish Meteorological and Hydrological Institute (SMHI). The neural network implementation had superior performance for six out of seven of the evaluated stations, with the Kalman filter performing marginally better at one station.
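The overall scheme can be sketched in a few lines: train a regressor on NWP output features to predict the forecast error at a station (here defined as observation minus forecast), then add the predicted error back to the raw forecast. The predictors, network size and synthetic data below are assumptions, not the ECMWF/SMHI setup used in the thesis.

```python
# Minimal sketch of the error-prediction idea: learn forecast_error =
# observation - forecast from NWP output features, then correct new forecasts
# by adding the predicted error. Features and network size are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 5000
# Stand-in predictors from the NWP model: forecast 2m temperature, lead time,
# a day-of-year signal, forecast cloud cover.
X = np.column_stack([
    rng.normal(5, 10, n),            # forecast temperature (degC)
    rng.integers(6, 48, n),          # lead time (hours)
    np.sin(rng.uniform(0, 2 * np.pi, n)),
    rng.uniform(0, 1, n)])
true_error = 0.1 * X[:, 0] - 0.02 * X[:, 1] + 1.5 * X[:, 3] + rng.normal(0, 1, n)
observation = X[:, 0] + true_error

X_train, X_test = X[:4000], X[4000:]
err_train = true_error[:4000]

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000, random_state=0)
net.fit(X_train, err_train)

corrected = X_test[:, 0] + net.predict(X_test)       # forecast + estimated error
raw_mse = mean_squared_error(observation[4000:], X_test[:, 0])
cor_mse = mean_squared_error(observation[4000:], corrected)
print(f"MSE raw forecast: {raw_mse:.2f}  corrected: {cor_mse:.2f}")
```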
|
87 |
Learning Compact Architectures for Deep Neural Networks
Srinivas, Suraj, January 2017
Deep neural networks with millions of parameters are at the heart of many state-of-the-art computer vision models. However, recent works have shown that models with a much smaller number of parameters can often perform just as well. A smaller model has the advantage of being faster to evaluate and easier to store, both of which are crucial for real-time and embedded applications. While prior work on compressing neural networks has looked at methods based on sparsity, quantization and factorization of neural network layers, we look at the alternate approach of pruning neurons.
Training neural networks is often described as a kind of 'black magic', as successful training requires setting the right hyper-parameter values (such as the number of neurons in a layer, depth of the network, etc.). It is often not clear what these values should be, and these decisions often end up being either ad-hoc or driven through extensive experimentation. It would be desirable to automatically set some of these hyper-parameters for the user so as to minimize trial-and-error. Combining this objective with our earlier preference for smaller models, we ask the following question: for a given task, is it possible to come up with small neural network architectures automatically? In this thesis, we propose methods to achieve the same.
The work is divided into four parts. First, given a neural network, we look at the problem of identifying important and unimportant neurons. We look at this problem in a data-free setting, i.e., assuming that the data the neural network was trained on is not available. We propose two rules for identifying wasteful neurons and show that these suffice in such a data-free setting. By removing neurons based on these rules, we are able to reduce model size without significantly affecting accuracy.
Second, we propose an automated learning procedure to remove neurons during the process of training. We call this procedure ‘Architecture-Learning’, as it automatically discovers the optimal width and depth of neural networks. We empirically show that this procedure is preferable to trial-and-error-based Bayesian Optimization procedures for selecting neural network architectures.
Third, we connect ‘Architecture-Learning’ to a popular regularizer called ‘Dropout’, and propose a novel regularizer which we call ‘Generalized Dropout’. From a Bayesian viewpoint, this method corresponds to a hierarchical extension of the Dropout algorithm. Empirically, we observe that Generalized Dropout corresponds to a more flexible version of Dropout, and works in scenarios where Dropout fails.
Finally, we apply our procedure for removing neurons to the problem of removing weights in a neural network, and achieve state-of-the-art results in sparsifying neural networks.
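One intuition behind the data-free setting in the first part can be illustrated with a toy rule: if two neurons in a fully connected layer have nearly identical incoming weights, they compute nearly identical activations, so one can be removed and its outgoing weights folded into its duplicate, without looking at any data. This is a simplified sketch for illustration, not the exact pruning rules proposed in the thesis.

```python
# Simplified sketch of a data-free pruning rule for a fully-connected layer:
# if two neurons have (near-)identical incoming weights, they compute (near-)
# identical activations, so one can be removed and its outgoing weights added
# to the other. This is an illustration, not the thesis's exact criteria.
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(6, 4))      # incoming weights: 6 hidden neurons, 4 inputs
W1[3] = W1[0] + 1e-3 * rng.normal(size=4)   # make neuron 3 a near-duplicate of 0
W2 = rng.normal(size=(3, 6))      # outgoing weights: 3 outputs, 6 hidden neurons

def prune_duplicate_neurons(W1, W2, tol=1e-2):
    keep = list(range(W1.shape[0]))
    for i in range(W1.shape[0]):
        if i not in keep:
            continue
        for j in range(i + 1, W1.shape[0]):
            if j in keep and np.linalg.norm(W1[i] - W1[j]) < tol:
                W2[:, i] += W2[:, j]        # fold j's outgoing weights into i's
                keep.remove(j)
    return W1[keep], W2[:, keep]

W1_p, W2_p = prune_duplicate_neurons(W1.copy(), W2.copy())
x = rng.normal(size=4)
before = W2 @ np.maximum(W1 @ x, 0)         # original two-layer ReLU network
after = W2_p @ np.maximum(W1_p @ x, 0)      # pruned network
print("max output change:", np.abs(before - after).max())
```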
|
88 |
Deep Neural Networks and Image Analysis for Quantitative Microscopy
Sadanandan, Sajith Kecheril, January 2017
Understanding biology paves the way for discovering drugs targeting deadly diseases like cancer, and microscopy imaging is one of the most informative ways to study biology. However, analysis of large numbers of samples is often required to draw statistically verifiable conclusions. Automated approaches for analysis of microscopy image data make it possible to handle large data sets and, at the same time, reduce the risk of bias. Quantitative microscopy refers to computational methods for extracting measurements from microscopy images, enabling detection and comparison of subtle changes in morphology or behavior induced by varying experimental conditions. This thesis covers computational methods for segmentation and classification of biological samples imaged by microscopy. The recent increase in computational power has enabled the development of deep neural networks (DNNs) that perform well in solving real-world problems. This thesis compares classical image analysis algorithms for segmentation of bacterial cells and introduces a novel method that combines classical image analysis and DNNs for improved cell segmentation and detection of rare phenotypes. This thesis also demonstrates a novel DNN for segmentation of clusters of cells (spheroids), with varying sizes, shapes and textures, imaged by phase contrast microscopy. DNNs typically require large amounts of training data. This problem is addressed by proposing an automated approach for creating ground truths by utilizing multiple imaging modalities and classical image analysis. The resulting DNNs are applied to segment unstained cells from bright-field microscopy images. In DNNs, it is often difficult to understand what image features have the largest influence on the final classification results. This is addressed in an experiment where DNNs are applied to classify zebrafish embryos based on phenotypic changes induced by drug treatment. The response of the trained DNN is tested by ablation studies, which reveal that the networks do not necessarily learn the features most obvious at visual examination. Finally, DNNs are explored for classification of cervical and oral cell samples collected for cancer screening. Initial results show that the DNNs can respond to very subtle malignancy-associated changes. All the presented methods are developed using open-source tools and validated on real microscopy images.
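The automated ground-truth idea mentioned above can be sketched with classical image analysis alone: derive a training mask from a second modality in which cells are easy to separate (e.g. a fluorescence channel) and pair it with the corresponding bright-field image as a training example for a segmentation DNN. The synthetic images, thresholding choices and parameters below are illustrative assumptions, not the thesis's pipeline.

```python
# Small sketch of the automated ground-truth idea: derive training masks from a
# fluorescence channel with classical image analysis (Otsu threshold + small-object
# removal), then use them as labels for the paired bright-field image when training
# a segmentation DNN. Images and parameters below are illustrative assumptions.
import numpy as np
from skimage.filters import threshold_otsu, gaussian
from skimage.morphology import remove_small_objects

def mask_from_fluorescence(fluo, min_size=50):
    smoothed = gaussian(fluo, sigma=2)                  # suppress noise
    mask = smoothed > threshold_otsu(smoothed)          # cells are bright in fluorescence
    return remove_small_objects(mask, min_size=min_size)

# Synthetic stand-ins for a paired acquisition of the same field of view.
rng = np.random.default_rng(0)
fluo = rng.normal(0.1, 0.02, (256, 256))
fluo[100:140, 80:130] += 0.8                            # a bright "cell"
brightfield = rng.normal(0.5, 0.05, (256, 256))         # unstained image to segment

label_mask = mask_from_fluorescence(fluo)
print("labelled pixels:", int(label_mask.sum()))
# (brightfield, label_mask) pairs would then form the training set for a DNN
# that learns to segment unstained cells directly from bright-field images.
```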
|
89 |
Efficient and Robust Deep Learning through Approximate Computing
Sanchari Sen (9178400), 28 July 2020
Deep Neural Networks (DNNs) have greatly advanced the state-of-the-art in a wide range of machine learning tasks involving image, video, speech and text analytics, and are deployed in numerous widely-used products and services. Improvements in the capabilities of hardware platforms such as Graphics Processing Units (GPUs) and specialized accelerators have been instrumental in enabling these advances, as they have allowed more complex and accurate networks to be trained and deployed. However, the enormous computational and memory demands of DNNs continue to increase with growing data size and network complexity, posing a continuing challenge to computing system designers. For instance, state-of-the-art image recognition DNNs require hundreds of millions of parameters and hundreds of billions of multiply-accumulate operations, while state-of-the-art language models require hundreds of billions of parameters and several trillion operations to process a single input instance. Another major obstacle in the adoption of DNNs, despite their impressive accuracies on a range of datasets, has been their lack of robustness. Specifically, recent efforts have demonstrated that small, carefully-introduced input perturbations can force a DNN to behave in unexpected and erroneous ways, which can have severe consequences in several safety-critical DNN applications like healthcare and autonomous vehicles. In this dissertation, we explore approximate computing as an avenue to improve the speed and energy efficiency of DNNs, as well as their robustness to input perturbations.
Approximate computing involves executing selected computations of an application in an approximate manner, while generating favorable trade-offs between computational efficiency and output quality. The intrinsic error resilience of machine learning applications makes them excellent candidates for approximate computing, allowing us to achieve execution time and energy reductions with minimal effect on the quality of outputs. This dissertation performs a comprehensive analysis of different approximate computing techniques for improving the execution efficiency of DNNs. Complementary to generic approximation techniques like quantization, it identifies approximation opportunities based on the specific characteristics of three popular classes of networks - Feed-forward Neural Networks (FFNNs), Recurrent Neural Networks (RNNs) and Spiking Neural Networks (SNNs), which vary considerably in their network structure and computational patterns.
First, in the context of feed-forward neural networks, we identify sparsity, or the presence of zero values in the data structures (activations, weights, gradients and errors), to be a major source of redundancy and, therefore, an easy target for approximations. We develop lightweight micro-architectural and instruction set extensions to a general-purpose processor core that enable it to dynamically detect zero values when they are loaded and skip future instructions that are rendered redundant by them. Next, we explore LSTMs (the most widely used class of RNNs), which map sequences from an input space to an output space. We propose hardware-agnostic approximations that dynamically skip redundant symbols in the input sequence and discard redundant elements in the state vector to achieve execution time benefits. Following that, we consider SNNs, which are an emerging class of neural networks that represent and process information in the form of sequences of binary spikes. Observing that spike-triggered updates along synaptic connections are the dominant operation in SNNs, we propose hardware and software techniques to identify connections that minimally impact the output quality and deactivate them dynamically, skipping any associated updates.
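The dissertation realizes zero-skipping through micro-architectural and instruction-set extensions; purely as a software illustration of the redundancy being exploited, the sketch below skips the multiply-accumulates whose activation operand is zero in a dense matrix-vector product and counts the work avoided.

```python
# Software illustration of the redundancy that sparsity exposes: in a dense
# matrix-vector product, any multiply-accumulate whose activation operand is zero
# can be skipped without changing the result. (The dissertation realizes this with
# micro-architectural and ISA extensions; this sketch only counts the avoided work.)
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 512))
act = np.maximum(rng.normal(size=512), 0)     # ReLU activations: roughly half are zero

def sparse_aware_matvec(W, act):
    nonzero = np.flatnonzero(act)             # dynamically detected non-zero inputs
    y = W[:, nonzero] @ act[nonzero]          # only the useful multiply-accumulates
    skipped = W.shape[0] * (act.size - nonzero.size)
    return y, skipped

y_sparse, skipped_macs = sparse_aware_matvec(W, act)
y_dense = W @ act
assert np.allclose(y_dense, y_sparse)
print(f"skipped {skipped_macs} of {W.size} multiply-accumulates "
      f"({100.0 * skipped_macs / W.size:.1f}%)")
```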
The dissertation also delves into the efficacy of combining multiple approximate computing techniques to improve the execution efficiency of DNNs. In particular, we focus on the combination of quantization, which reduces the precision of DNN data structures, and pruning, which introduces sparsity in them. We observe that the ability of pruning to reduce the memory demands of quantized DNNs decreases with precision, as the overhead of storing non-zero locations alongside the values starts to dominate in different sparse encoding schemes. We analyze this overhead and the overall compression of three different sparse formats across a range of sparsity and precision values and propose a hybrid compression scheme that identifies the optimal sparse format for a pruned low-precision DNN.
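The precision/sparsity trade-off behind that hybrid scheme can be illustrated with simple storage arithmetic: compare dense storage against COO-style and CSR-style encodings at several precisions and densities and pick the cheapest. The index widths and formats below are simplified assumptions, not the dissertation's exact encoding schemes.

```python
# Simple storage arithmetic illustrating the precision/sparsity trade-off: at low
# precision, the index overhead of sparse formats starts to dominate, so the best
# encoding depends on both. Index widths and formats are simplified assumptions.
def storage_bits(n_rows, n_cols, density, value_bits, index_bits=16):
    n = n_rows * n_cols
    nnz = int(n * density)
    dense = n * value_bits
    coo = nnz * (value_bits + 2 * index_bits)                 # value + (row, col) per entry
    csr = nnz * (value_bits + index_bits) + (n_rows + 1) * 32  # values + cols + row pointers
    return {"dense": dense, "coo": coo, "csr": csr}

for value_bits in (32, 8, 2):
    for density in (0.5, 0.1):
        sizes = storage_bits(1024, 1024, density, value_bits)
        best = min(sizes, key=sizes.get)
        print(f"{value_bits:>2}-bit values, {int(density*100):>2}% non-zeros -> "
              f"best format: {best:5s} ({sizes[best] / 8 / 1024:.0f} KiB)")
```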
Along with improved execution efficiency of DNNs, the dissertation explores an additional advantage of approximate computing in the form of improved robustness. We propose ensembles of quantized DNN models with different numerical precisions as a new approach to increase robustness against adversarial attacks. It is based on the observation that quantized neural networks often demonstrate much higher robustness to adversarial attacks than full-precision networks, but at the cost of a substantial loss in accuracy on the original (unperturbed) inputs. We overcome this limitation to achieve the best of both worlds, i.e., the higher unperturbed accuracies of the full-precision models combined with the higher robustness of the low-precision models, by composing them in an ensemble.
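The ensembling idea can be sketched by evaluating copies of a model quantized to different weight precisions and averaging their softmax outputs. The toy model and the uniform symmetric quantizer below are stand-ins for illustration, not the models or quantization scheme studied in the dissertation.

```python
# Minimal sketch of the ensembling idea: evaluate copies of a model quantized to
# different weight precisions and average their softmax outputs, aiming to combine
# the clean accuracy of high precision with the robustness of low precision.
# The model and the uniform quantizer below are toy stand-ins.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize_weights(model, bits):
    q = copy.deepcopy(model)
    with torch.no_grad():
        for p in q.parameters():
            scale = p.abs().max() / (2 ** (bits - 1) - 1)
            p.copy_(torch.round(p / scale) * scale)      # uniform symmetric quantization
    return q

base = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
                     nn.Linear(128, 10))
ensemble = [quantize_weights(base, b) for b in (8, 4, 2)] + [base]

x = torch.randn(5, 1, 28, 28)                            # possibly perturbed inputs
with torch.no_grad():
    probs = torch.stack([F.softmax(m(x), dim=1) for m in ensemble]).mean(dim=0)
pred = probs.argmax(dim=1)
print("ensemble predictions:", pred.tolist())
```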
In summary, this dissertation establishes approximate computing as a promising direction to improve the performance, energy efficiency and robustness of neural networks.
|
90 |
Hluboké neuronové sítě pro klasifikaci objektů v obraze / Deep Neural Networks for Classifying Objects in an Image
Mlynarič, Tomáš, January 2018
This thesis deals with classifying objects using deep neural networks. Whole-scene segmentation was used as the main algorithm for the classification purpose; it works with video sequences and exploits information shared between two video frames. Optical flow was used to extract this information from the video frames, and based on it the feature maps of a neural network are warped. Two neural network architectures were adjusted to work with videos and experimented with. The results of the experiments show that using videos for image segmentation improves accuracy (IoU) compared to the same architecture working with single images.
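Flow-based feature warping can be sketched as bilinear sampling of a previous frame's feature map at positions displaced by the optical flow. The flow below is synthetic, and the warping convention (flow in pixels, mapping current-frame positions back to the previous frame) is an assumption for illustration rather than the exact setup used in the thesis.

```python
# Minimal sketch of flow-based feature warping: bilinearly sample a previous
# frame's feature map at locations displaced by the optical flow, so features can
# be reused/fused across video frames. The flow here is synthetic, and the
# convention (flow in pixels, mapping current -> previous positions) is an assumption.
import torch
import torch.nn.functional as F

def warp_features(feat_prev, flow):
    # feat_prev: (N, C, H, W); flow: (N, 2, H, W) with (dx, dy) in pixels.
    n, _, h, w = feat_prev.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    grid_x = xs.unsqueeze(0) + flow[:, 0]            # where to sample in the previous frame
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize sampling positions to [-1, 1] as required by grid_sample.
    grid_x = 2.0 * grid_x / (w - 1) - 1.0
    grid_y = 2.0 * grid_y / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)     # (N, H, W, 2)
    return F.grid_sample(feat_prev, grid, mode="bilinear", align_corners=True)

feat_prev = torch.randn(1, 64, 32, 32)               # feature map of the previous frame
flow = torch.ones(1, 2, 32, 32) * 1.5                # synthetic 1.5-pixel displacement
warped = warp_features(feat_prev, flow)
print(warped.shape)                                  # torch.Size([1, 64, 32, 32])
```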
|