Global ETD Search

21	Algoritmy pro rozpoznávání pojmenovaných entit / Algorithms for named entities recognition
Winter, Luca (January 2017)
The aim of this work is to determine which algorithm best recognizes named entities in e-mail messages. The theoretical part explains the existing tools in this field. The practical part describes two tools designed specifically to create new models capable of recognizing named entities in e-mail messages: the first is based on a neural network, and the second uses a conditional random field (CRF) graphical model. The existing and newly created tools, and their ability to generalize, are compared on a subset of e-mail messages provided by Kiwi.com.
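The abstract does not say how the CRF-based tool decodes its predictions. As a hedged illustration only: a linear-chain CRF tagger (assumed here, since the abstract only says "CRF graph model") usually recovers the highest-scoring tag sequence with the Viterbi algorithm. The sketch assumes that per-token emission scores and tag-to-tag transition scores are already available. The BIO tag set, the sentence and all scores are hypothetical and do not come from the thesis.

```python
from typing import Dict, List, Tuple


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the tag sequence maximizing summed emission + transition scores.

    emissions[i][t] is the (log-space) score of tag t at token i;
    transitions[(a, b)] is the score of moving from tag a to tag b.
    Missing transitions are treated as forbidden (score -inf).
    """
    neg_inf = float("-inf")
    # best[i][t]: best score of any tag path that ends in tag t at token i
    best = [{t: emissions[0][t] for t in tags}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # pick the predecessor tag that maximizes the running score
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), neg_inf))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), neg_inf) + emissions[i][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # trace back from the best-scoring final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))


TAGS = ["O", "B-LOC", "I-LOC"]
# Hypothetical scores for the tokens "fly to New York"; the transition O -> I-LOC is disallowed.
transitions = {(a, b): 0.0 for a in TAGS for b in TAGS if (a, b) != ("O", "I-LOC")}
emissions = [
    {"O": 2.0, "B-LOC": 0.0, "I-LOC": 0.0},  # fly
    {"O": 2.0, "B-LOC": 0.0, "I-LOC": 0.0},  # to
    {"O": 0.5, "B-LOC": 1.5, "I-LOC": 1.0},  # New
    {"O": 0.5, "B-LOC": 0.5, "I-LOC": 1.0},  # York
]
path = viterbi(emissions, transitions, TAGS)
print(path)  # ['O', 'O', 'B-LOC', 'I-LOC']
```

The forbidden O -> I-LOC transition is what makes "York" continue the entity started at "New" rather than open a new one. That kind of structural constraint is the main reason to put a CRF layer on top of per-token scores.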
22	Réseaux de neurones récurrents pour le traitement automatique de la parole / Speech processing using recurrent neural networks
Gelly, Grégory (22 September 2017)
The field of automatic speech processing covers a very large number of tasks, including speech recognition, language identification and speaker identification. This research area has been studied since the mid-twentieth century, but the last major technological breakthrough is relatively recent and dates from the early 2010s. It was then that hybrid systems using deep neural networks (DNNs) appeared and substantially improved the state of the art. Inspired by the performance gains brought by DNNs and by Alex Graves's work on recurrent neural networks (RNNs), we wanted to explore the capabilities of RNNs, which seemed to us better suited than DNNs to handling the temporal sequences of the speech signal. In this thesis, we focus in particular on Long Short-Term Memory (LSTM) RNNs, which overcome a number of difficulties encountered with standard RNNs. We augment this model and propose optimization procedures that improve performance in speech/non-speech segmentation and in language identification. In particular, we introduce a cost function dedicated to each of the two tasks: a WER-like cost for speech/non-speech segmentation, aimed at reducing the error rate of a speech recognition system, and a so-called angular proximity cost function for multiclass classification problems such as spoken language identification.
Automatic speech processing has been an active field of research since the 1950s. Within this field, the main area of research is automatic speech recognition, but simpler tasks such as speech activity detection, language identification and speaker identification are also of great interest to the community. The most recent breakthrough in speech processing came around 2010, when speech recognition systems using deep neural networks drastically improved the state of the art. Inspired by these gains and by the work of Alex Graves on recurrent neural networks (RNNs), we decided to explore the possibilities these models offer on realistic data for two tasks: speech activity detection and spoken language identification. In this work, we take a close look at a specific RNN model, the Long Short-Term Memory (LSTM), which mitigates many of the difficulties that can arise when training an RNN. We augment this model and introduce optimization methods that lead to significant performance gains for speech activity detection and language identification. More specifically, we introduce a WER-like loss function to train a speech activity detection system so as to minimize the word error rate of a downstream speech recognition system. We also introduce two methods for successfully training a multiclass neural network classifier for tasks such as language identification (LID). The first is based on a divide-and-conquer approach and the second on an angular proximity loss function. Both yield performance gains and also speed up training.
Keywords: Recurrent neural networks, Speech recognition, LSTM
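For readers unfamiliar with the LSTM, a single time step of a standard LSTM cell can be sketched as follows. This is the textbook formulation with input, forget and output gates, not the augmented model from the thesis. The gate ordering, weight shapes, random (untrained) weights and sizes are all illustrative assumptions.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) bias,
    stacked (by assumption) as input gate, forget gate, candidate, output gate.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    g = np.tanh(z[2 * H:3 * H])  # candidate cell state
    o = sigmoid(z[3 * H:4 * H])  # output gate
    # The additive cell update lets gradients flow through c over many steps,
    # which helps mitigate the vanishing gradients of plain RNNs.
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c


rng = np.random.default_rng(0)
D, H, T = 3, 4, 10  # input size, hidden size, sequence length (arbitrary)
W = rng.normal(scale=0.5, size=(4 * H, D))
U = rng.normal(scale=0.5, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(T, D)):  # stand-in for a sequence of acoustic feature frames
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape)  # (4,)
```

In a speech activity detector of the kind the abstract describes, a linear layer on top of `h` would typically produce a speech/non-speech score per frame. That output layer and its loss are not shown here.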
23	Learning Long Temporal Sequences in Spiking Networks by Multiplexing Neural Oscillations
Vincent-Lamarre, Philippe (17 December 2019)
Many living organisms can reliably execute complex behaviors and cognitive processes. In many cases, such tasks are generated in the absence of any ongoing external input that could drive the activity of their underlying neural populations. For instance, writing the word "time" requires a precise sequence of muscle contractions in the hand and wrist. The brain areas responsible for this behaviour must generate some pattern of activity endogenously every time an individual performs this action. While how such a neural code is transformed into the target motor sequence is a question of its own, its origin is perhaps even more puzzling. Most models of cortical and sub-cortical circuits suggest that many of their neural populations are chaotic: very small amounts of noise, such as one additional action potential in a single neuron of a network, can lead to completely different patterns of activity. Reservoir computing is one of the first frameworks to provide an efficient way for biologically relevant neural networks to learn complex temporal tasks in the presence of chaos. We showed that although reservoirs (i.e., recurrent neural networks) are robust to noise, they are extremely sensitive to some forms of structural perturbation, such as removing one neuron out of thousands. We proposed an alternative to these models in which the autonomous activity no longer originates in the reservoir but in a set of oscillating networks projecting to it. Our simulations show that this solution produces rich patterns of activity and leads to networks that are resistant to both noise and structural perturbations. The model can learn a wide variety of temporal tasks, such as interval timing, motor control, speech production and spatial navigation.
Keywords: Reservoir computing, Neural oscillations, Temporal processing, Chaotic networks, Recurrent neural networks
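The thesis works with spiking networks. As a much simpler, hedged illustration of the reservoir computing idea it builds on, the sketch below uses a rate-based echo state network and trains only a linear readout, with ridge regression. The network is driven by a few sinusoidal "oscillators", which by assumption stand in for the oscillating networks that project to the reservoir. The reservoir size, frequencies, target signal and regularization strength are arbitrary choices, not values from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 200, 1000             # reservoir size, number of time steps
t = np.arange(T) * 0.01      # time axis in seconds (10 ms steps)

# Hypothetical oscillatory drive: three sinusoids at different frequencies.
freqs = np.array([1.0, 2.3, 3.7])                         # Hz
drive = np.sin(2 * np.pi * freqs[None, :] * t[:, None])  # shape (T, 3)

# Random recurrent weights rescaled to spectral radius 0.9 (a common heuristic).
W = rng.normal(size=(N, N)) / np.sqrt(N)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.normal(scale=0.5, size=(N, 3))

# Run the reservoir and record its states; the recurrent weights are never trained.
x = np.zeros(N)
states = np.empty((T, N))
for k in range(T):
    x = np.tanh(W @ x + W_in @ drive[k])
    states[k] = x

# Arbitrary temporal pattern the readout should reproduce.
target = np.sin(2 * np.pi * 0.5 * t) * np.cos(2 * np.pi * 1.5 * t)

# Ridge-regression readout: w = (S^T S + lam I)^{-1} S^T y
lam = 1e-4
w_out = np.linalg.solve(states.T @ states + lam * np.eye(N), states.T @ target)
prediction = states @ w_out
mse = np.mean((prediction - target) ** 2)
print(f"training MSE: {mse:.2e}")
```

Only `w_out` is learned, which keeps training to a single linear solve. The thesis's observation concerns what happens to such a system when the reservoir itself is perturbed, for example when a neuron is removed.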
24	Deep learning pro doporučování založené na implicitní zpětné vazbě / Deep Learning For Implicit Feedback-based Recommender Systems
Yöş, Kaan (January 2020)
This research focuses on recurrent neural networks (RNNs) and their application to session-aware recommendation driven by implicit user feedback and content-based metadata. To investigate this promising architecture, we implement seven different models that use various types of implicit feedback and content information. Our results show that RNNs using complex implicit feedback improve next-item prediction compared with baseline models such as Cosine Similarity, Doc2Vec, and Item2Vec.
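The abstract names Cosine Similarity as one baseline but does not define it. One common form is assumed here, and it is not necessarily the thesis's variant: each item is represented by the set of sessions in which it appears, and the recommender suggests the items most cosine-similar to the last item of the current session. The toy sessions are made up.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set


def item_sessions(sessions: List[List[str]]) -> Dict[str, Set[int]]:
    """Map each item to the indices of the training sessions that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for sid, items in enumerate(sessions):
        for item in items:
            index[item].add(sid)
    return index


def cosine(a: Set[int], b: Set[int]) -> float:
    """Cosine similarity of two binary session-occurrence vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend_next(session: List[str], index: Dict[str, Set[int]], k: int = 2) -> List[str]:
    """Rank unseen items by cosine similarity to the session's last item."""
    last = session[-1]
    candidates = [item for item in index if item not in session]
    # descending similarity; ties broken alphabetically for determinism
    candidates.sort(key=lambda item: (-cosine(index[last], index[item]), item))
    return candidates[:k]


training_sessions = [["a", "b", "c"], ["a", "b"], ["b", "d"], ["c", "d", "e"]]
index = item_sessions(training_sessions)
print(recommend_next(["b"], index))  # ['a', 'c']
```

This baseline looks only at the last item and ignores order within the session. An RNN-based recommender is designed to remove exactly that limitation, because it conditions on the whole session history.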
25	Identifying dyslectic gaze pattern: Comparison of methods for identifying dyslectic readers based on eye movement patterns
Lustig, Joakim (January 2016)
Dyslexia affects between 5% and 17% of all schoolchildren, making it the most common learning disability. It has been found to severely affect learning ability in school subjects and to limit the choice of further education and occupation. Since research has shown that early intervention and support can mitigate the negative effects of dyslexia, it is crucial that diagnosis is easily available and aimed at the right children. To ensure that children who have difficulty reading and could be dyslectic are assessed for dyslexia, an easily accessible, systematic, and unbiased screening method would be helpful. This thesis therefore investigates the use of machine learning methods to analyze eye movement patterns for dyslexia classification.
The results showed that dyslectic and non-dyslectic readers could be separated with 83% accuracy using non-sequential, feature-based machine learning methods. Equally good results at lower sampling frequencies indicated that consumer-grade eye trackers could be used for this purpose. A sequential approach using recurrent neural networks was also investigated, reaching an accuracy of 78%. The thesis is intended as an introduction to which methods could be viable for identifying dyslexia and as an inspiration for researchers aiming to conduct larger studies in the area.
Keywords: dyslexia, machine learning, neural networks, recurrent neural networks
Subject: Computer Sciences
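The abstract does not list the features used by the non-sequential classifiers. Purely as a hypothetical illustration, the sketch below shows the two preprocessing ideas the abstract mentions. The first is downsampling a gaze recording to simulate a lower sampling frequency. The second is collapsing a variable-length gaze sequence into a fixed-length feature vector that a non-sequential classifier can consume. The specific features are assumptions, not the thesis's feature set: mean horizontal speed, and the share of leftward (regressive) steps, which assumes left-to-right text.

```python
from statistics import mean
from typing import List, Tuple


def downsample(xs: List[float], rate_hz: float, factor: int) -> Tuple[List[float], float]:
    """Keep every `factor`-th sample, simulating a slower (e.g. consumer-grade) tracker."""
    return xs[::factor], rate_hz / factor


def gaze_features(xs: List[float], rate_hz: float) -> List[float]:
    """Collapse horizontal gaze positions into a fixed-length (hypothetical) feature vector."""
    if len(xs) < 2:
        raise ValueError("need at least two gaze samples")
    dx = [b - a for a, b in zip(xs, xs[1:])]
    speed = mean(abs(d) for d in dx) * rate_hz           # mean horizontal speed per second
    regressions = sum(1 for d in dx if d < 0) / len(dx)  # share of leftward steps
    return [speed, regressions]


# Made-up horizontal gaze positions (pixels) recorded at 100 Hz.
xs = [0.0, 10.0, 20.0, 15.0, 25.0, 35.0, 30.0, 40.0, 50.0]
print(gaze_features(xs, 100.0))  # [875.0, 0.25]
```

Because every recording maps to the same number of features regardless of its length, standard classifiers (SVMs, random forests and the like) can be applied directly. A sequential model such as an RNN would instead consume `xs` step by step.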
26	Sentiment Analysis of YouTube Public Videos based on their Comments
Kvedaraite, Indre (January 2021)
With the rise of social media and publicly available data, opinion mining is more accessible than ever. It is valuable for content creators, companies and advertisers to gain insight into what users think and feel. This work examines comments on YouTube videos and builds a deep learning classifier to automatically determine their sentiment. Four models based on Long Short-Term Memory (LSTM) are trained and evaluated. Experiments are performed on a labelled YouTube comment dataset to determine which model performs best in terms of accuracy, recall, precision, F1 score and ROC curve. The results indicate that a BiLSTM-based model has the best overall performance, with an accuracy of 89%. The four LSTM-based models are also evaluated on an IMDB movie review dataset, achieving an average accuracy of 87%, which shows that the models can predict the sentiment of text from a different domain. Finally, a statistical analysis of the YouTube videos reveals that videos with positive sentiment have a significantly higher number of upvotes and views. However, videos with negative sentiment do not have a significantly higher number of downvotes.
Keywords: Sentiment analysis, Sentiment classification, LSTM, BiLSTM, Recurrent neural networks, Convolutional neural networks
Subject: Software Engineering
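The metrics used to compare the four models follow standard definitions. For the binary case (positive vs. negative sentiment), they can be computed from confusion-matrix counts as below. The example labels are made up and are not taken from the thesis's datasets.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, with 1 = positive and 0 = negative sentiment."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Here accuracy is 5/8, precision 3/5 and recall 3/4. Because precision and recall diverge, F1 (2/3) differs from accuracy. This is why reporting several metrics, as the thesis does, gives a fuller picture than accuracy alone.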
27	Efficient image based localization using machine learning techniques
Elmougi, Ahmed (23 April 2021)
Localization is critical for the self-awareness of any autonomous system and is an important part of the autonomous system stack, which consists of several phases: sensing, perceiving, planning and control. In the sensing phase, data from onboard sensors are collected, preprocessed and passed to the next phase. The perceiving phase is responsible for self-awareness (localization) and for situational awareness, which includes multi-object detection and scene understanding. Once the autonomous system knows where it is and what is around it, it can use this knowledge to plan a path and send control commands to follow it. In this proposal, we focus on the localization part of the autonomous stack using camera images, and we approach the problem from different perspectives, including single images and videos.
Starting with single-image pose estimation, our aim is to propose systems that not only localize accurately but also have low space and time complexity. First, we propose SurfCNN, a low-cost indoor localization system that uses SURF descriptors instead of the original images to reduce the complexity of training convolutional neural networks (CNNs) for indoor localization. Given a single input image, its strongest SURF feature descriptors are fed to five convolutional layers that estimate the image's absolute position and orientation in an arbitrary reference frame. The system achieves performance comparable to the state of the art using only 300 features, without needing the full image or complex neural network architectures. Next, we propose SURF-LSTM, which extends the idea of using SURF descriptors instead of the original images. Instead of the CNN used in SurfCNN, it uses a long short-term memory (LSTM) network, a type of recurrent neural network (RNN), to extract the sequential relations between SURF descriptors. SURF-LSTM needs only 50 features to reach results comparable to or better than those of SurfCNN, which needs 300 features, and of other works that use full images with large neural networks. In the next research phase, instead of using SURF descriptors as image features to reduce training complexity, we study the effect of using features extracted from CNN models pretrained on other image tasks, such as image classification, without further training or fine-tuning. To learn the pose from these pretrained features, we adopt graph neural networks (GNNs) for single-image localization (Pose-GNN), using the feature representations either as node features in a graph (image as a node) or by converting them into a graph (image as a graph). The proposed models outperform state-of-the-art methods on an indoor localization dataset and perform comparably on outdoor scenes. In the final stage of the single-image pose estimation research, we study whether good localization results can be achieved without training a complex neural network. We propose Linear-PoseNet, which achieves results similar to those of other neural-network-based methods by training a single linear regression layer on image features from a pretrained ResNet50, in less than one second on a CPU. For outdoor scenes, we also propose Dense-PoseNet, which has only three fully connected layers, trains in a few minutes, and reaches performance comparable to that of more complex methods.
The second localization perspective is to find the relative poses between images in a video instead of absolute poses. We extend the idea used in the SurfCNN and SURF-LSTM systems and use SURF descriptors as the feature representation of the video frames. Two systems, based on a 3D-CNN and on a 2DCNN-RNN, are proposed to find the relative poses between images in the video. We show that the 3D-CNN performs better than the CNN-RNN combination for relative pose estimation.
Degree level: Graduate
Keywords: SLAM, deep learning, graph neural networks, convolutional neural networks, recurrent neural networks, computer vision
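Linear-PoseNet is described as a single linear regression layer trained on features from a pretrained ResNet50. A minimal sketch of that idea with ordinary least squares follows. Random vectors stand in for the ResNet50 features. The 7-dimensional pose target (3D position plus a unit quaternion, as in PoseNet-style regressors) is an assumption, not a detail given in the abstract, and `add_bias` and `predict_pose` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(2)
n_images, feat_dim, pose_dim = 500, 64, 7  # 7 = xyz position + quaternion (assumed)

# Stand-ins for frozen pretrained image features (in practice, ResNet50 outputs).
features = rng.normal(size=(n_images, feat_dim))
true_W = rng.normal(size=(feat_dim + 1, pose_dim))


def add_bias(x: np.ndarray) -> np.ndarray:
    """Append a constant column so the linear layer has a bias term."""
    return np.hstack([x, np.ones((x.shape[0], 1))])


poses = add_bias(features) @ true_W  # synthetic, noise-free targets

# Training the linear layer is a single least-squares solve, which keeps it cheap.
W_hat, *_ = np.linalg.lstsq(add_bias(features), poses, rcond=None)


def predict_pose(x: np.ndarray) -> np.ndarray:
    """Predict poses and renormalize the quaternion part to unit length."""
    pose = add_bias(x) @ W_hat
    pose[:, 3:] /= np.linalg.norm(pose[:, 3:], axis=1, keepdims=True)
    return pose
```

All the representational work is done by the frozen feature extractor, so the only learned parameters are one weight matrix. That is consistent with the sub-second CPU training time reported in the abstract, though real features and noisy ground-truth poses would not be fit exactly as in this synthetic case.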
28	Recurrent neural networks for deception detection in videos
Rodriguez-Meza, Bryan; Vargas-Lopez-Lavalle, Renzo; Ugarte, Willy (01 January 2022)
Deception detection has always been a subject of interest. After all, determining whether a person is telling the truth can be crucial in many real-world cases. Current methods for discerning deception require expensive equipment that needs specialists to read and interpret its output. In this article, we carry out an exhaustive comparison of nine recurrent deep learning models based on facial landmark recognition, trained on a recent human-made database used for lie detection, and compare them by accuracy and AUC. We also propose two new metrics that represent the validity of each prediction. The results of a 5-fold cross-validation show that, among all the tested models, the Stacked GRU model has the highest AUC (0.9853) and the highest accuracy (93.69%). The proposed Stacked GRU architecture is then compared with other machine learning and deep learning methods and surpasses them in AUC. These results indicate that we are not far from a future in which deception detection could be accessible through computers or smart devices.
Peer reviewed
Keywords: Deception detection, Deep learning, Facial landmarks recognition, Recurrent neural networks, Video database
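The evaluation protocol in the abstract (5-fold cross-validation, compared by AUC) can be sketched generically as below. The contiguous fold assignment (no shuffling or stratification, for brevity) and the rank-based, Mann-Whitney formulation of AUC are standard choices assumed here. The example labels and scores are invented and unrelated to the Stacked GRU results.

```python
from typing import List, Tuple


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds and return (train, test) index pairs."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


def auc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 1 = deceptive, 0 = truthful; scores are model outputs.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(auc(labels, scores))
```

In a full 5-fold protocol, a model would be trained on each `train` split and scored on the matching `test` split, and the per-fold metrics would then be averaged. Unlike accuracy, AUC does not depend on a decision threshold.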
29	On the Softmax Bottleneck of Word-Level Recurrent Language Models
Parthiban, Dwarak Govind (06 November 2020)
To predict the next word for a given input context (a sequence of previous words), a neural word-level language model uses a softmax function to output a probability distribution over all words in the vocabulary. When the log-probability outputs for all such contexts are stacked together, the result is a log-probability matrix, denoted Q_theta, where theta denotes the model parameters. When language modeling is formulated as a matrix factorization problem, the matrix to be factorized, Q_theta, is expected to be high-rank because natural language is highly context-dependent. However, existing softmax-based word-level language models cannot produce such matrices; this limitation is known as the softmax bottleneck. Several works have attempted to overcome it, for example with models that can produce a high-rank Q_theta. While reproducing the results of these works, we observed that the rank of Q_theta does not always correlate positively with better performance (i.e., lower test perplexity). This puzzling observation led us to investigate systematically how the rank of Q_theta influences the performance of a language model. We first introduce a new family of activation functions called the Generalized SigSoftmax (GSS). By controlling the parameters of GSS, we were able to construct language models that produce Q_theta with diverse ranks (low, medium, and high). For models that use GSS with different parameters, we observe that rank is not strongly correlated with better test perplexity, which supports our initial observation. By inspecting the top-5 predictions made by different models for a selected set of input contexts, we observe that a high-rank Q_theta does not guarantee strong qualitative performance. We then conduct experiments to check whether models that can produce a high-rank Q_theta have any other benefits. We show that Q_theta instead suffers from fast singular value decay. We also propose an alternative measure of the rank of any matrix, the epsilon-effective rank, which can help approximately quantify the singular value distribution when different values of epsilon are used. We conclude by showing that it is regularization that played a positive role in the performance of these high-rank models compared with the chosen baselines, and that no model yet truly gains expressiveness just by breaking the softmax bottleneck.
Keywords: Language Models, Softmax Bottleneck, Recurrent Neural Networks, AWD-LSTM, Generalized SigSoftmax
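The softmax bottleneck can be checked numerically. Suppose the logits for N contexts are H Wᵀ, with d-dimensional context vectors. Then Q_theta = log_softmax(H Wᵀ) differs from H Wᵀ only by a per-row constant, so its rank is at most d + 1, however large the vocabulary is. The sketch below checks this bound on random stand-in matrices and also computes an "epsilon-effective rank". The abstract does not define that metric, so the definition used here (the number of singular values above epsilon times the largest one) is an assumption, not necessarily the thesis's.

```python
import numpy as np

rng = np.random.default_rng(3)
n_contexts, vocab, d = 300, 1000, 16

H = rng.normal(size=(n_contexts, d))  # stand-ins for RNN context vectors
W = rng.normal(size=(vocab, d))       # stand-ins for output word embeddings
logits = H @ W.T

# Row-wise log-softmax, computed stably by subtracting each row's maximum.
shifted = logits - logits.max(axis=1, keepdims=True)
Q = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

print(np.linalg.matrix_rank(Q))  # at most d + 1 = 17, despite a 300 x 1000 matrix


def effective_rank(M: np.ndarray, eps: float) -> int:
    """Count singular values larger than eps * (largest singular value). Assumed definition."""
    s = np.linalg.svd(M, compute_uv=False)  # returned in descending order
    return int(np.sum(s > eps * s[0]))
```

Raising `eps` makes the measure ignore small singular values. A matrix with fast singular value decay, as the thesis reports for Q_theta, therefore has an effective rank well below its numerical rank, even when that numerical rank is high.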
30	The clash between two worlds in human action recognition: supervised feature training vs Recurrent ConvNet
Raptis, Konstantinos (28 November 2016), Indiana University-Purdue University Indianapolis (IUPUI)
Action recognition has been an active research topic for over three decades, with applications such as surveillance, human-computer interaction, and content-based retrieval. Recent research focuses on datasets of movies, web videos, and TV shows. The nature of these datasets makes action recognition very challenging because of scene variability and complexity: background clutter, occlusions, viewpoint changes, fast irregular motion, and a large spatio-temporal search space (articulation configurations and motions). Local spatio-temporal image features show promising results and avoid cumbersome and often inaccurate frame-by-frame segmentation (boundary estimation). We focus on two state-of-the-art methods for action classification: dense trajectories and recurrent neural networks (RNNs). Dense trajectories use typical supervised training (e.g., with support vector machines) on features such as 3D-SIFT, extended SURF, HOG3D, and local trinary patterns; the main idea is to densely sample these features in each frame and track them through the sequence using optical flow. The deep neural network, on the other hand, uses the input frames to detect actions and produce part proposals, i.e., to estimate information about body parts (shapes and locations). We compare these two approaches, which are representative of what is used today, both qualitatively and quantitatively, and describe our conclusions regarding accuracy and efficiency.
Keywords: Action Recognition, Dense Trajectories, R-CNN, LSTM, RNN, Convolutional Neural Networks, Recurrent Neural Networks
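The tracking step of dense trajectories, as the abstract describes it, samples points densely and then follows them through the sequence using optical flow. It can be sketched as below. Following the usual dense-trajectory formulation, each point moves by the median of the flow in a small neighbourhood, which suppresses isolated flow errors. The flow fields here are synthetic. Computing real optical flow, extracting descriptors along the tracks and the SVM stage are outside this sketch, and `track_points` and `dense_grid` are hypothetical names.

```python
import numpy as np


def track_points(points: np.ndarray, flow: np.ndarray, radius: int = 1) -> np.ndarray:
    """Advance (x, y) points by the median optical flow around each point.

    points: (P, 2) array of x, y coordinates; flow: (height, width, 2) array of (dx, dy).
    """
    h, w, _ = flow.shape
    moved = np.empty_like(points, dtype=float)
    for n, (x, y) in enumerate(points):
        # clamp the lookup position to the image; a fuller version might drop such points
        xi = min(max(int(round(x)), 0), w - 1)
        yi = min(max(int(round(y)), 0), h - 1)
        window = flow[max(yi - radius, 0):yi + radius + 1, max(xi - radius, 0):xi + radius + 1]
        moved[n] = points[n] + np.median(window.reshape(-1, 2), axis=0)
    return moved


def dense_grid(height: int, width: int, step: int = 5) -> np.ndarray:
    """Sample points on a regular grid every `step` pixels."""
    ys, xs = np.mgrid[step // 2:height:step, step // 2:width:step]
    return np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)


height, width, frames = 40, 60, 4
points = dense_grid(height, width)
trajectory = [points]
for _ in range(frames):
    flow = np.zeros((height, width, 2))
    flow[..., 0] = 1.0            # synthetic flow: everything moves 1 px right per frame
    flow[7, 7] = (50.0, 50.0)     # a single outlier vector, suppressed by the median
    trajectory.append(track_points(trajectory[-1], flow))
```

Each entry of `trajectory` holds the tracked positions for one frame. In the full method, descriptors such as HOG3D would be computed along these tracks before the supervised classifier is trained.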
