  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
421

Aplicación web para la detección de mentiras utilizando redes neuronales recurrentes y micro-expresiones / Web application for lie detection using recurrent neural networks and micro-expressions

Rodriguez Meza, Bryan Alberto; Vargas Lopez-Lavalle, Renzo Nicolas 21 January 2021
In everyday life, detecting a falsehood can have important implications in many social situations. Detecting lies can be decisive where the consequences are moderate or serious, as in police investigations. The work presented here aims to build a lie detection system that uses a webcam as its only means of detection. It also surveys the subareas related to the problem: lie detection, deep learning, and computer vision. In this work, lying is taken to be any act that deliberately seeks to communicate false or distorted information in order to deceive others. The research feeds into a project whose scope is an application capable of detecting whether a person is telling the truth from an analysis of their face. To do this, computer vision and machine learning techniques are used to offer a cheaper and more accessible alternative to other methodologies (polygraph, ERPs, fMRI); those based on analyzing brain state require extremely expensive machinery and tend to have the same precision as polygraphs. / Research work
422

Predicting Road Rut with a Multi-time-series LSTM Model

Backer-Meurke, Henrik; Polland, Marcus January 2021
Road ruts are depressions or grooves worn into a road. Increases in rut depth are highly undesirable due to the heightened risk of hydroplaning. Accurately predicting increases in road rut depth is important for maintenance planning within the Swedish Transport Administration. At the time of writing, the agency uses a linear regression model and is developing a feed-forward neural network for road rut predictions. The aim of the study was to evaluate the possibility of using a recurrent neural network to predict road rut. Through design science research, an artefact in the form of an LSTM model was designed, developed, and evaluated. The dataset consisted of multiple multivariate short time series, for which research was limited. Case studies were conducted, which inspired the conceptual design of the model. The baseline LSTM model proposed in this paper uses the full dataset in combination with time-series individualization through an added index feature. Additional features thought to correlate with rut depth were also studied through multiple training set variations. The model was evaluated by calculating the Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE) for each training set variation. The baseline model predicted rut depth with an MAE of 0.8110 mm and an RMSE of 1.124 mm, outperforming a control set without the added index. The feature with the highest correlation to rut depth was curvature, with an MAE of 0.8031 and an RMSE of 1.1093. Initial findings show that an LSTM model trained on multiple multivariate time series can potentially be used to predict rut depth. Time-series individualization through an added index feature yielded better results than the control, indicating that it had the desired effect on model performance.
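The two error metrics named in the abstract above can be illustrated with a minimal sketch. The function names and the rut-depth values below are hypothetical and are not taken from the thesis; only the standard definitions of MAE and RMSE are assumed.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalizes large errors more than MAE does."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical rut depths in mm (illustrative values only).
measured = [4.0, 5.5, 7.0, 6.0]
predicted = [4.5, 5.0, 8.0, 6.0]

print(f"MAE  = {mae(measured, predicted):.4f} mm")
print(f"RMSE = {rmse(measured, predicted):.4f} mm")
```

RMSE is never smaller than MAE on the same data, which is consistent with the pairs of values reported in the abstract.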
423

Deep Learning Approach for Extracting Heart Rate Variability from a Photoplethysmographic Signal

Odinsdottir, Gudny Björk; Larsson, Jesper January 2020
Photoplethysmography (PPG) is a method to detect blood volume changes in every heartbeat. The peaks in the PPG signal correspond to the electrical impulses sent by the heart. The duration between heartbeats varies, and this variation is known as heart rate variability (HRV). Thus, finding peaks correctly in PPG signals makes it possible to measure HRV accurately. Additional research indicates that deep learning approaches can extract HRV from a PPG signal with significantly greater accuracy than traditional methods. In this study, deep learning classifiers were built to detect peaks in a noise-contaminated PPG signal and to recognize the activity performed during the data recording. The dataset used in this study is provided by the PhysioBank database and consists of synchronized PPG, acceleration, and gyroscope data. The models investigated in this study were limited to a one-layer LSTM network with six different numbers of neurons and four different window sizes. The most accurate model for peak classification had 256 neurons and a window size of 15 time steps, with a Matthews correlation coefficient (MCC) of 0.74. The model with 64 neurons and a window duration of 1.25 seconds gave the most accurate activity classification, with an MCC of 0.63. In conclusion, further optimization of a deep learning approach could lead to promising accuracy in peak detection and thus an accurate measurement of HRV. The probable cause of the low accuracy on the activity classification problem is the limited data used in this study.
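For readers unfamiliar with the metric reported above, the following is a minimal sketch of the binary Matthews correlation coefficient computed from confusion-matrix counts. The function name and the counts are hypothetical; returning 0.0 when the denominator is zero is a common convention and is assumed here, not something stated in the abstract.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: undefined cases are reported as 0.0.
        return 0.0
    return numerator / denominator


# Hypothetical counts for a peak / no-peak classifier.
print(round(mcc(tp=90, tn=80, fp=20, fn=10), 4))
```

Unlike plain accuracy, MCC stays informative when one class (for example, "no peak") dominates the samples, which is a plausible reason to prefer it for peak detection.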
424

Learning with Recurrent Neural Networks / Lernen mit Rekurrenten Neuronalen Netzen

Hammer, Barbara 15 September 2000
This thesis examines so called folding neural networks as a mechanism for machine learning. Folding networks form a generalization of partial recurrent neural networks such that they are able to deal with tree structured inputs instead of simple linear lists. In particular, they can handle classical formulas - they were proposed originally for this purpose. After a short explanation of the neural architecture we show that folding networks are well suited as a learning mechanism in principle. This includes three parts: the proof of their universal approximation ability, the aspect of information theoretical learnability, and the examination of the complexity of training. Approximation ability: It is shown that any measurable function can be approximated in probability. Explicit bounds on the number of neurons result if only a finite number of points is dealt with. These bounds are new results in the case of simple recurrent networks, too. Several restrictions occur if a function is to be approximated in the maximum norm. Afterwards, we consider briefly the topic of computability. It is shown that a sigmoidal recurrent neural network can compute any mapping in exponential time. However, if the computation is subject to noise almost the capability of tree automata arises. Information theoretical learnability: This part contains several contributions to distribution dependent learnability: The notation of PAC and PUAC learnability, consistent PAC/ PUAC learnability, and scale sensitive versions are considered. We find equivalent characterizations of these terms and examine their respective relation answering in particular an open question posed by Vidyasagar. It is shown at which level learnability only because of an encoding trick is possible. 
Two approaches from the literature which can guarantee distribution dependent learnability if the VC dimension of the concept class is infinite are generalized to function classes: The function class is stratified according to the input space or according to a so-called luckiness function which depends on the output of the learning algorithm and the concrete training data. Afterwards, the VC, pseudo-, and fat shattering dimension of folding networks are estimated: We improve some lower bounds for recurrent networks and derive new lower bounds for the pseudodimension and lower and upper bounds for folding networks in general. As a consequence, folding architectures are not distribution independent learnable. Distribution dependent learnability can be guaranteed. Explicit bounds on the number of examples which guarantee valid generalization can be derived using the two approaches mentioned above. We examine in which cases these bounds are polynomial. Furthermore, we construct an explicit example for a learning scenario where an exponential number of examples is necessary. Complexity: It is shown that training a fixed folding architecture with perceptron activation function is polynomial. Afterwards, a decision problem, the so-called loading problem, which is correlated to neural network training is examined. For standard multilayer feed-forward networks the following situations turn out to be NP-hard: Concerning the perceptron activation function, a classical result from the literature, the NP-hardness for varying input dimension, is generalized to arbitrary multilayer architectures. Additionally, NP-hardness can be found if the input dimension is fixed but the number of neurons may vary in at least two hidden layers. Furthermore, the NP-hardness is examined if the number of patterns and number of hidden neurons are correlated. 
We finish with a generalization of the classical NP result as mentioned above to the sigmoidal activation function which is used in practical applications.
425

Reinforcement Learning with Recurrent Neural Networks

Schäfer, Anton Maximilian 20 November 2008
Controlling a high-dimensional dynamical system with continuous state and action spaces in a partially unknown environment, such as a gas turbine, is a challenging problem. So far, hard-coded rules based on experts' knowledge and experience are often used. Machine learning techniques, which comprise the field of reinforcement learning, are generally only applied to sub-problems. A reason for this is that most standard RL approaches still fail to produce satisfactory results in such complex environments. Besides, they are rarely data-efficient, which is crucial for most real-world applications, where the available amount of data is limited. In this thesis, recurrent neural reinforcement learning approaches to identify and control dynamical systems in discrete time are presented. They form a novel connection between recurrent neural networks (RNN) and reinforcement learning (RL) techniques. RNN are used because they allow for the identification of dynamical systems in the form of high-dimensional, non-linear state space models. They have also been shown to be very data-efficient. In addition, a proof is given of their universal approximation capability for open dynamical systems. Moreover, it is pointed out that they are, in contrast to an often-cited statement, well able to capture long-term dependencies. As a first step towards reinforcement learning, it is shown that RNN can map and reconstruct (partially observable) MDPs well. In the so-called hybrid RNN approach, the resulting inner state of the network is then used as a basis for standard RL algorithms. The further developed recurrent control neural network combines system identification and determination of an optimal policy in one network. In contrast to most RL methods, it determines the optimal policy directly without making use of a value function. The methods are tested on several standard benchmark problems. In addition, they are applied to different kinds of gas turbine simulations of industrial scale.
426

Pattern Recognition in the Usage Sequences of Medical Apps / Analyse des Séquences d'Usage d'Applications Médicales

Adam, Chloé 01 April 2019
Radiologists use medical imaging solutions on a daily basis for diagnosis. Improving user experience is a major part of the continuous effort to enhance the overall quality and usability of software products. Monitoring applications make it possible to record the evolution of various software and system parameters during use and, in particular, the successive actions performed by users in the software interface. These interactions may be represented as sequences of actions. Based on this data, this work deals with two industrial topics: software crashes and software usability. Both topics involve, on the one hand, understanding the patterns of use and, on the other, developing prediction tools either to anticipate crashes or to dynamically adapt the software interface to users' needs. First, we aim to identify the root causes of crashes, which is essential in order to fix the original defects. For this purpose, we propose to use a binomial test to determine which type of pattern is the most appropriate to represent crash signatures. Improving software usability through customization and adaptation of systems to each user's specific needs requires a very good knowledge of how users use the software. In order to highlight the trends of use, we propose to group similar sessions into clusters. We compare three session representations as inputs to different clustering algorithms. The second contribution of our thesis concerns the dynamic monitoring of software use. We propose two methods, based on different representations of input actions, to address two distinct industrial issues: next action prediction and software crash risk detection.
Both methodologies take advantage of the recurrent structure of LSTM neural networks to capture dependencies among our sequential data as well as their capacity to potentially handle different types of input representations for the same data.
427

CONTRIBUTIONS TO EFFICIENT AUTOMATIC TRANSCRIPTION OF VIDEO LECTURES

Agua Teba, Miguel Ángel del 04 November 2019
In recent years, online multimedia repositories have become key knowledge assets thanks to the rise of the Internet, especially in the area of education. Educational institutions around the world have devoted considerable effort to exploring different teaching methods, improving the transmission of knowledge, and reaching a wider audience. As a result, online video lecture repositories are now available and serve as complementary tools that can boost the learning experience and help students assimilate new concepts. To guarantee the success of these repositories, the transcription of each lecture plays a very important role because it is the first step towards many other features. Transcription makes learning materials searchable, enables translation into other languages, supports recommendation functions, makes content summaries possible, guarantees access for people with hearing disabilities, etc. However, transcribing these videos is expensive in terms of time and human cost. For this reason, this thesis aims to provide new tools and techniques that ease the transcription of these repositories.
In particular, we address the development of a complete Automatic Speech Recognition toolkit with a special focus on the deep learning techniques that contribute to accurate transcriptions in real-world scenarios. This toolkit is tested against many others in different international competitions, showing comparable transcription quality. Moreover, a new technique to improve recognition accuracy has been proposed which makes use of Confidence Measures; it was the spark that motivated the proposal of new Confidence Measure techniques that helped to further improve the transcription quality. To this end, a new speaker-adapted confidence measure approach was proposed for models based on Recurrent Neural Networks. The contributions proposed herein have been tested in real-life scenarios in different educational repositories. In fact, the transLectures-UPV toolkit is part of a set of tools for providing video lecture transcriptions in many different Spanish and European universities and institutions. / Agua Teba, MÁD. (2019). CONTRIBUTIONS TO EFFICIENT AUTOMATIC TRANSCRIPTION OF VIDEO LECTURES [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/130198 / Thesis
428

Statistické jazykové modely založené na neuronových sítích / STATISTICAL LANGUAGE MODELS BASED ON NEURAL NETWORKS

Mikolov, Tomáš January 2012
Statistical language models are an important part of many successful applications, including automatic speech recognition and machine translation (a well-known example being Google Translate). Traditional techniques for estimating these models are based on so-called N-grams. Despite the known shortcomings of these techniques and the enormous effort of research groups across many fields (speech recognition, machine translation, neuroscience, artificial intelligence, natural language processing, data compression, psychology, etc.), N-grams have essentially remained the most successful technique. The aim of this thesis is to present several language model architectures based on neural networks. Although these models are computationally more demanding than N-gram models, the techniques developed in this thesis make their efficient use in real applications possible. The achieved reduction in speech recognition errors relative to the best N-gram models reaches 20%. A model based on a recurrent neural network achieves the best published results on a very well-known dataset (Penn Treebank).
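As background on the N-gram baseline that the abstract above compares against, here is a minimal sketch of a bigram (N = 2) model estimated by maximum likelihood from raw counts. The function name and the toy corpus are hypothetical, and no smoothing is applied, which practical N-gram models would normally add.

```python
from collections import Counter


def train_bigram(tokens):
    """Return a function prob(prev, word) estimating P(word | prev)
    by maximum likelihood from bigram and unigram counts."""
    context_counts = Counter(tokens[:-1])            # how often each word appears as a context
    bigram_counts = Counter(zip(tokens[:-1], tokens[1:]))

    def prob(prev, word):
        if context_counts[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob


# Toy corpus (illustrative only).
corpus = "the cat sat on the mat the cat ran".split()
prob = train_bigram(corpus)

print(prob("the", "cat"))
print(prob("cat", "sat"))
```

The zero probability returned for unseen contexts illustrates one of the known shortcomings of count-based N-gram models that neural language models try to address.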
429

Dataset Drift in Radar Warning Receivers : Out-of-Distribution Detection for Radar Emitter Classification using an RNN-based Deep Ensemble

Coleman, Kevin January 2023
Changes to the signal environment of a radar warning receiver (RWR) over time through dataset drift can negatively affect a machine learning (ML) model, deployed for radar emitter classification (REC). The training data comes from a simulator at Saab AB, in the form of pulsed radar in a time-series. In order to investigate this phenomenon on a neural network (NN), this study first implements an underlying classifier (UC) in the form of a deep ensemble (DE), where each ensemble member consists of an NN with two independently trained bidirectional LSTM channels for each of the signal features pulse repetition interval (PRI), pulse width (PW) and carrier frequency (CF). From tests, the UC performs best for REC when using all three features. Because dataset drift can be treated as detecting out-of-distribution (OOD) samples over time, the aim is to reduce NN overconfidence on data from unseen radar emitters in order to enable OOD detection. The method estimates uncertainty with predictive entropy and classifies samples reaching an entropy larger than a threshold as OOD. In the first set of tests, OOD is defined from holding out one feature modulation from the training dataset, and choosing this as the only modulation in the OOD dataset used during testing. With this definition, Stagger and Jitter are most difficult to detect as OOD. Moreover, using DEs with 6 ensemble members and implementing LogitNorm to the architecture improves the OOD detection performance. Furthermore, the OOD detection method performs well for up to 300 emitter classes and predictive entropy outperforms the baseline for almost all tests. Finally, the model performs worse when OOD is simply defined as signals from unseen emitters, because of a precision decrease. In conclusion, the implemented changes managed to reduce the overconfidence for this particular NN, and improve OOD detection for REC.
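The entropy-threshold rule described in the abstract above can be sketched as follows, assuming that the predictive distribution of the deep ensemble is the average of the members' softmax outputs. The function names, the example probabilities, and the threshold value of 0.5 are hypothetical; the thesis does not state them here.

```python
import math


def predictive_entropy(member_probs):
    """Entropy (in nats) of the ensemble-averaged class distribution.

    member_probs: one list of class probabilities per ensemble member.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(m[c] for m in member_probs) / n_members for c in range(n_classes)]
    return -sum(p * math.log(p) for p in mean if p > 0.0)


def is_ood(member_probs, threshold):
    """Flag a sample as out-of-distribution when its entropy exceeds the threshold."""
    return predictive_entropy(member_probs) > threshold


# Members agree confidently: low entropy, treated as in-distribution.
agree = [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]]
# Members disagree: the averaged distribution is flat, so entropy is high.
disagree = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]

THRESHOLD = 0.5  # hypothetical value; in practice tuned on validation data
print(is_ood(agree, THRESHOLD), is_ood(disagree, THRESHOLD))
```

Averaging before taking the entropy means that disagreement between members raises the score even when each member is individually confident, which is one reason ensembles can help reduce overconfidence on unseen emitters.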
430

Reducing Training Time in Text Visual Question Answering

Behboud, Ghazale 15 July 2022
Artificial Intelligence (AI) and Computer Vision (CV) have brought the promise of many applications along with many challenges to solve. The majority of current AI research has been dedicated to single-modal data processing, meaning that it uses only one modality, such as visual recognition or text recognition. However, real-world challenges often combine different modalities of data such as text, audio and images. This thesis focuses on solving the Visual Question Answering (VQA) problem, which is a significant multi-modal challenge. VQA is defined as a computer vision system that, when given a question about an image, answers based on an understanding of both the question and the image. The goal is to reduce the training time of VQA models. In this thesis, Look, Read, Reason and Answer (LoRRA), which is a state-of-the-art architecture, is used as the base model. Then, Reduce Uni-modal Biases (RUBi) is applied to this model to reduce the importance of uni-modal biases in training. Finally, an early stopping strategy is employed to stop the training process once the model accuracy has converged, to prevent the model from overfitting. Numerical results are presented which show that training LoRRA with RUBi and early stopping can converge in less than 5 hours. The impact of the batch size, learning rate and warm-up hyperparameters is also investigated and experimental results are presented. / Graduate
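The early stopping strategy mentioned in the abstract above is not specified in detail. One common form, sketched here as an assumption, stops training when validation accuracy has not improved for a fixed number of consecutive epochs. The function name, the `patience` and `min_delta` parameters, and the accuracy values are hypothetical.

```python
def early_stopping_epoch(val_accuracies, patience=3, min_delta=0.0):
    """Return the index of the epoch at which training would stop.

    Training stops once `patience` consecutive epochs fail to improve
    on the best validation accuracy by more than `min_delta`.
    If that never happens, the last epoch index is returned.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, accuracy in enumerate(val_accuracies):
        if accuracy > best + min_delta:
            best = accuracy
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_accuracies) - 1


# Hypothetical validation accuracies per epoch: the curve plateaus near 0.53.
history = [0.40, 0.48, 0.52, 0.53, 0.53, 0.52, 0.53, 0.54]
print(early_stopping_epoch(history, patience=3))
```

With a patience of 3, this sketch stops before the late improvement to 0.54 is seen, which shows the trade-off between saving training time and missing small later gains.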
