Global ETD Search

301	Prediction of Protein-Protein Interactions Using Deep Learning Techniques Soleymani, Farzan 24 April 2023 (has links) Proteins are considered the primary actors in living organisms. Proteins mainly perform their functions by interacting with other proteins. Protein-protein interactions underpin various biological activities such as metabolic cycles, signal transduction, and immune response. PPI identification has been addressed by various experimental methods such as the yeast two-hybrid, mass spectrometry, and protein microarrays, to mention a few. However, due to the sheer number of proteins, experimental methods for finding interacting and non-interacting protein pairs are time-consuming and costly. Therefore a sequence-based framework called ProtInteract is developed to predict protein-protein interaction. ProtInteract comprises two components: first, a novel autoencoder architecture that encodes each protein's primary structure to a lower-dimensional vector while preserving its underlying sequential pattern by extracting uncorrelated attributes and more expressive descriptors. This leads to faster training of the second network, a deep convolutional neural network (CNN) that receives encoded proteins and predicts their interaction. Three different scenarios formulate the prediction task. In each scenario, the deep CNN predicts the class of a given encoded protein pair. Each class indicates different ranges of confidence scores corresponding to the probability of whether a predicted interaction occurs or not. The proposed framework features significantly low computational complexity and relatively fast response. The present study makes two significant contributions to the field of protein-protein interaction (PPI) prediction. Firstly, it addresses the computational challenges posed by the high dimensionality of protein datasets through the use of dimensionality reduction techniques, which extract highly informative sequence attributes. Secondly, the proposed framework, ProtInteract, utilises this information to identify the interaction characteristics of a protein based on its amino acid configuration. ProtInteract encodes the protein's primary structure into a lower-dimensional vector space, thereby reducing the computational complexity of PPI prediction. Our results provide evidence of the proposed framework's accuracy and efficiency in predicting protein-protein interactions. Long-short Term Memory Recurrent Neural Networks Protein-Protein Interaction Temporal Convolutional Network Convolutional Neural Network Autoencoder Reinforcement learning actor-critic portfolio management stock market prediction coverage control multi-agent system SARSA Q-learning Graph convolutional neural network GCN state-action-reward-state-action
302	Object Based Image Retrieval Using Feature Maps of a YOLOv5 Network / Objektbaserad bildhämtning med hjälp av feature maps från ett YOLOv5-nätverk Essinger, Hugo, Kivelä, Alexander January 2022 (has links) As Machine Learning (ML) methods have gained traction in recent years, someproblems regarding the construction of such methods have arisen. One such problem isthe collection and labeling of data sets. Specifically when it comes to many applicationsof Computer Vision (CV), one needs a set of images, labeled as either being of someclass or not. Creating such data sets can be very time consuming. This project setsout to tackle this problem by constructing an end-to-end system for searching forobjects in images (i.e. an Object Based Image Retrieval (OBIR) method) using an objectdetection framework (You Only Look Once (YOLO) [16]). The goal of the project wasto create a method that; given an image of an object of interest q, search for that sameor similar objects in a set of other images S. The core concept of the idea is to passthe image q through an object detection model (in this case YOLOv5 [16]), create a”fingerprint” (can be seen as a sort of identity for an object) from a set of feature mapsextracted from the YOLOv5 [16] model and look for corresponding similar parts of aset of feature maps extracted from other images. An investigation regarding whichvalues to select for a few different parameters was conducted, including a comparisonof performance for a couple of different similarity metrics. In the table below,the parameter combination which resulted in the highest F_Top_300-score (a measureindicating the amount of relevant images retrieved among the top 300 recommendedimages) in the parameter selection phase is presented. Layer: 23Pool Methd: maxSim. Mtrc: eucFP Kern. Sz: 4 Evaluation of the method resulted in F_Top_300-scores as can be seen in the table below. Mouse: 0.820Duck: 0.640Coin: 0.770Jet ski: 0.443Handgun: 0.807Average: 0.696 / Medan ML-metoder har blivit mer populära under senare år har det uppstått endel problem gällande konstruktionen av sådana metoder. Ett sådant problem ärinsamling och annotering av data. Mer specifikt när det kommer till många metoderför datorseende behövs ett set av bilder, annoterande att antingen vara eller inte varaav en särskild klass. Att skapa sådana dataset kan vara väldigt tidskonsumerande.Metoden som konstruerades för detta projekt avser att bekämpa detta problem genomatt konstruera ett end-to-end-system för att söka efter objekt i bilder (alltså en OBIR-metod) med hjälp av en objektdetekteringsalgoritm (YOLO). Målet med projektet varatt skapa en metod som; givet en bild q av ett objekt, söka efter samma eller liknandeobjekt i ett bibliotek av bilder S. Huvudkonceptet bakom idén är att köra bilden qgenom objektdetekteringsmodellen (i detta fall YOLOv5 [16]), skapa ett ”fingerprint”(kan ses som en sorts identitet för ett objekt) från en samling feature maps extraheradefrån YOLOv5-modellen [16] och leta efter liknande delar av samlingar feature maps iandra bilder. En utredning angående vilka värden som skulle användas för ett antalolika parametrar utfördes, inklusive en jämförelse av prestandan som resultat av olikalikhetsmått. I tabellen nedan visas den parameterkombination som gav högst F_Top_300(ett mått som indikerar andelen relevanta bilder bland de 300 högst rekommenderadebilderna). Layer: 23Pool Methd: maxSim. Mtrc: eucFP Kern. Sz: 4 Evaluering av metoden med parameterval enligt tabellen ovan resulterade i F_Top_300enligt tabellen nedan. Mouse: 0.820Duck: 0.640Coin: 0.770Jet ski: 0.443Handgun: 0.807Average: 0.696 Content based image retrieval CBIR Object based image retrieval OBIR image retrieval YOLO YOLOv5 object detection PyTorch deep learning convolutional neural network CNN Content based image retrieval CBIR Object based image retrieval OBIR image retrieval YOLO YOLOv5 object detection PyTorch deep learning convolutional neural network CNN Other Mathematics Annan matematik
303	Simultaneous Detection and Validation of Multiple Ingredients on Product Packages: An Automated Approach : Using CNN and OCR Techniques / Simultant detektering och validering av flertal ingredienser på produktförpackningar: Ett automatiserat tillvägagångssätt : Genom användning av CNN och OCR tekniker Farokhynia, Rodbeh, Krikeb, Mokhtar January 2024 (has links) Manual proofreading of product packaging is a time-consuming and uncertain process that can pose significant challenges for companies, such as scalability issues, compliance risks and high costs. This thesis work introduces a novel solution by employing advanced computer vision and machine learning methods to automate the proofreading of multiple ingredients’ lists corresponding to multiple products simultaneously within a product package. By integrating Convolutional Neural Network (CNN) and Optical Character Recognition (OCR) techniques, this study examines the efficacy of automated proofreading in comparison to manual methods. The thesis involves analyzing product package artwork to identify ingredient lists utilizing the YOLOv5 object detection algorithm and the optical character recognition tool EasyOCR for ingredient extraction. Additionally, Python scripts are employed to extract ingredients from corresponding INCI PDF files (document that lists the standardized names of ingredients used in cosmetic products). A comprehensive comparison is then conducted to evaluate the accuracy and efficiency of automated proofreading. The comparison of the extracted ingredients from the product packages and their corresponding INCI PDF files yielded a match of 12.7%. Despite the suboptimal result, insights from the study highlights the limitations of current detection and recognition algorithms when applied to complex artwork. A few examples of the insights have been that the trained YOLOv5 model cuts through sentences in the ingredient list or that EasyOCR cannot extract ingredients from vertically aligned product package images. The findings underscore the need for advancements in detection algorithms and OCR tools to effectively handle objects like product packaging designs. The study also suggests that companies, such as H&M, consider updating their artwork and INCI PDF files to align with the capabilities of current AI-driven tools. By doing so, they can enhance the efficiency and overall effectiveness of automated proofreading processes, thereby reducing errors and improving accuracy. / Manuell korrekturläsning av produktförpackningar är en tidskrävande och osäker process som kan skapa betydande utmaningar för företag, såsom skalbarhetsproblem, efterlevnadsrisker och höga kostnader. Detta examensarbete presenterar en ny lösning genom att använda avancerade metoder inom datorseende och maskininlärning för att automatisera korrekturläsningen av flera ingredienslistor som motsvarar flera produkter samtidigt inom en produktförpackning. Genom att integrera Convolutional Neural Network (CNN) och Optical Character Recognition (OCR) utreder denna studie effektiviteten av automatiserad korrekturläsning i jämförelse med manuella metoder. Avhandlingen analyserar designen av produktförpackningar för att identifiera ingredienslistor med hjälp av objektdetekteringsalgoritmen YOLOv5 och det optiska teckenigenkänningsverktyget EasyOCR för extrahera enskilda ingredienser från listorna. Utöver detta används Python-skript för att extrahera ingredienser från motsvarande INCI-PDF filer (dokument med standardiserade namn på ingredienser som används i kosmetika produkter). En omfattande jämförelse genomförs sedan för att utvärdera noggrannheten och effektiviteten hos automatiserad korrekturläsning. Jämförelsen av de extraherade ingredienserna från produktförpackningarna och deras korresponderande INCI-PDF filer gav ett matchnings resultat på 12.7%. Trots de mindre optimala resultaten belyser studien de begränsningar som finns hos de nuvarande detekterings- och teckenigenkänningsalgoritmerna när de appliceras på komplexa verk av produktförpackningar. Ett fåtal exempel på insikterna är bland annat att den tränade YOLOv5 modellen skär igenom meningar i ingredienslistan eller att EasyOCR inte kan extrahera ingredienser från stående ingredienslistor på produktförpackningsbilder. Resultaten understryker behovet av framsteg inom detekteringsalgoritmer och OCR-verktyg för att effektivt kunna hantera komplexa objekt som produktförpackningar. Studien föreslår även att företag, såsom H&M, överväger att uppdatera sina design av produktförpackningar och INCI-PDF filer för att anpassa sig till kapaciteten hos aktuella AI-drivna verktyg. Genom att utföra detta kan de förbättra både effektiviteten och den övergripande kvaliteten hos de automatiserade korrekturläsningsprocesserna, vilket minskar fel och ökar noggrannheten. Proofreading product packaging automation computer vision machine learning Convolutional Neural Network (CNN) Optical Character Recognition (OCR) YOLOv5 EasyOCR accuracy Manuell korrekturläsning automatisering produktförpackningar datorseende maskininlärning Convolutional Neural Network (CNN) Optical Character Recognition (OCR) YOLOv5 EasyOCR noggrannhet Other Computer and Information Science Annan data- och informationsvetenskap
304	Eismo dalyvių kelyje atpažinimas naudojant dirbtinius neuroninius tinklus ir grafikos procesorių / On - road vehicle recognition using neural networks and graphics processing unit Kinderis, Povilas 27 June 2014 (has links) Kasmet daugybė žmonių būna sužalojami autoįvykiuose, iš kurių dalis sužalojimų būna rimti arba pasibaigia mirtimi. Dedama vis daugiau pastangų kuriant įvairias sistemas, kurios padėtų mažinti nelaimių skaičių kelyje. Tokios sistemos gebėtų perspėti vairuotojus apie galimus pavojus, atpažindamos eismo dalyvius ir sekdamos jų padėtį kelyje. Eismo dalyvių kelyje atpažinimas iš vaizdo yra pakankamai sudėtinga, daug skaičiavimų reikalaujanti problema. Šiame darbe šiai problemai spręsti pasitelkti stereo vaizdai, nesugretinamumo žemėlapis bei konvoliuciniai neuroniniai tinklai. Konvoliuciniai neuroniniai tinklai reikalauja daug skaičiavimų, todėl jie optimizuoti pasitelkus grafikos procesorių ir OpenCL. Gautas iki 33,4% spartos pagerėjimas lyginant su centriniu procesoriumi. Stereo vaizdai ir nesugretinamumo žemėlapis leidžia atmesti didelius kadro regionus, kurių nereikia klasifikuoti su konvoliuciniu neuroniniu tinklu. Priklausomai nuo scenos vaizde, reikalingų klasifikavimo operacijų skaičius sumažėja vidutiniškai apie 70-95% ir tai leidžia kadrą apdoroti atitinkamai greičiau. / Many people are injured during auto accidents each year, some injures are serious or end in death. Many efforts are being put in developing various systems, which could help to reduce accidents on the road. Such systems could warn drivers of a potential danger, while recognizing on-road vehicles and tracking their position on the road. On-road vehicle recognition on image is a complex and computationally very intensive problem. In this paper, to solve this problem, stereo images, disparity map and convolutional neural networks are used. Convolutional neural networks are very computational intensive, so to optimize it GPU and OpenCL are used. 33.4% speed improvement was achieved compared to the central processor. Stereo images and disparity map allows to discard large areas of the image, which are not needed to be classified using convolutional neural networks. Depending on the scene of the image, the number of the required classification operations decreases on average by 70-95% and this allows to process the image accordingly faster. Eismo dalyvių atpažinimas Konvoliucinis neuroninis tinklas Grafinis procesorius GP OpenCL OpenCV Stereo vaizdas Nesugretinamumo žemėlapis On-road vehicle recognition Convolutional neural network Graphics processing unit GPU Stereo image Disparity map
305	Vision based facial emotion detection using deep convolutional neural networks Julin, Fredrik January 2019 (has links) Emotion detection, also known as Facial expression recognition, is the art of mapping an emotion to some sort of input data taken from a human. This is a powerful tool to extract valuable information from individuals which can be used as data for many different purposes, ranging from medical conditions such as depression to customer feedback. To be able to solve the problem of facial expression recognition, smaller subtasks are required and all of them together form the complete system to the problem. Breaking down the bigger task at hand, one can think of these smaller subtasks in the form of a pipeline that implements the necessary steps for classification of some input to then give an output in the form of emotion. In recent time with the rise of the art of computer vision, images are often used as input for these systems and have shown great promise to assist in the task of facial expression recognition as the human face conveys the subjects emotional state and contain more information than other inputs, such as text or audio. Many of the current state-of-the-art systems utilize computer vision in combination with another rising field, namely AI, or more specifically deep learning. These proposed methods for deep learning are in many cases using a special form of neural network called convolutional neural network that specializes in extracting information from images. Then performing classification using the SoftMax function, acting as the last part before the output in the facial expression pipeline. This thesis work has explored these methods of utilizing convolutional neural networks to extract information from images and builds upon it by exploring a set of machine learning algorithms that replace the more commonly used SoftMax function as a classifier, in attempts to further increase not only the accuracy but also optimize the use of computational resources. The work also explores different techniques for the face detection subtask in the pipeline by comparing two approaches. One of these approaches is more frequently used in the state-of-the-art and is said to be more viable for possible real-time applications, namely the Viola-Jones algorithm. The other is a deep learning approach using a state-of-the-art convolutional neural network to perform the detection, in many cases speculated to be too computationally intense to run in real-time. By applying a state-of-the-art inspired new developed convolutional neural network together with the SoftMax classifier, the final performance did not reach state-of-the-art accuracy. However, the machine-learning classifiers used shows promise and bypass the SoftMax function in performance in several cases when given a massively smaller number of samples as training. Furthermore, the results given from implementing and testing a pure deep learning approach, using deep learning algorithms for both the detection and classification stages of the pipeline, shows that deep learning might outperform the classic Viola-Jones algorithm in terms of both detection rate and frames per second. Computer vision Deep learning Convolutional neural network CNN Neural network Machine learning emotion detection facial expression recognition real-time face detection Computer Sciences Datavetenskap (datalogi) Information Systems
306	Road Surface Preview Estimation Using a Monocular Camera Ekström, Marcus January 2018 (has links) Recently, sensors such as radars and cameras have been widely used in automotives, especially in Advanced Driver-Assistance Systems (ADAS), to collect information about the vehicle's surroundings. Stereo cameras are very popular as they could be used passively to construct a 3D representation of the scene in front of the car. This allowed the development of several ADAS algorithms that need 3D information to perform their tasks. One interesting application is Road Surface Preview (RSP) where the task is to estimate the road height along the future path of the vehicle. An active suspension control unit can then use this information to regulate the suspension, improving driving comfort, extending the durabilitiy of the vehicle and warning the driver about potential risks on the road surface. Stereo cameras have been successfully used in RSP and have demonstrated very good performance. However, the main disadvantages of stereo cameras are their high production cost and high power consumption. This limits installing several ADAS features in economy-class vehicles. A less expensive alternative are monocular cameras which have a significantly lower cost and power consumption. Therefore, this thesis investigates the possibility of solving the Road Surface Preview task using a monocular camera. We try two different approaches: structure-from-motion and Convolutional Neural Networks.The proposed methods are evaluated against the stereo-based system. Experiments show that both structure-from-motion and CNNs have a good potential for solving the problem, but they are not yet reliable enough to be a complete solution to the RSP task and be used in an active suspension control unit. Road Surface Preview Computer Vision Depth Estimation Convolutional Neural Network CNN traffic safety monocular camera mono vision system mono camera Structure from motion sfm 3D Reconstruction Autonomous Driving Datorseende trafiksäkerhet djupuppskattning mono kamera 3D rekonstruktion autonoma fordon Signal Processing Signalbehandling
307	[en] POPULATION DISTRIBUTION MAPPING THROUGH THE DETECTION OF BUILDING AREAS IN GOOGLE EARTH IMAGES OF HETEROGENEOUS REGIONS USING DEEP LEARNING / [pt] MAPEAMENTO DA DISTRIBUIÇÃO POPULACIONAL ATRAVÉS DA DETECÇÃO DE ÁREAS EDIFICADAS EM IMAGENS DE REGIÕES HETEROGÊNEAS DO GOOGLE EARTH USANDO DEEP LEARNING CASSIO FREITAS PEREIRA DE ALMEIDA 08 February 2018 (has links) [pt] Informações precisas sobre a distribuição da população são reconhecidamente importantes. A fonte de informação mais completa sobre a população é o censo, cujos os dados são disponibilizados de forma agregada em setores censitários. Esses setores são unidades operacionais de tamanho e formas irregulares, que dificulta a análise espacial dos dados associados. Assim, a mudança de setores censitários para um conjunto de células regulares com estimativas adequadas facilitaria a análise. Uma metodologia a ser utilizada para essa mudança poderia ser baseada na classificação de imagens de sensoriamento remoto para a identificação de domicílios, que é a base das pesquisas envolvendo a população. A detecção de áreas edificadas é uma tarefa complexa devido a grande variabilidade de características de construção e de imagens. Os métodos usuais são complexos e muito dependentes de especialistas. Os processos automáticos dependem de grandes bases de imagens para treinamento e são sensíveis à variação de qualidade de imagens e características das construções e de ambiente. Nesta tese propomos a utilização de um método automatizado para detecção de edificações em imagens Google Earth que mostrou bons resultados utilizando um conjunto de imagens relativamente pequeno e com grande variabilidade, superando as limitações dos processos existentes. Este resultado foi obtido com uma aplicação prática. Foi construído um conjunto de imagens com anotação de áreas construídas para 12 regiões do Brasil. Estas imagens, além de diferentes na qualidade, apresentam grande variabilidade nas características das edificações e no ambiente geográfico. Uma prova de conceito será feita na utilização da classificação de área construída nos métodos dasimétrico para a estimação de população em gride. Ela mostrou um resultado promissor quando comparado com o método usual, possibilitando a melhoria da qualidade das estimativas. / [en] The importance of precise information about the population distribution is widely acknowledged. The census is considered the most reliable and complete source of this information, and its data are delivered in an aggregated form in sectors. These sectors are operational units with irregular shapes, which hinder the spatial analysis of the data. Thus, the transformation of sectors onto a regular grid would facilitate such analysis. A methodology to achieve this transformation could be based on remote sensing image classification to identify building where the population lives. The building detection is considered a complex task since there is a great variability of building characteristics and on the images quality themselves. The majority of methods are complex and very specialist dependent. The automatic methods require a large annotated dataset for training and they are sensitive to the image quality, to the building characteristics, and to the environment. In this thesis, we propose an automatic method for building detection based on a deep learning architecture that uses a relative small dataset with a large variability. The proposed method shows good results when compared to the state of the art. An annotated dataset has been built that covers 12 cities distributed in different regions of Brazil. Such images not only have different qualities, but also shows a large variability on the building characteristics and geographic environments. A very important application of this method is the use of the building area classification in the dasimetric methods for the population estimation into grid. The concept proof in this application showed a promising result when compared to the usual method allowing the improvement of the quality of the estimates. [pt] APRENDIZADO DE MAQUINA [en] MACHINE LEARNING [pt] CLASSIFICACAO DE IMAGENS [en] IMAGE CLASSIFICATION [pt] DETECCAO DE OBJETOS [en] OBJECT DETECTION [pt] REDE NEURAL CONVOLUCIONAL [en] CONVOLUTIONAL NEURAL NETWORK [pt] SEGMENTACAO DE IMAGENS [en] IMAGE SEGMENTATION [pt] DASIMETRIA [en] DASIMETRY
308	Investigations of calorimeter clustering in ATLAS using machine learning Niedermayer, Graeme 11 January 2018 (has links) The Large Hadron Collider (LHC) at CERN is designed to search for new physics by colliding protons with a center-of-mass energy of 13 TeV. The ATLAS detector is a multipurpose particle detector built to record these proton-proton collisions. In order to improve sensitivity to new physics at the LHC, luminosity increases are planned for 2018 and beyond. With this greater luminosity comes an increase in the number of simultaneous proton-proton collisions per bunch crossing (pile-up). This extra pile-up has adverse effects on algorithms for clustering the ATLAS detector's calorimeter cells. These adverse effects stem from overlapping energy deposits originating from distinct particles and could lead to difficulties in accurately reconstructing events. Machine learning algorithms provide a new tool that has potential to improve clustering performance. Recent developments in computer science have given rise to new set of machine learning algorithms that, in many circumstances, out-perform more conventional algorithms. One of these algorithms, convolutional neural networks, has been shown to have impressive performance when identifying objects in 2d or 3d arrays. This thesis will develop a convolutional neural network model for calorimeter cell clustering and compare it to the standard ATLAS clustering algorithm. / Graduate Machine Learning Artificial Neural Network Convolutional Neural Network Calorimetry Particle Physics ATLAS LHC ANN CNN Topological Clustering Particle Detector CERN Energy Depositions Pile-Up High Luminosity Residual Neural Network Large Hadron Collider HL-LHC
309	Accelerated Deep Learning using Intel Xeon Phi Viebke, André January 2015 (has links) Deep learning, a sub-topic of machine learning inspired by biology, have achieved wide attention in the industry and research community recently. State-of-the-art applications in the area of computer vision and speech recognition (among others) are built using deep learning algorithms. In contrast to traditional algorithms, where the developer fully instructs the application what to do, deep learning algorithms instead learn from experience when performing a task. However, for the algorithm to learn require training, which is a high computational challenge. High Performance Computing can help ease the burden through parallelization, thereby reducing the training time; this is essential to fully utilize the algorithms in practice. Numerous work targeting GPUs have investigated ways to speed up the training, less attention have been paid to the Intel Xeon Phi coprocessor. In this thesis we present a parallelized implementation of a Convolutional Neural Network (CNN), a deep learning architecture, and our proposed parallelization scheme, CHAOS. Additionally a theoretical analysis and a performance model discuss the algorithm in detail and allow for predictions if even more threads are available in the future. The algorithm is evaluated on an Intel Xeon Phi 7120p, Xeon E5-2695v2 2.4 GHz and Core i5 661 3.33 GHz using various architectures and thread counts on the MNIST dataset. Findings show a 103.5x, 99.9x, 100.4x speed up for the large, medium, and small architecture respectively for 244 threads compared to 1 thread on the coprocessor. Moreover, a 10.9x - 14.1x (large to small) speed up compared to the sequential version running on Xeon E5. We managed to decrease training time from 7 days on the Core i5 and 31 hours on the Xeon E5, to 3 hours on the Intel Xeon Phi when training our large network for 15 epochs Machine Learning Deep Learning Supervised Deep Learning Intel Xeon Phi Convolutional Neural Network CNN High Performance Computing CHAOS parallel computing coprocessor MIC speed up performance model evaluation Computer Sciences Datavetenskap (datalogi)
310	Multi-Modal Technology for User Interface Analysis including Mental State Detection and Eye Tracking Analysis Husseini Orabi, Ahmed January 2017 (has links) We present a set of easy-to-use methods and tools to analyze human attention, behaviour, and physiological responses. A potential application of our work is evaluating user interfaces being used in a natural manner. Our approach is designed to be scalable and to work remotely on regular personal computers using expensive and noninvasive equipment. The data sources our tool processes are nonintrusive, and captured from video; i.e. eye tracking, and facial expressions. For video data retrieval, we use a basic webcam. We investigate combinations of observation modalities to detect and extract affective and mental states. Our tool provides a pipeline-based approach that 1) collects observational, data 2) incorporates and synchronizes the signal modality mentioned above, 3) detects users' affective and mental state, 4) records user interaction with applications and pinpoints the parts of the screen users are looking at, 5) analyzes and visualizes results. We describe the design, implementation, and validation of a novel multimodal signal fusion engine, Deep Temporal Credence Network (DTCN). The engine uses Deep Neural Networks to provide 1) a generative and probabilistic inference model, and 2) to handle multimodal data such that its performance does not degrade due to the absence of some modalities. We report on the recognition accuracy of basic emotions for each modality. Then, we evaluate our engine in terms of effectiveness of recognizing basic six emotions and six mental states, which are agreeing, concentrating, disagreeing, interested, thinking, and unsure. Our principal contributions include the implementation of a 1) multimodal signal fusion engine, 2) real time recognition of affective and primary mental states from nonintrusive and inexpensive modality, 3) novel mental state-based visualization techniques, 3D heatmaps, 3D scanpaths, and widget heatmaps that find parts of the user interface where users are perhaps unsure, annoyed, frustrated, or satisfied. Computer vision Convolutional Neural Network Multimodal Deep learning Machine learning HCI Usability Affective computing Mental states Eye tracking Software engineering Psychophysiology Pipeline Umple Component-based Development Emotion Facial Action Coding System Facial Expression

Search results