21

Using Deep Neural Networks and Industry-Friendly Standards to Create a Robot Follower for Human Leaders

Gilliam, Austin Taylor 27 August 2018 (has links)
No description available.
22

TickNet: A Lightweight Deep Classifier for Tick Recognition

Wang, Li 01 February 2021 (has links) (PDF)
Machine learning and deep learning increasingly shape the modern world. Deep neural networks have become powerful enough to take on many computer vision tasks once seen as the unique domain of humans, such as image classification, object detection, semantic segmentation, and instance segmentation. The success of a deep learning model on a specific application is determined by a sequence of choices: which deep neural network to use, what data to feed it, and how to train it. The goal of this work is to design a practical, lightweight image classification model, built and trained from scratch, that helps researchers and users recognize whether a small bug is a tick. Some of the images used in this work were collected by specialists using a microscope in the Laboratory of Medical Zoology (LMZ) at the University of Massachusetts Amherst. The following techniques are used in this work. We generated four datasets by collecting 53,150 images of small bugs and cleaning the data by deleting low-quality images. Both preprocessed and augmented images were used for training and validation. Initially, we proposed five lightweight CNNs, trained each on the same training dataset, and evaluated them on the same validation dataset. After comparing these five architectures, we chose the one with the best performance, named TickNet. We then compared TickNet against five classical image classification architectures used for large-scale image recognition tasks and determined that TickNet outperforms them in model size, number of parameters, and testing time on both CPU and GPU, with a trade-off in testing accuracy. To conclude the research, we deployed applications on an Android mobile phone for binary and four-class image classification. Disclaimer: This work, or any part of it, should not be used as guidance or instruction regarding the diagnosis, care, or treatment of tick-borne diseases, nor supersede existing guidance.
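As a rough illustration of the kind of lightweight, trained-from-scratch classifier the abstract describes, here is a minimal PyTorch sketch. TickNet's actual layer configuration is not given in the abstract, so the layer sizes, the 224x224 input resolution, and the `TinyTickClassifier` name are assumptions:

```python
import torch
import torch.nn as nn

class TinyTickClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),   # 224 -> 112
            nn.BatchNorm2d(16), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 112 -> 56
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # 56 -> 28
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),            # global average pooling -> 64 values
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

model = TinyTickClassifier()
logits = model(torch.randn(1, 3, 224, 224))   # one dummy RGB image
print(logits.shape)                           # torch.Size([1, 2]): tick / not tick
```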
23

Data-driven subjective performance evaluation: An attentive deep neural networks model based on a call centre case

Ahmed, Abdelrahman M., Sivarajah, Uthayasankar, Irani, Zahir, Mahroof, Kamran, Vincent, Charles 04 January 2023 (has links)
Every contact centre engages in some form of Call Quality Monitoring in order to improve agent performance and customer satisfaction. Call centres have traditionally used a manual process to sort, select, and analyse a representative sample of interactions for evaluation purposes. Unfortunately, such a process is characterised by subjectivity, which in turn creates a skewed picture of agent performance. Detecting and eliminating this subjectivity is a challenge that requires empirical research to address. In this paper, we introduce an evidence-based, machine-learning-driven framework for the automatic detection of subjective calls. We analyse a corpus of seven hours of recorded calls from a real-estate call centre using a Deep Neural Network (DNN) on a multi-classification problem. The study establishes the first baseline for subjectivity detection, achieving an accuracy of 75%, which is close to related speech studies in emotion recognition and performance classification. Among other findings, we conclude that to achieve the best performance evaluation, subjective calls should be removed from the evaluation process, or subjective scores should be deducted from the overall results.
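The paper's exact network and input features are not specified in the abstract; as a hedged sketch, a small feed-forward DNN for multi-class subjectivity labelling over fixed-length call-level feature vectors might look like this (the 128-dimensional features and four classes are illustrative assumptions):

```python
import torch
import torch.nn as nn

n_features, n_classes = 128, 4   # assumed feature size and subjectivity classes

model = nn.Sequential(
    nn.Linear(n_features, 64), nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, n_classes),    # logits over the subjectivity categories
)
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(16, n_features)        # a batch of 16 call-level vectors
labels = torch.randint(0, n_classes, (16,))   # their (dummy) subjectivity labels
loss = loss_fn(model(features), labels)
loss.backward()
print(f"loss on the dummy batch: {loss.item():.3f}")
```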
24

Programmable Address Generation Unit for Deep Neural Network Accelerators

Khan, Muhammad Jazib January 2020 (has links)
Convolutional Neural Networks are becoming more and more popular due to their applications in revolutionary technologies like autonomous driving, biomedical imaging, and natural language processing. With this increase in adoption, the complexity of the underlying algorithms is also increasing. This trend has implications for the computation platforms as well, i.e. GPU-, FPGA-, or ASIC-based accelerators, and especially for the Address Generation Unit (AGU), which is responsible for memory access. Existing accelerators typically have Parametrizable Datapath AGUs, which have minimal adaptability to evolving algorithms. Hence, new hardware is required for new algorithms, which is a very inefficient approach in terms of time, resources, and reusability. In this research, six algorithms with different implications for hardware are evaluated for address generation, and a fully Programmable AGU (PAGU) is presented that can adapt to all of them. These algorithms are Standard, Strided, Dilated, Upsampled, and Padded convolution, and MaxPooling. The proposed AGU architecture is a Very Long Instruction Word based Application Specific Instruction Processor with specialized components, such as hardware counters and zero-overhead loops, and a powerful Instruction Set Architecture (ISA) that can model static and dynamic constraints as well as affine and non-affine address equations. The target has been to minimize the flexibility vs. area, power, and performance trade-off. For a working test network for semantic segmentation, results show that PAGU achieves close to ideal performance, one cycle per address, for all the algorithms under consideration except Upsampled Convolution, for which it takes 1.7 cycles per address. The area of PAGU is approximately 4.6 times larger than that of the Parametrizable Datapath approach, which is still reasonable considering the high flexibility benefits. The potential of PAGU is not limited to neural network applications; it extends to more general digital signal processing areas, which can be explored in the future.
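To make the address-generation problem concrete, the following sketch (plain Python, not PAGU's ISA) evaluates the affine address equation that such a unit must compute once per cycle for a strided, dilated 2-D convolution over a row-major feature map; all dimensions are illustrative:

```python
def conv_input_addresses(width, stride, dilation, kernel, out_h, out_w):
    """Yield the flat input address for every (output pixel, kernel tap) pair."""
    for oy in range(out_h):
        for ox in range(out_w):
            for ky in range(kernel):
                for kx in range(kernel):
                    # affine address equation: output position scaled by stride,
                    # plus the dilated kernel-tap offset
                    row = oy * stride + ky * dilation
                    col = ox * stride + kx * dilation
                    yield row * width + col

# 3x3 kernel, stride 2, no dilation, on an 8-wide map producing 2x2 outputs
addrs = list(conv_input_addresses(width=8, stride=2, dilation=1,
                                  kernel=3, out_h=2, out_w=2))
print(addrs[:9])  # taps for the first output pixel: [0, 1, 2, 8, 9, 10, 16, 17, 18]
```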
25

Interpretability and Accuracy in Electricity Price Forecasting : Analysing DNN and LEAR Models in the Nord Pool and EPEX-BE Markets

Margarida de Mendoça de Atayde P. de Mascarenhas, Maria January 2023 (has links)
Market prices in the liberalized European electricity system play a crucial role in promoting competition, ensuring grid stability, and maximizing profits for market participants. Accurate electricity price forecasting algorithms have, therefore, become increasingly important in this competitive market. However, existing evaluations of forecasting models primarily focus on overall accuracy, overlooking the underlying causality of the predictions. The thesis explores two state-of-the-art forecasters, the deep neural network (DNN) and the Lasso Estimated AutoRegressive (LEAR) models, in the EPEX-BE and Nord Pool markets. The aim is to understand if their predictions can be trusted in more general settings than the limited context they are trained in. If the models produce poor predictions in extreme conditions or if their predictions are inconsistent with reality, they cannot be relied upon in the real world where these forecasts are used in downstream decision-making activities. The results show that for the EPEX-BE market, the DNN model outperforms the LEAR model in terms of overall accuracy. However, the LEAR model performs better in predicting negative prices, while the DNN model performs better in predicting price spikes. For the Nord Pool market, a simpler DNN model is more accurate for price forecasting. In both markets, the models exhibit behaviours inconsistent with reality, making it challenging to trust the models' predictions. Overall, the study highlights the importance of understanding the underlying causality of forecasting models and the limitations of relying solely on overall accuracy metrics.
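As a hedged illustration of the LEAR side, a Lasso regression over lagged prices captures the core idea; the real LEAR model uses per-hour models and a much richer calendar/exogenous feature set, so the lag structure and the synthetic data below are assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
prices = rng.normal(50, 10, size=500)   # synthetic stand-in for an hourly price series

lags = 24                               # predict from the previous 24 hours
X = np.column_stack([prices[i:len(prices) - lags + i] for i in range(lags)])
y = prices[lags:]

model = Lasso(alpha=0.1).fit(X[:-24], y[:-24])   # hold out the last 24 hours
pred = model.predict(X[-24:])
print(f"held-out MAE: {np.mean(np.abs(pred - y[-24:])):.2f}")
```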
26

La représentation des documents par réseaux de neurones pour la compréhension de documents parlés / Neural network representations for spoken documents understanding

Janod, Killian 27 November 2017 (has links)
Spoken language understanding aims to extract relevant items of meaning from the speech signal. There are two distinct types of spoken language understanding: understanding of human/human dialogues and understanding of human/machine dialogues. The structure of the dialogues and the goals of the understanding process vary with the type of conversation. In both cases, however, automatic systems usually rely on a speech recognition step to produce a textual transcript of the spoken signal. In adverse acoustic conditions, even the most advanced speech recognition systems produce erroneous or partly erroneous transcripts. These errors can be explained by the presence of information of various natures and functions, such as speaker and ambience specificities, and they can have a significant negative impact on the understanding process. The first contribution of this thesis shows that using deep autoencoders produces a more abstract latent representation of the transcript. This latent representation makes the spoken language understanding system more robust to automatic transcription errors. In the second part, we propose two approaches for generating more robust representations by combining multiple views of a given dialogue, in order to improve the performance of the understanding system. The first approach shows that several different thematic spaces can be combined, either simply with an autoencoder or in a latent thematic space, to produce a representation that increases the effectiveness and robustness of the understanding system. The second approach introduces a form of supervision into autoencoder-based denoising. This work shows that naively introducing transcription supervision into a denoising autoencoder degrades the latent representations, whereas the proposed architectures make the performance of an understanding system based on automatic transcripts comparable to one based on manual transcripts.
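A minimal sketch of the denoising-autoencoder idea the thesis builds on, assuming bag-of-words transcript vectors: learn a latent code from a corrupted, ASR-like input that reconstructs a cleaner target. The dimensions and corruption scheme are illustrative, not the thesis's actual setup:

```python
import torch
import torch.nn as nn

vocab, latent = 1000, 64   # assumed vocabulary and latent sizes

encoder = nn.Sequential(nn.Linear(vocab, 256), nn.ReLU(), nn.Linear(256, latent))
decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, vocab))

clean = torch.rand(8, vocab)                          # stand-in for manual-transcript vectors
noisy = clean * (torch.rand(8, vocab) > 0.1).float()  # randomly dropped terms mimic ASR errors

z = encoder(noisy)                                    # robust latent representation
loss = nn.functional.mse_loss(decoder(z), clean)      # reconstruct the clean target
loss.backward()
print(z.shape, f"reconstruction loss: {loss.item():.4f}")
```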
27

EVALUATING THE IMPACT OF UNCERTAINTY ON THE INTEGRITY OF DEEP NEURAL NETWORKS

Harborn, Jakob January 2021 (has links)
Deep Neural Networks (DNNs) have shown excellent performance and are very successful in image classification and object detection. Safety-critical industries, such as the automotive and aerospace industries, aim to develop autonomous vehicles with the help of DNNs. In order to certify the usage of DNNs in safety-critical systems, it is essential to prove the correctness of data within the system. In this thesis, the research focuses on investigating the sources of uncertainty, the effects various sources of uncertainty have on NNs, and how uncertainty within an NN can be reduced. Probabilistic methods are used to implement an NN with uncertainty estimation in order to analyze and evaluate how the integrity of the NN is affected. By analyzing and discussing the effects of uncertainty in an NN, it is possible to understand the importance of including a method for estimating uncertainty. Preventing, reducing, or removing the presence of uncertainty in such a network improves the correctness of data within the system. With the implementation of the NN, results show that estimating uncertainty makes it possible to identify and classify the presence of uncertainty in the system and to reduce it, achieving an increased level of integrity that improves the correctness of the predictions.
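The abstract does not name the probabilistic method used; Monte-Carlo dropout is one common way to add uncertainty estimation to an existing NN, sketched here under that assumption:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(32, 3),           # a 3-class classifier head
)
model.train()                   # keep dropout stochastic at inference time

x = torch.randn(1, 10)
with torch.no_grad():
    # 100 stochastic forward passes over the same input
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(100)])

mean_pred = probs.mean(dim=0)   # averaged class probabilities
uncertainty = probs.std(dim=0)  # disagreement across passes = uncertainty estimate
print(mean_pred, uncertainty)
```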
28

RMNv2: Reduced Mobilenet V2 An Efficient Lightweight Model for Hardware Deployment

MANEESH AYI (8735112) 22 April 2020 (has links)
Humans can see things and differentiate objects easily, but for computers it is not that easy. Computer vision is an interdisciplinary field that allows computers to comprehend digital videos and images and to differentiate objects. With the introduction of CNNs/DNNs, computer vision is used extensively in applications like ADAS, robotics, and autonomous systems. This thesis proposes an architecture, RMNv2, that is well suited for computer vision applications such as ADAS.
RMNv2 is inspired by, and is a modified version of, its parent architecture Mobilenet V2. The changes include disabling downsampling layers, heterogeneous kernel-based convolutions, the Mish activation, and auto augmentation. The proposed model is trained from scratch on the CIFAR10 dataset and achieves an accuracy of 92.4% with a total of 1.06M parameters. The resulting model size is 4.3MB, a 52.2% decrease from the original implementation. Due to its small size and competitive accuracy, the proposed model can easily be deployed on resource-constrained devices, like mobile and embedded devices, for applications such as ADAS. Further, the proposed model is also implemented on real-time embedded devices, the NXP Bluebox 2.0 and NXP i.MX RT1060, for image classification tasks.
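Of the listed changes, the Mish activation is easy to show concretely: Mish is defined as x * tanh(softplus(x)). A self-contained definition, checked against PyTorch's built-in nn.Mish:

```python
import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    """Mish activation: x * tanh(softplus(x))."""
    return x * torch.tanh(F.softplus(x))

x = torch.linspace(-3, 3, 7)
print(mish(x))
print(torch.allclose(mish(x), torch.nn.Mish()(x)))  # True
```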
29

Optimizing Deep Neural Networks for Classification of Short Texts

Pettersson, Fredrik January 2019 (has links)
This master's thesis investigates how a state-of-the-art (SOTA) deep neural network (NN) model can be created for a specific natural language processing (NLP) dataset, the effects of using different dimensionality reduction techniques on common pre-trained word embeddings, and how well such a model generalizes to a secondary dataset. The research is motivated by two factors. One is that the construction of a machine learning (ML) text classification (TC) model is typically done around a specific dataset and often requires a lot of manual intervention, so it is hard to know exactly what procedures to implement for a specific dataset and how the result will be affected. The other is that, if the dimensionality of pre-trained embedding vectors can be lowered without losing accuracy, the execution time saved can be spent on other techniques to achieve even higher accuracy. A handful of deep neural network architectures are used, namely convolutional neural network (CNN), long short-term memory (LSTM), and bidirectional LSTM (Bi-LSTM) architectures. These are combined with four different word embeddings: GoogleNews-vectors-negative300, glove.840B.300d, paragram_300_sl999, and wiki-news-300d-1M. Three main experiments are conducted in this thesis. In the first experiment, a top-performing TC model is created for a recent NLP competition held at Kaggle.com, and each implemented procedure is benchmarked on how it affects the model's accuracy and execution time. In the second experiment, principal component analysis (PCA) and random projection (RP) are applied to the pre-trained word embeddings used in the top-performing model to investigate how accuracy and execution time are affected when creating lower-dimensional embedding vectors. In the third experiment, the same model is benchmarked on a separate dataset (Sentiment140) to investigate how well it generalizes to other data and how each implemented procedure affects the accuracy compared to the original dataset. The first experiment results in a bidirectional LSTM model with three embeddings, glove, paragram, and wiki-news, concatenated together. The model gives predictions with an F1 score of 71%, good enough for 9th place out of 1,401 participating teams in the competition. In the second experiment, PCA improves execution time by 13% while lowering the dimensionality of the embeddings by 66% and losing only half a percentage point of F1 accuracy; RP gives a constant accuracy of 66-67% regardless of the projected dimensions, compared to over 70% with PCA. In the third experiment, the model gains around 12% accuracy from the initial to the final benchmarks, compared to 19% on the competition dataset. The best accuracy achieved on the Sentiment140 dataset is 86%, higher than the 71% achieved on the Quora dataset.
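The core step of the second experiment, reducing pre-trained 300-dimensional word vectors with PCA, can be sketched as follows; a random matrix stands in for the real GloVe/paragram/wiki-news tables, and the 100 output dimensions match the reported 66% reduction:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5000, 300))   # stand-in for a 300-d embedding table

pca = PCA(n_components=100)                 # 300 -> 100 is the reported ~66% cut
reduced = pca.fit_transform(embeddings)

print(reduced.shape)                        # (5000, 100)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```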
30

A contemporary machine learning approach to detect transportation mode - A case study of Borlänge, Sweden

Golshan, Arman January 2020 (has links)
Understanding travel behavior and identifying the mode of transportation are essential for sound urban design and transportation planning. Global positioning system (GPS) tracking data is widely used to find human mobility patterns in cities. Some travel information, such as the most visited locations, temporal changes, and trip speed, can easily be extracted from raw GPS tracking data, and GPS trajectories can be used to infer commuters' mobility modes. Most previous studies have applied traditional machine learning algorithms to manually engineered features, making the models error-prone; there is thus a demand for a new model that resolves these weaknesses. The primary purpose of this study is to propose a semi-supervised model that identifies transportation mode from GPS tracking data using a contemporary machine learning algorithm. The model accepts GPS trajectories of variable length and extracts their latent information with an LSTM autoencoder. The study adopts a deep neural network architecture with three hidden layers to map this latent information to the detected transportation mode. Moreover, different case studies are performed to evaluate the proposed model's efficiency. The model achieves an accuracy of 93.6%, significantly outperforming similar studies.
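A minimal sketch of an LSTM autoencoder of the kind described, assuming four features per GPS point (e.g. latitude, longitude, speed, time delta); the encoder's final hidden state serves as the fixed-length latent vector a downstream classifier would consume:

```python
import torch
import torch.nn as nn

class TrajectoryAutoencoder(nn.Module):
    def __init__(self, n_feats: int = 4, latent: int = 32):
        super().__init__()
        self.encoder = nn.LSTM(n_feats, latent, batch_first=True)
        self.decoder = nn.LSTM(latent, n_feats, batch_first=True)

    def forward(self, x):
        _, (h, _) = self.encoder(x)                    # h: (1, batch, latent)
        z = h[-1]                                      # fixed-length trajectory code
        rep = z.unsqueeze(1).repeat(1, x.size(1), 1)   # code repeated per time step
        recon, _ = self.decoder(rep)                   # decode back to the inputs
        return z, recon

model = TrajectoryAutoencoder()
traj = torch.randn(2, 50, 4)     # 2 trajectories of 50 GPS points each
z, recon = model(traj)
loss = nn.functional.mse_loss(recon, traj)
print(z.shape, loss.item())      # torch.Size([2, 32]), reconstruction error
```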
