Global ETD Search

1	Graph-based Multi-ODE Neural Networks for Spatio-Temporal Traffic Forecasting Liu, Zibo 20 December 2022 There is a recent surge in the development of spatio-temporal forecasting models in many applications, and traffic forecasting is one of the most important ones. Long-range traffic forecasting, however, remains a challenging task due to the intricate and extensive spatio-temporal correlations observed in traffic networks. Current works primarily rely on road networks with graph structures and learn representations using graph neural networks (GNNs), but this approach suffers from the over-smoothing problem in deep architectures. To tackle this problem, recent methods combine GNNs with residual connections or neural ordinary differential equations (NODEs). The existing graph ODE models are still limited in feature extraction due to (1) a bias towards global temporal patterns that ignores local patterns, which are crucial in the case of unexpected events; (2) missing dynamic semantic edges in the model architecture; and (3) simple aggregation layers that disregard high-dimensional feature correlations. In this thesis, we propose a novel architecture called Graph-based Multi-ODE Neural Networks (GRAM-ODE), which is designed with multiple connective ODE-GNN modules to learn better representations by capturing different views of complex local and global dynamic spatio-temporal dependencies. We also add techniques that further improve the communication between different ODE-GNN modules for the forecasting task. Extensive experiments conducted on four real-world datasets demonstrate that GRAM-ODE outperforms state-of-the-art baselines and show the contribution of its individual components to performance. / Master of Science / There is a recent surge in the development of spatio-temporal forecasting models in many applications, and traffic forecasting is one of the most important ones. In traffic forecasting, current works are limited in correctly capturing the key correlations between spatial and temporal patterns. In this thesis, we propose a novel architecture called Graph-based Multi-ODE Neural Networks (GRAM-ODE) to tackle the problem by using separate ODE modules to deal with spatial and temporal patterns and by further improving the communication between different modules. Extensive experiments conducted on four real-world datasets demonstrate that GRAM-ODE outperforms state-of-the-art baselines. Traffic Forecasting Neural ODE Attention Mechanism
2	Multi-Kernel Deformable 3D Convolution for Video Super-Resolution Dou, Tianyu 17 September 2021 Video super-resolution (VSR) methods align and fuse consecutive low-resolution frames to generate high-resolution frames. One of the main difficulties for the VSR process is that video contains various motions, and the accuracy of motion estimation dramatically affects the quality of video restoration. However, standard CNNs share the same receptive field in each layer, which makes it challenging to estimate diverse motions effectively. Neuroscience research has shown that the receptive fields of biological visual areas are adjusted according to the input information. Diverse receptive fields in the temporal and spatial dimensions have the potential to adapt to various motions, yet most known VSR methods rarely address this. In this thesis, we propose to provide adaptive receptive fields for the VSR model. Firstly, we design a multi-kernel 3D convolution network and integrate it with a multi-kernel deformable convolution network for motion estimation and multi-frame alignment. Secondly, we propose a 2D multi-kernel convolution framework to improve texture restoration quality. Our experimental results show that the proposed framework outperforms the state-of-the-art VSR methods. Attention mechanism CNN Deformable convolution Separate 3D convolution Video super-resolution
3	Analýza polygonálních modelů pomocí neuronových sítí / Analysis of Polygonal Models Using Neural Networks Dronzeková, Michaela January 2020 This thesis deals with rotation estimation of a 3D model of the human jaw. It describes and compares methods for direct analysis of 3D models as well as a method that analyzes the model using rasterization. To evaluate the performance of the proposed methods, a metric is used that counts the cases in which the prediction was less than 30° from the ground truth. The proposed rasterization method takes three X-ray views of the model as input and processes them with a convolutional network. It achieves the best performance, 99% under the described metric. The method that directly analyzes the polygonal model as a sequence uses an attention mechanism and was inspired by the transformer architecture. A special pooling function was proposed for this network that decreases its memory requirements. This method achieves 88%, but it does not use rasterization and can process the polygonal model directly. It is not as good as the rasterization method with X-ray display, but it is better than the rasterization method with the model not rendered as X-ray. The last method uses a graph representation of the mesh. The graph network had problems with overfitting, which is why it did not achieve good results, and I think this method is not very suitable for analyzing polygonal models.
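The threshold metric described in entry 3 can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes rotations are given as 3x3 rotation matrices and that "less than 30° from ground truth" means the geodesic angle between predicted and true rotation. The helper names `rotation_z`, `angular_error_deg`, and `accuracy_within` are hypothetical.

```python
import math

def rotation_z(deg):
    """Rotation matrix for a rotation of `deg` degrees about the z-axis."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]

def angular_error_deg(r_pred, r_true):
    """Geodesic angle in degrees between two 3x3 rotation matrices (assumed metric)."""
    # trace(R_pred^T R_true) equals the sum of elementwise products.
    trace = sum(r_pred[i][j] * r_true[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def accuracy_within(preds, truths, threshold_deg=30.0):
    """Fraction of predictions whose angular error is below the threshold."""
    hits = sum(angular_error_deg(p, t) < threshold_deg for p, t in zip(preds, truths))
    return hits / len(preds)

# Toy data: angular errors of 10, 10, 40 and 5 degrees.
truths = [rotation_z(0), rotation_z(90), rotation_z(180), rotation_z(270)]
preds = [rotation_z(10), rotation_z(100), rotation_z(140), rotation_z(275)]
score = accuracy_within(preds, truths)
```

With the toy data above, three of the four predictions fall within 30°, so the score is 0.75; the 99% and 88% figures in the abstract would be scores of this kind under the thesis's own definition.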
4	Real-Time Video Object Detection with Temporal Feature Aggregation Chen, Meihong 05 October 2021 In recent years, various high-performance networks have been proposed for single-image object detection. An obvious choice is to design a video detection network based on state-of-the-art single-image detectors. However, video object detection is still challenging due to the lower quality of individual frames in a video, and hence the need to include temporal information for high-quality detection results. In this thesis, we design a novel interleaved architecture combining a 2D convolutional network and a 3D temporal network. We utilize YOLOv3 as the base detector. To explore inter-frame information, we propose feature aggregation based on a temporal network. Our temporal network utilizes Appearance-preserving 3D convolution (AP3D) for extracting aligned features in the temporal dimension. Our multi-scale detector and multi-scale temporal network communicate at each scale and also across scales. The temporal network takes 4, 8, or 16 frames as input in this thesis, and we name the corresponding networks TemporalNet-4, TemporalNet-8 and TemporalNet-16. Our approach achieves 77.1% mAP (mean Average Precision) on the ImageNet VID 2017 dataset with TemporalNet-4, while TemporalNet-16 achieves 80.9% mAP, which is a competitive result on this video object detection benchmark. Our network is also real-time, with a running time of 35 ms/frame. Attention Mechanism AP3D CNN Octave Convolution One-Stage Detection Video Object Detection
5	Development of Novel Attention-Aware Deep Learning Models and Their Applications in Computer Vision and Dynamical System Calibration Maftouni, Maede 12 July 2023 In recent years, deep learning has revolutionized computer vision and natural language processing tasks, but the black-box nature of these models poses significant challenges for their interpretability and reliability, especially in critical applications such as healthcare. To address this, attention-based methods have been proposed to enhance the focus and interpretability of deep learning models. In this dissertation, we investigate the effectiveness of attention mechanisms in improving prediction and modeling tasks across different domains. We propose three essays that utilize trainable, task-specific attention modules in manufacturing, healthcare, and system identification applications. In essay 1, we introduce a novel computer vision tool that tracks the melt pool in X-ray images of laser powder bed fusion using attention modules. In essay 2, we present a mask-guided attention (MGA) classifier for COVID-19 classification on lung CT scan images. The MGA classifier incorporates lesion masks to improve both the accuracy and interpretability of the model, outperforming state-of-the-art models with limited training data. Finally, in essay 3, we propose a Transformer-based model, utilizing self-attention mechanisms, for parameter estimation in system dynamics models that outpaces the conventional system calibration methods. Overall, our results demonstrate the effectiveness of attention-based methods in improving deep learning model performance and reliability in diverse applications. / Doctor of Philosophy / Deep learning, a type of artificial intelligence, has brought significant advancements to tasks like recognizing images or understanding texts. However, the inner workings of these models are often not transparent, which can make it difficult to comprehend and have confidence in their decision-making processes. Transparency is particularly important in areas like healthcare, where understanding why a decision was made can be as crucial as the decision itself. To help with this, we've been exploring an interpretable tool that helps the computer focus on the most important parts of the data, which we call the "attention module". Inspired by the human perception system, these modules focus more on certain important details, similar to how our eyes might be drawn to a familiar face in a crowded room. We propose three essays that utilize task-specific attention modules in manufacturing, healthcare, and system identification applications. In essay one, we introduce a computer vision tool that tracks a moving object in a manufacturing X-ray image sequence using attention modules. In the second essay, we discuss a new deep learning model that uses focused attention on lung lesions for more accurate COVID-19 detection on CT scan images, outperforming other top models even with less training data. In essay three, we propose an attention-based deep learning model for faster parameter estimation in system dynamics models. Overall, our research shows that attention-based methods can enhance the performance, transparency, and usability of deep learning models across diverse applications. Deep Learning Attention Mechanism Transformer Computer Vision Video Object Segmentation Manufacturing Healthcare System Dynamics
6	Unsupervised Video Summarization Using Adversarial Graph-Based Attention Network Gunuganti, Jeshmitha 05 June 2023 No description available. Computer Science Video Summarization Key-Frame Extraction Unsupervised Attention mechanism Graph modeling VAE-GAN
7	Social-pose : Human Trajectory Prediction using Input Pose Gao, Yang January 2022 In this work, we study the benefits of predicting human trajectories using human body poses instead of solely their x-y locations in time. We propose ‘Social-pose’, an attention-based pose encoder that encodes the poses of all humans in the scene and their social relations. Our method can be used as a plugin to any existing trajectory predictor. We explore the advantages of using 2D versus 3D poses, as well as a limited set of poses. We also investigate the attention map to find out which frames of poses are critical for improving human trajectory prediction. We have done extensive experiments on state-of-the-art models (based on LSTMs, GANs and transformers) and show improvements over all of them on synthetic (Joint Track Auto) and real (Human3.6M and Pedestrians and Cyclists in Road Traffic) datasets. Human trajectory prediction pose attention mechanism Computer and Information Sciences
8	Evaluation of Attention Mechanisms for Just-In-Time Software Defect Prediction / En Utvärdering av Attention Mechanisms för Just-In-Time Software Defect Prediction Isunza Navarro, Abgeiba Yaroslava January 2020 Just-In-Time Software Defect Prediction (JIT-DP) focuses on predicting errors in software at the change level, with the objective of helping developers identify defects while the development process is still ongoing and improving the quality of software applications. This work studies deep learning techniques by applying attention mechanisms that have been successful in, among others, Natural Language Processing (NLP) tasks. We introduce two networks named Convolutional Neural Network with Bidirectional Attention (BACNN) and Bidirectional Attention Code Network (BACoN) that employ a bi-directional attention mechanism between the code and message of a software change. Furthermore, we examine BERT [17] and RoBERTa [57] attention architectures for JIT-DP. More specifically, we study the effectiveness of the aforementioned attention-based models in predicting defective commits compared to the current state of the art, DeepJIT [37] and TLEL [101]. Our experiments evaluate the models by using software changes from the OpenStack open source project. The results showed that attention-based networks outperformed the baseline models in terms of accuracy in the different evaluation settings. The attention-based models, particularly the BERT and RoBERTa architectures, demonstrated promising results in identifying defective software changes and proved to be effective in predicting defects in changes of new software releases. Just-in-Time Software Defect Prediction Attention Mechanism Convolutional Neural Network Feature Extraction Computer and Information Sciences
9	Interaction-Aware Vehicle Trajectory Prediction via Attention Mechanism and Beyond Wu, Wenxuan January 2022 With the development of autonomous driving technology, vehicle trajectory prediction has become a hot topic in the intelligent traffic area. However, complex road conditions may bring multiple challenges to the vehicle trajectory prediction model. To address this, most recent studies focus on designing different neural network structures to learn vehicles’ dynamics and interaction features for better prediction. In this thesis we restrict our research scope to highway scenarios. Based on an experimental comparison among Vanilla Recurrent Neural Network (Vanilla RNN), Vanilla Long Short-Term Memory (Vanilla LSTM), and Vanilla-Transformer, we find the best configuration of the Dynamics-Only encoder module and utilize it to design a novel model called the LSTM-Attention model for vehicle trajectory prediction. The objective of our design is to explore whether the Self-Attention mechanism based encoder outperforms the pooling mechanism based encoder utilized in most current baseline models. The experimental results on the interaction encoder module show that the Self-Attention mechanism based encoder with 8 heads outperforms the pooling mechanism based encoder for the longer prediction horizons. To test the robustness of our LSTM-Attention model, we also compare the prediction performance of a Maneuver-Based decoder with that of a Maneuver-Free decoder. According to the experimental results, the Maneuver-Based decoder performs better on the heavily unbalanced Next Generation Simulation (NGSIM) dataset. Finally, to explore other latent interaction features our LSTM-Attention model might fuse, we analyze the Graph-Based encoder and the Polar-Based encoder. Based on this, we find more meaningful designs that could be exploited in our future work. Trajectory Prediction Dynamics Features Interaction Features Self-Attention Mechanism Engineering and Technology
10	A Graph Attention plus Reinforcement Learning Method for Antenna Tilt Optimization Ma, Tengfei January 2021 Remote Electrical Tilt optimization is an effective method to obtain the optimal Key Performance Indicators (KPIs) by remotely controlling the base station antenna’s vertical tilt. Improving the KPIs amounts to improving the cooperation among antennas, since KPIs measure the quality of cooperation between the antenna to be optimized and its neighbor antennas. Reinforcement Learning (RL) is an appropriate method to learn an antenna tilt control policy, since the agent in RL can generate the optimal epsilon-greedy tilt optimization policy by observing the environment and learning from the state-action pairs. However, existing models only produce tilt modification strategies by interpreting the features of the antenna to be optimized, which cannot fully characterize the mobile cellular network formed by that antenna and its neighbors. Therefore, incorporating the features of the neighboring antennas into the model is an important measure to improve the optimization strategy. This work introduces the Graph Attention Network to model, through the attention mechanism, the impact of the neighboring antennas on the antenna to be optimized. Furthermore, by processing graph-structured data, it generates a low-dimensional embedding vector with more expressive power to represent the state of the antenna to be optimized in the RL framework. This new model, the Graph Attention Q-Network (GAQ), is based on the Deep Q-Network (DQN) and aims to achieve higher performance than the baseline DQN model, evaluated by the same metric, KPI Improvement. Because GAQ has a richer perception of the environment than the vanilla DQN model, it outperforms the DQN model, obtaining a 14 percent performance improvement over the baseline. In addition, GAQ performs 14 percent better than DQN in terms of convergence efficiency. Graph Attention Reinforcement Learning Antenna Tilt Optimization 5G Attention Mechanism Graph DQN Back-Propagation Gradient Descent Computer and Information Sciences
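The epsilon-greedy action selection mentioned in entry 10 can be sketched as below. This is a generic illustration of the technique under stated assumptions, not the thesis's GAQ model: the three-action tilt set `TILT_ACTIONS` and the function name `epsilon_greedy` are hypothetical, and the Q-values are made-up numbers.

```python
import random

# Hypothetical action set: tilt down by one step, keep, tilt up by one step.
TILT_ACTIONS = (-1, 0, 1)

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)

# Made-up Q-values for one antenna state, one per tilt action.
q_values = [0.1, 0.7, 0.2]
greedy_index = epsilon_greedy(q_values, epsilon=0.0)
greedy_action = TILT_ACTIONS[greedy_index]
```

With `epsilon=0.0` the choice is purely greedy, so the example selects the "keep" action; in training, a small positive epsilon keeps some exploration of the other tilt actions.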
