1 |
Multi-Object Tracking Using Dual-Attention with Regional-Representation
Chen, Weijian, January 2021
Researchers have recently shown that convolutional neural networks (CNNs) can achieve improved performance in multi-object tracking (MOT) by performing detection and re-identification (ReID) simultaneously. Many models have been created to overcome challenges and bring state-of-the-art performance to a new level. However, because CNN models only use features from a local region, their potential has not been fully realized. Long-range dependencies in the spatial domain are usually difficult for a network to capture. Hence, how to obtain such dependencies has become a new focus in the MOT field. One approach is to adopt the self-attention mechanism of the transformer. Since it was successfully transferred from natural language processing to computer vision, many recent works have incorporated it into their trackers. With the introduction of global information, the trackers become more robust and stable. There are also traditional methods, such as optical flow, that have been redesigned in the manner of CNNs and achieve satisfactory performance. Optical flow can generate a correlation between feature maps and also obtain non-local information. However, introducing these mechanisms usually causes a significant surge in computation and memory use. They also require a large number of epochs to train, so the training time increases considerably. To solve this issue, we propose a new method to gather non-local information based on existing self-attention methods, which we name dual attention with regional representation. It significantly reduces the training time as well as the inference time, causes only a small increase in memory use, and runs at a reasonable speed. Our experiments show that this module helps the ReID become more stable and improves performance in different tasks. / Thesis / Master of Applied Science (MASc)
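The non-local behaviour that the abstract attributes to self-attention can be illustrated with a minimal sketch. This is generic scaled dot-product self-attention, not the thesis's dual-attention module: the learned query, key, and value projections are omitted (identity projections are an assumption made for brevity), and the names `softmax`, `self_attention`, and `features` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity projections.

    features: list of n vectors (one per spatial position), each of length d.
    Returns (outputs, weights), where weights is an n x n matrix.
    """
    d = len(features[0])
    scale = math.sqrt(d)
    weights = []
    for q in features:
        # Every position is compared with every other position,
        # which is what makes the operation non-local.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in features]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(row, features)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Three positions with two-dimensional features (made-up values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(features)
```

The weight matrix has one entry per pair of positions, so its size grows quadratically with the number of positions. That growth is one plausible reading of the computation and memory surge the abstract mentions.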
|
2 |
Long Document Understanding using Hierarchical Self Attention Networks
Kekuda, Akshay, January 2022
No description available.
|
3 |
Model of detection of phishing URLs based on machine learning
Burbela, Kateryna, January 2023
Background: Phishing attacks continue to pose a significant threat to internet security. One of the most common forms of phishing is through URLs, where attackers disguise malicious URLs as legitimate ones to trick users into clicking on them. Machine learning techniques have shown promise in detecting phishing URLs, but their effectiveness can vary depending on the approach used. Objectives: The objective of this research is to propose an ensemble of two machine learning techniques, Convolutional Neural Networks (CNN) and Multi-Head Self-Attention (MHSA), for detecting phishing URLs. The goal is to evaluate and compare the effectiveness of this approach against other methods and models. Methods: A dataset of URLs was collected and labeled as either phishing or legitimate. The performance of several models using different machine learning techniques, including CNN and MHSA, to classify these URLs was evaluated using various metrics, such as accuracy, precision, recall, and F1-score. Results: The results show that the ensemble of CNN and MHSA outperforms other individual models and achieves an accuracy of 98.3%. Compared to existing state-of-the-art techniques, this provides significant improvements in detecting phishing URLs. Conclusions: The ensemble of CNN and MHSA is an effective approach for detecting phishing URLs. The method outperforms existing state-of-the-art techniques, providing a more accurate and reliable method for detecting phishing URLs. The results of this study demonstrate the potential of ensemble methods in improving the accuracy and reliability of machine learning-based phishing URL detection.
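The evaluation metrics named in the abstract (accuracy, precision, recall, and F1-score) can be computed from binary labels as in the sketch below. The function name `classification_metrics` and the label vectors are hypothetical and purely illustrative; they are not data or results from the thesis.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    `positive` marks the class of interest (here: phishing).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Reporting precision and recall alongside accuracy matters when phishing and legitimate URLs are unevenly represented, since accuracy alone can look high for a model that rarely flags anything.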
|
4 |
Title-based video summarization using attention networks
Li, Changwei, 23 August 2022
No description available.
|
5 |
Interaction-Aware Vehicle Trajectory Prediction via Attention Mechanism and Beyond
Wu, Wenxuan, January 2022
With the development of autonomous driving technology, vehicle trajectory prediction has become a hot topic in the intelligent traffic area. However, complex road conditions pose multiple challenges for vehicle trajectory prediction models. To address this, most recent studies focus on designing different neural network structures to learn vehicles’ dynamics and interaction features for better prediction. In this thesis we restrict our research scope to highway scenarios. Based on an experimental comparison among a Vanilla Recurrent Neural Network (Vanilla RNN), a Vanilla Long Short-Term Memory network (Vanilla LSTM), and a Vanilla Transformer, we find the best configuration of the Dynamics-Only encoder module and use it to design a novel model, called the LSTM-Attention model, for vehicle trajectory prediction. The objective of our design is to explore whether the Self-Attention-based encoder outperforms the pooling-based encoder used in most current baseline models. The experimental results on the interaction encoder module show that the Self-Attention-based encoder with 8 heads outperforms the pooling-based encoder for the longer prediction horizons. To test the robustness of our LSTM-Attention model, we also compare the prediction performance of a Maneuver-Based decoder and a Maneuver-Free decoder. According to the experimental results, the Maneuver-Based decoder performs better on the heavily unbalanced Next Generation Simulation (NGSIM) dataset. Finally, to explore other latent interaction features that our LSTM-Attention model might fuse, we analyze the Graph-Based encoder and the Polar-Based encoder. Based on this analysis, we identify more meaningful designs that could be exploited in future work.
|
6 |
Using Bidirectional Encoder Representations from Transformers for Conversational Machine Comprehension / Användning av BERT-språkmodell för konversationsförståelse
Gogoulou, Evangelina, January 2019
Bidirectional Encoder Representations from Transformers (BERT) is a recently proposed language representation model, designed to pre-train deep bidirectional representations, with the goal of extracting context-sensitive features from an input text [1]. One of the challenging problems in the field of Natural Language Processing is Conversational Machine Comprehension (CMC). Given a context passage, a conversational question, and the conversational history, the system should predict the answer span of the question in the context passage. The main challenge in this task is how to effectively encode the conversational history into the prediction of the next answer. In this thesis work, we investigate the use of the BERT language model for the CMC task. We propose a new architecture, named BERT-CMC, using the BERT model as a base. This architecture includes a new module for encoding the conversational history, inspired by the Transformer-XL model [2]. This module serves as memory throughout the conversation. The proposed model is trained and evaluated on the Conversational Question Answering dataset (CoQA) [3]. Our hypothesis is that the BERT-CMC model will effectively learn the underlying context of the conversation, leading to better performance than the baseline model proposed for CoQA. Our evaluation of BERT-CMC on the CoQA dataset shows that the model performs poorly (44.7% F1 score) compared to the CoQA baseline model (66.2% F1 score). In the interest of model explainability, we also perform a qualitative analysis of the model behavior on questions with various linguistic phenomena, e.g., coreference and pragmatic reasoning. Additionally, we motivate the critical design choices made by performing an ablation study of the effect of these choices on the model performance. The results suggest that fine-tuning the BERT layers boosts the model performance. Moreover, it is shown that increasing the number of extra layers on top of BERT leads to a larger capacity of the conversational memory.
|
7 |
Learning Pose and State-Invariant Object Representations for Fine-Grained Recognition and Retrieval
Rohan Sarkar (19065215), 11 July 2024
Object Recognition and Retrieval is a fundamental problem in Computer Vision that involves recognizing objects and retrieving similar object images through visual queries. While deep metric learning is commonly employed to learn image embeddings for solving such problems, the representations learned using existing methods are not robust to changes in viewpoint, pose, and object state, especially for fine-grained recognition and retrieval tasks. To overcome these limitations, this dissertation aims to learn robust object representations that remain invariant to such transformations for fine-grained tasks. First, it focuses on learning dual pose-invariant embeddings to facilitate recognition and retrieval at both the category and finer object-identity levels by learning category and object-identity specific representations in separate embedding spaces simultaneously. For this, the PiRO framework is introduced that utilizes an attention-based dual encoder architecture and novel pose-invariant ranking losses for each embedding space to disentangle the category and object representations while learning pose-invariant features. Second, the dissertation introduces ranking losses that cluster multi-view images of an object together in both the embedding spaces while simultaneously pulling the embeddings of two objects from the same category closer in the category embedding space to learn fundamental category-specific attributes and pushing them apart in the object embedding space to learn discriminative features to distinguish between them. Third, the dissertation addresses state-invariance and introduces a novel ObjectsWithStateChange dataset to facilitate research in recognizing fine-grained objects with state changes involving structural transformations in addition to pose and viewpoint changes. Fourth, it proposes a curriculum learning strategy to progressively sample object images that are harder to distinguish for training the model, enhancing its ability to capture discriminative features for fine-grained tasks amidst state changes and other transformations. Experimental evaluations demonstrate significant improvements in object recognition and retrieval performance compared to previous methods, validating the effectiveness of the proposed approaches across several challenging datasets under various transformations.
|
8 |
Learning Discriminative Neural Representations for Visual Recognition / 画像認識のための識別性の高いニューラル表現の学習
Cai, Sudong, 25 March 2024
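Kyoto University / Doctorate by coursework (new system) / Doctor of Informatics / Degree No. Kō 25424 / Informatics Doctorate No. 862 / 新制||情||144 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Ko Nishino, Professor Hisashi Kashima, Professor Tatsuya Akutsu / Conferred under Article 4, Paragraph 1 of the Degree Regulations / DFAM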
|
9 |
Attention based Knowledge Tracing in a language learning setting
Vergunst, Sebastiaan, January 2022
Knowledge Tracing aims to predict the future performance of users of learning platforms based on historical data, by modeling their knowledge state. In this task, the target is a binary variable representing the correctness of the exercise, where an exercise is a word uttered by the user. Current state-of-the-art models add attention layers to autoregressive models or rely on self-attention networks. However, these models are built on publicly available datasets that lack useful information about the interactions users have with exercises. In this work, various techniques are introduced that allow for the incorporation of additional information made available in a dataset provided by Astrid Education. They consist of encoding a time dimension, explicitly modeling the skill needed for each exercise, and adjusting the length of the interaction sequence. Introducing new information to the Knowledge Tracing framework allows Astrid to craft a more personalized experience for its users, thus fulfilling the purpose and goal of the thesis. Additionally, we perform experiments to understand what aspects influence the models. Results show that modeling the skills needed to solve an exercise using an encoding strategy and reducing the length of the interaction sequence lead to improvements in terms of both accuracy and AUC. The time encoding did not lead to better results; further experimentation is needed to include the time dimension successfully.
|
10 |
Deep neural networks for natural language processing and its acceleration
Lin, Zhouhan, 08 1900
This thesis by article consists of four articles which contribute to the field of deep learning, specifically to the acceleration of training through low-precision networks and the application of deep neural networks to natural language processing.
In the first article, we investigate a neural network training scheme that eliminates most of the floating-point multiplications. This approach consists of binarizing or ternarizing the weights in the forward propagation and quantizing the hidden states in the backward propagation, which converts multiplications to sign changes and binary shifts. Experimental results on small to medium-sized datasets show that this approach results in even better performance than standard stochastic gradient descent training, paving the way to fast, hardware-friendly training of neural networks.
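How ternary weights and power-of-two quantization turn multiplications into sign changes and binary shifts can be sketched as follows. The abstract does not state the exact rounding rules, so deterministic thresholding and rounding the base-2 logarithm to the nearest integer are assumptions here (the actual scheme may, for example, be stochastic). All function names are hypothetical.

```python
import math


def ternarize(weights, threshold=0.5):
    """Map each weight to -1, 0, or +1 by deterministic thresholding (assumed rule)."""
    return [0 if abs(w) < threshold else (1 if w > 0 else -1) for w in weights]


def ternary_dot(ternary_weights, inputs):
    """Dot product with ternary weights: additions and subtractions only."""
    total = 0.0
    for w, x in zip(ternary_weights, inputs):
        if w == 1:
            total += x
        elif w == -1:
            total -= x  # a sign change instead of a multiplication
        # w == 0 contributes nothing and is skipped
    return total


def quantize_to_power_of_two(x):
    """Return (sign, exponent) such that x is approximated by sign * 2**exponent."""
    if x == 0:
        return 0, 0
    sign = 1 if x > 0 else -1
    return sign, round(math.log2(abs(x)))


def shift_multiply(value, sign, exponent):
    """Multiply an integer by sign * 2**exponent using a bit shift."""
    shifted = value << exponent if exponent >= 0 else value >> -exponent
    return sign * shifted


ternary = ternarize([0.8, -0.1, -0.9, 0.3])
activation = ternary_dot(ternary, [2.0, 5.0, 1.0, 4.0])
sign, exponent = quantize_to_power_of_two(-6.0)
product = shift_multiply(5, sign, exponent)
```

In this sketch the forward pass needs no multiplier at all, and multiplying by a quantized hidden state reduces to a shift, which is the hardware-friendly property the abstract points to.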
In the second article, we propose a structured self-attentive sentence embedding that extracts interpretable sentence representations in matrix form. We demonstrate improvements on 3 different tasks: author profiling, sentiment classification, and textual entailment. Experimental results show that our model yields a significant performance gain compared to other sentence embedding methods in all 3 tasks.
In the third article, we propose a hierarchical model with a dynamic computation graph for sequential data that learns to construct a tree while reading the sequence. The model learns to create adaptive skip-connections that ease the learning of long-term dependencies by constructing recurrent cells in a recursive manner. The network can be trained either with supervision, by providing gold tree structures, or through reinforcement learning. We provide preliminary experiments on 3 different tasks: a novel Math Expression Evaluation (MEE) task, a well-known propositional logic task, and language modelling tasks. Experimental results show the potential of the proposed approach.
In the fourth article, we propose a novel constituency parsing method with neural networks. The model predicts the parse tree structure by predicting a real-valued scalar, named syntactic distance, for each split position in the input sentence. The order of the relative values of these syntactic distances then determines the parse tree structure by specifying the order in which the split points will be selected, recursively partitioning the input in a top-down fashion. Our proposed approach achieves competitive performance on the Penn Treebank dataset and state-of-the-art performance on the Chinese Treebank dataset.
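The top-down decoding step described for the fourth article can be sketched as a short recursive function. It assumes that the split with the largest syntactic distance is chosen first and that ties go to the leftmost position; constituent labels are omitted. The name `build_tree` and the distance values are hypothetical (in the thesis the distances are predicted by a neural network, not chosen by hand).

```python
def build_tree(words, distances):
    """Build a binary tree from syntactic distances.

    words: list of n tokens.
    distances: list of n - 1 values; distances[i] belongs to the split
    position between words[i] and words[i + 1].
    Returns a nested tuple of tokens.
    """
    if len(distances) != len(words) - 1:
        raise ValueError("need exactly one distance per split position")
    if len(words) == 1:
        return words[0]
    # Split at the largest distance (leftmost on ties), then recurse on each side.
    split = max(range(len(distances)), key=lambda i: distances[i])
    left = build_tree(words[: split + 1], distances[:split])
    right = build_tree(words[split + 1 :], distances[split + 1 :])
    return (left, right)


# Hand-picked distances for illustration only.
tree = build_tree(["the", "cat", "sat", "down"], [1.0, 3.0, 2.0])
```

Because only the relative order of the distances matters, any monotonic rescaling of the values yields the same tree.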
|