Global ETD Search

1	[en] A CLUSTER-BASED METHOD FOR ACTION SEGMENTATION USING SPATIO-TEMPORAL AND POSITIONAL ENCODED EMBEDDINGS / [pt] MÉTODO BASEADO EM AGRUPAMENTO PARA A SEGMENTAÇÃO DE AÇÕES UTILIZANDO EMBEDDINGS ESPAÇO-TEMPORAIS E COM CODIFICAÇÃO POSICIONAL
GUILHERME DE AZEVEDO P MARQUES, 20 April 2023
[en] The rise of video content as the main medium for communication creates massive volumes of video data every second. The ability to understand these huge quantities of data automatically has become increasingly important, so better video understanding methods are needed. A crucial task for overall video understanding is the recognition and localisation in time of different actions. To address this problem, action segmentation must be achieved.
Action segmentation consists of temporally segmenting a video by labeling each frame with a specific action. In this work, we propose a novel action segmentation method that requires no prior video analysis and no annotated data. Our method extracts spatio-temporal features from videos using a pre-trained deep network. The features are then transformed using a positional encoder, and finally a clustering algorithm is applied, where each cluster is presumed to correspond to a single, distinct action. In experiments, we show that our method produces competitive results on the Breakfast and Inria Instructional Videos benchmark datasets.
[en] CLUSTERING [en] POSITIONAL ENCODING [en] ACTION SEGMENTATION [en] DEEP LEARNING
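The pipeline described in this abstract (frame features, then a positional encoder, then clustering) can be sketched as follows. This is a minimal illustration and not the author's implementation: the abstract does not say which positional encoder or clustering algorithm is used, so the sinusoidal encoding, the small k-means routine, the synthetic features, and all names (`sinusoidal_encoding`, `kmeans`, `segment_actions`, `position_weight`) are assumptions made for the example.

```python
import numpy as np

def sinusoidal_encoding(num_frames, dim):
    """Sinusoidal positional encoding of shape (num_frames, dim). Assumed choice."""
    positions = np.arange(num_frames)[:, None]
    freqs = np.exp(-np.log(10000.0) * (np.arange(0, dim, 2) / dim))
    enc = np.zeros((num_frames, dim))
    enc[:, 0::2] = np.sin(positions * freqs)
    enc[:, 1::2] = np.cos(positions * freqs[: dim // 2])
    return enc

def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means (assumed clustering algorithm); returns one label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels

def segment_actions(features, num_actions, position_weight=1.0):
    """Add temporal position information to per-frame features, then cluster.

    Each cluster id is treated as one action label for the frame.
    """
    num_frames, dim = features.shape
    encoded = features + position_weight * sinusoidal_encoding(num_frames, dim)
    return kmeans(encoded, num_actions)

# Synthetic stand-in for embeddings from a pre-trained network:
# three consecutive "actions" of 40 frames each, 16 dimensions per frame.
rng = np.random.default_rng(42)
features = np.concatenate(
    [rng.normal(loc=mean, scale=0.5, size=(40, 16)) for mean in (0.0, 3.0, 6.0)]
)
labels = segment_actions(features, num_actions=3)
print(labels)
```

The `position_weight` parameter is a hypothetical knob that controls how strongly temporal position influences the clustering relative to the visual features.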
2	Tree positional encodings for transformer models on HTML DOM tree element classification : Enabling structurally aware transformer models through positional encodings to improve performance on an HTML element classification problem / Transformer modeller och positionskodningar för klassificering av element i HTML träd : Positionskodningar av trädstrukurerad data som möjliggör transformer modeller i ett klassificeringsproblem av HTML element
Rousselet, Gustave, January 2021
With the continued proliferation of internet access and usage, the field of web learning is growing steadily in order to automate and improve parts of our experience on the web. Research in web learning has often lagged behind its counterparts in Natural Language Processing (NLP): novel methods often reach adoption in web learning research with a delay. Web pages are more complex than plain text documents in both content and structure, as they are semi-structured documents divided into sections, often containing a combination of images, text, and markup. For humans, this is not difficult to understand, as we are familiar with the structure of web pages and are in fact often aided by the styling and markup of the pages. For machine learning algorithms, however, this structure and mixture of content pose several challenges that do not arise with comparable documents in NLP problems. Transformer models have shown significant performance gains on a multitude of tasks ranging from NLP to image processing. This thesis studies alternative and novel approaches to encoding the positional information of nodes in a HyperText Markup Language (HTML) Document Object Model (DOM) tree in order to enable effective use of transformer models on web page data. The problem studied was an HTML element classification problem, specifically the task of extracting product data from a product web page.
Three positional encodings for tree-structured data were studied: Breadth First Search (BFS), Depth First Search (DFS), and novel tree positional encodings. These encodings resulted in three trained transformer models, which were compared to a baseline transformer model trained with no positional encoding in order to measure the change in performance that the encodings produced. The analysis of the results shows that the BFS and DFS encodings increased model performance across all measured metrics (precision, recall, f1-score, accuracy) by up to 1% in absolute performance. The novel tree positional encodings resulted in worse model performance across all metrics measured. The results show that transformers benefit from certain tree positional encodings of the HTML elements, and further research should be done to see how these positions can be effectively encoded for transformer models to process web pages.
Keywords: Transformer, Positional Encoding, HTML, DOM, Tree, Element
Subject: Computer and Information Sciences
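To make the BFS and DFS encodings mentioned in this abstract concrete, the sketch below assigns each node of a small DOM-like tree a position index equal to its visiting order in a breadth-first or depth-first (pre-order) traversal. This is an assumption about how such encodings could be derived: the abstract does not state how the thesis turns traversal order into model inputs, and the novel tree positional encodings are not sketched here. The tree layout and the names `bfs_positions` and `dfs_positions` are hypothetical.

```python
from collections import deque

def bfs_positions(root):
    """Map each node id to its index in breadth-first visiting order."""
    positions = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        positions[node["id"]] = len(positions)
        queue.extend(node.get("children", []))
    return positions

def dfs_positions(root):
    """Map each node id to its index in depth-first (pre-order) visiting order."""
    positions = {}
    stack = [root]
    while stack:
        node = stack.pop()
        positions[node["id"]] = len(positions)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.get("children", [])))
    return positions

# A toy DOM tree: html -> (head -> title), (body -> (div -> span), p)
dom = {
    "id": "html",
    "children": [
        {"id": "head", "children": [{"id": "title"}]},
        {
            "id": "body",
            "children": [
                {"id": "div", "children": [{"id": "span"}]},
                {"id": "p"},
            ],
        },
    ],
}

bfs = bfs_positions(dom)
dfs = dfs_positions(dom)
print(bfs)  # html=0, head=1, body=2, title=3, div=4, p=5, span=6
print(dfs)  # html=0, head=1, title=2, body=3, div=4, span=5, p=6
```

In this toy tree the two orders diverge from the third node onward: BFS keeps siblings close together, while DFS keeps each subtree contiguous.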
