Global ETD Search

Refine Query
Source
Publication year
to
Language
english 1
Tagged with
artificial 1
computer 1
digital 1
intelligence 1
programs 1
science 1
video 1
vision 1

About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

1	Multimodal Representations for Video Suris Coll-Vinent, Didac January 2024 (has links) My thesis explores the fields of multimodal and video analysis in computer vision, aiming to bridge the gap between human perception and machine understanding. Recognizing the interplay among various signals such as text, audio, and visual data, my research explores novel frameworks to integrate these diverse modalities in order to achieve a deeper understanding of complex scenes, with a particular emphasis on video analysis. As part of this exploration, I study diverse tasks such as translation, future prediction, or visual question answering, all connected through the lens of multimodal and video representations. I present novel approaches for each of these challenges, contributing across different facets of computer vision, from dataset creation to algorithmic innovations, and from achieving state-of-the-art results on established benchmarks to introducing new tasks. Methodologically, my thesis embraces two key approaches: self-supervised learning and the integration of structured representations. Self-supervised learning, a technique that allows computers to learn from unlabeled data, helps uncovering inherent connections within multimodal and video inputs. Structured representations, on the other hand, serve as a means to capture complex temporal patterns and uncertainties inherent in video analysis. By employing these techniques, I offer novel insights into modeling multimodal representations for video analysis, showing improved performance with respect to prior work in all studied scenarios. Artificial intelligence Computer science Computer vision--Computer programs Digital video--Computer programs

1

Page generated in 0.0615 seconds