• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 2
  • Tagged with
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Multimodal Representations for Video

Suris Coll-Vinent, Didac January 2024 (has links)
My thesis explores the fields of multimodal and video analysis in computer vision, aiming to bridge the gap between human perception and machine understanding. Recognizing the interplay among various signals such as text, audio, and visual data, my research explores novel frameworks to integrate these diverse modalities in order to achieve a deeper understanding of complex scenes, with a particular emphasis on video analysis. As part of this exploration, I study diverse tasks such as translation, future prediction, or visual question answering, all connected through the lens of multimodal and video representations. I present novel approaches for each of these challenges, contributing across different facets of computer vision, from dataset creation to algorithmic innovations, and from achieving state-of-the-art results on established benchmarks to introducing new tasks. Methodologically, my thesis embraces two key approaches: self-supervised learning and the integration of structured representations. Self-supervised learning, a technique that allows computers to learn from unlabeled data, helps uncovering inherent connections within multimodal and video inputs. Structured representations, on the other hand, serve as a means to capture complex temporal patterns and uncertainties inherent in video analysis. By employing these techniques, I offer novel insights into modeling multimodal representations for video analysis, showing improved performance with respect to prior work in all studied scenarios.
2

High-level, part-based features for fine-grained visual categorization

Berg, Thomas January 2017 (has links)
Object recognition--"What is in this image?"--is one of the basic problems of computer vision. Most work in this area has been on finding basic-level object categories such as plant, car, and bird, but recently there has been an increasing amount of work in fine-grained visual categorization, in which the task is to recognize subcategories of a basic-level category, such as blue jay and bluebird. Experimental psychology has found that while basic-level categories are distinguished by the presence or absence of parts (a bird has a beak but car does not), subcategories are more often distinguished by the characteristics of their parts (a starling has a narrow, yellow beak while a cardinal has a wide, red beak). In this thesis we tackle fine-grained visual categorization, guided by this observation. We develop alignment procedures that let us compare corresponding parts, build classifiers tailored to finding the interclass differences at each part, and then combine the per-part classifiers to build subcategory classifiers. Using this approach, we outperform previous work in several fine-grained categorization settings: bird species identification, face recognition, and face attribute classification. In addition, the construction of subcategory classifiers from part classifiers allows us to automatically determine which parts are most relevant when distinguishing between any two subcategories. We can use this to generate illustrations of the differences between subcategories. To demonstrate this, we have built a digital field guide to North American birds which includes automatically generated images highlighting the key differences between visually similar species. This guide, "Birdsnap," also identifies bird species in users' uploaded photos using our subcategory classifiers. We have released Birdsnap as a web site and iPhone application.

Page generated in 0.0914 seconds