31

Conducting gesture recognition, analysis and performance system

Kolesnik, Paul January 2004 (has links)
A number of conducting gesture analysis and performance systems have been developed over the years. However, most previous projects either concentrated primarily on tracking tempo- and amplitude-indicating gestures, or implemented individual mapping techniques for expressive gestures that varied from project to project. There is a clear need for a uniform process that can be applied to the analysis of both indicative and expressive gestures. The proposed system provides a set of tools with extensive functionality for identification, classification and performance of conducting gestures. The gesture recognition procedure is based on a Hidden Markov Model (HMM) process, and a set of HMM tools is developed for the Max/MSP software. Training and recognition procedures are applied to both right-hand beat- and amplitude-indicative gestures and left-hand expressive gestures. Continuous recognition of right-hand gestures is incorporated into a real-time gesture analysis and performance system in the EyesWeb and Max/MSP/Jitter environments.
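The abstract gives no implementation detail beyond the HMM basis. As a rough sketch of the train-one-model-per-gesture, classify-by-likelihood pattern it describes, the following Python code uses the hmmlearn library (our substitution for illustration; the thesis builds its own HMM tools for Max/MSP):

```python
# Illustrative sketch only: the thesis implements its own HMM tools for
# Max/MSP; here we use the hmmlearn library (an assumption) to show the
# train-one-HMM-per-gesture, classify-by-likelihood pattern it describes.
import numpy as np
from hmmlearn import hmm

def train_gesture_models(gesture_data, n_states=5):
    """Fit one Gaussian HMM per gesture class.

    gesture_data: dict mapping gesture name -> list of (T_i, D) arrays,
    each array one recorded example (e.g. 2D baton positions over time).
    """
    models = {}
    for name, examples in gesture_data.items():
        X = np.concatenate(examples)          # stack all examples
        lengths = [len(e) for e in examples]  # per-example lengths
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
        model.fit(X, lengths)
        models[name] = model
    return models

def classify_gesture(models, observation):
    """Return the gesture whose HMM assigns the highest log-likelihood."""
    return max(models, key=lambda name: models[name].score(observation))
```

Continuous real-time recognition, as in the thesis's performance system, would additionally require windowing the incoming gesture stream and rescoring the models incrementally.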
32

An analog VLSI centroid imager

Blum, Richard Alan 12 1900 (has links)
No description available.
33

Architectural studies for visual processing / Andre J. S. Yakovleff.

Yakovleff, Andre J. S. (Andre Julian Stuart) January 1995 (has links)
Bibliography: p. 165-184. / xvi, 184 p. : ill. ; 30 cm. / Title page, contents and abstract only. The complete thesis in print form is available from the University Library. / This thesis explores the emulation of natural vision processes to provide real-time sensing information to the lowest control levels of an autonomous vehicle. It is argued that the interpretation of sensory data should result in a level of perception that is tailored to the requirements of the control system. / Thesis (Ph.D.)--University of Adelaide, Dept. of Electrical and Electronic Engineering, 1996?
34

Color face recognition by auto-regressive moving averaging

Aljarrah, Inad A. January 2002 (has links)
Thesis (M.S.)--Ohio University, November, 2002. / Title from PDF t.p. Includes bibliographical references (leaves 46-48).
35

Vizuální odometrie pro robotické vozidlo Car4 / Visual odometry for robotic vehicle Car4

Szente, Michal January 2017 (has links)
This thesis deals with visual odometry algorithms and their application to the experimental vehicle Car4. The first part reviews prior research in the area, on which the solution process is based. The next chapters introduce the theoretical design and ideas behind monocular and stereo visual odometry algorithms. The third part deals with the implementation in MATLAB using the Image Processing Toolbox. After tests on real data, the chosen algorithm is applied to the Car4 vehicle under practical indoor and outdoor conditions. The last part summarizes the results of the work and addresses the problems associated with the application of visual odometry algorithms.
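The abstract does not name the specific algorithms. A minimal monocular visual-odometry step of the general kind such work builds on might look like the following OpenCV sketch (an assumption for illustration, not the thesis's MATLAB implementation): detect and match features between consecutive frames, estimate the essential matrix, and recover the relative camera pose.

```python
# Minimal monocular visual-odometry step using OpenCV (an illustrative
# assumption; the thesis itself works in MATLAB's Image Processing Toolbox).
import cv2
import numpy as np

def relative_pose(prev_gray, curr_gray, K):
    """Estimate camera rotation R and (unit-scale) translation t
    between two consecutive grayscale frames, given intrinsics K."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # RANSAC-based essential-matrix estimation rejects outlier matches.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    # Decompose E into R, t; translation scale is unobservable with one
    # camera, which is the classic monocular scale ambiguity.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```

Chaining the per-frame (R, t) estimates yields the vehicle trajectory up to scale; stereo odometry resolves the scale from the known camera baseline.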
37

Unified Approaches for Multi-Task Vision-Language Interactions

You, Haoxuan January 2024 (has links)
Vision and language are two major modalities that humans rely on to perceive the environment and understand the world. Recent advances in Artificial Intelligence (AI) have facilitated the development of a variety of vision-language tasks derived from diverse multimodal interactions in daily life, such as image captioning, image-text matching, visual question answering (VQA), and text-to-image generation. Despite their remarkable performance, most previous state-of-the-art models are specialized for a single vision-language task and lack generalizability across tasks. These specialized models also complicate algorithm design and introduce redundancy at deployment when dealing with complex scenes. In this study, we investigate unified approaches capable of solving various vision-language interactions in a multi-task manner. We argue that unified multi-task methods offer several potential advantages: (1) a unified framework for multiple tasks reduces the human effort of designing a different model for each task; (2) reusing and sharing parameters across tasks improves efficiency; (3) some tasks are complementary to others, so multi-tasking can boost performance; and (4) they can handle complex tasks that require the joint collaboration of multiple basic tasks, enabling new applications.
In the first part of this thesis, we explore unified multi-task models with the goal of sharing and reusing as many parameters as possible across tasks. We start by unifying several vision-language question-answering tasks, such as visual entailment, outside-knowledge VQA, and visual commonsense reasoning, in a simple iterative divide-and-conquer framework: it iteratively decomposes the original question into sub-questions, solves each sub-question, and derives the answer to the original question, handling reasoning of various types and semantic levels uniformly within one framework. In the next work, we take a step further and unify image-to-text generation, text-to-image generation, vision-language understanding, and image-text matching in a single large-scale Transformer-based model. These two works demonstrate the feasibility, effectiveness, and efficiency of sharing parameters across tasks in a single model. Nevertheless, they still need to switch between tasks and can only conduct one task at a time.
In the second part of this thesis, we introduce our efforts toward simultaneous multi-task models that conduct multiple tasks at the same time with a single model. This brings additional advantages: the model can learn to perform different tasks, or combinations of tasks, automatically according to user queries, and the joint interaction of tasks can enable new applications. We begin by compounding spatial understanding and semantic understanding in a single multimodal Transformer-based model. To enable the model to understand and localize local regions, we propose a hybrid region representation that seamlessly bridges regions with image and text. Coupled with a carefully collected training dataset, our model can perform joint spatial and semantic understanding in the same iteration and empowers a new application: spatial reasoning. Continuing this line of work, we further introduce an effective module for encoding high-resolution images and propose a pre-training method that aligns semantic and spatial understanding at high resolution. We also couple Optical Character Recognition (OCR) capability with spatial understanding in the model and study techniques to improve the compatibility of the various tasks.
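The iterative divide-and-conquer framework described above can be sketched as a simple control loop. The sketch below injects the learned components as function arguments; all names are hypothetical stand-ins, not the thesis's actual API:

```python
# Schematic control loop for an iterative divide-and-conquer VQA framework.
# `vqa_model`, `decomposer`, and `composer` are hypothetical stand-ins for
# learned components, injected as arguments so the sketch is self-contained.
def divide_and_conquer_vqa(image, question, vqa_model, decomposer, composer,
                           max_depth=3):
    """Answer `question` about `image` by recursive decomposition.

    vqa_model(image, q)            -> answer for an atomic question
    decomposer(q)                  -> list of sub-questions ([] if atomic)
    composer(q, subs, answers)     -> answer derived from sub-answers
    """
    sub_questions = decomposer(question) if max_depth > 0 else []
    if not sub_questions:
        # Base case: the question is simple enough for a single VQA call.
        return vqa_model(image, question)
    sub_answers = [divide_and_conquer_vqa(image, q, vqa_model, decomposer,
                                          composer, max_depth - 1)
                   for q in sub_questions]
    # Derive the answer to the original question from the sub-answers.
    return composer(question, sub_questions, sub_answers)
```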
38

Multimodal Representations for Video

Suris Coll-Vinent, Didac January 2024 (has links)
My thesis explores the fields of multimodal and video analysis in computer vision, aiming to bridge the gap between human perception and machine understanding. Recognizing the interplay among various signals such as text, audio, and visual data, my research explores novel frameworks for integrating these diverse modalities to achieve a deeper understanding of complex scenes, with a particular emphasis on video analysis. As part of this exploration, I study diverse tasks such as translation, future prediction, and visual question answering, all connected through the lens of multimodal and video representations. I present novel approaches for each of these challenges, contributing across different facets of computer vision, from dataset creation to algorithmic innovations, and from achieving state-of-the-art results on established benchmarks to introducing new tasks. Methodologically, my thesis embraces two key approaches: self-supervised learning and the integration of structured representations. Self-supervised learning, a technique that allows computers to learn from unlabeled data, helps uncover inherent connections within multimodal and video inputs. Structured representations, on the other hand, serve as a means to capture the complex temporal patterns and uncertainties inherent in video analysis. By employing these techniques, I offer novel insights into modeling multimodal representations for video analysis, showing improved performance over prior work in all studied scenarios.
39

Completing unknown portions of 3D scenes by 3D visual propagation

Breckon, Toby P. January 2006 (has links)
As the requirement for more realistic 3D environments is pushed forward by the computer graphics, movie, simulation and games industries, attention turns away from the creation of purely synthetic, artist-derived environments towards the use of real-world captures of the 3D world in which we live. However, common 3D acquisition techniques, such as laser scanning and stereo capture, are realistically only 2.5D in nature - the backs and occluded portions of objects cannot be recovered from a single uni-directional viewpoint. Although multi-directional capture has existed for some time, it incurs additional temporal and computational cost, with no guarantee that the resulting acquisition will be free of minor holes, missing surfaces and the like. Drawing inspiration from the study of human abilities in 3D visual completion, we consider the automated completion of these hidden or missing portions in 3D scenes originally acquired from 2.5D (or 3D) capture. We propose an approach based on the visual propagation of available scene knowledge from the known (visible) scene areas to the unknown (invisible) 3D regions (i.e. the completion of unknown volumes via visual propagation - the concept of volume completion). Our approach uses a combination of global surface fitting, to derive an initial underlying geometric surface completion, together with a 3D extension of nonparametric texture synthesis to propagate localised structural 3D surface detail (i.e. surface relief). We further extend the technique both to the combined completion of 3D surface relief and colour, and to hierarchical surface completion, which offers both improved structural results and computational efficiency gains over our initial non-hierarchical technique. To validate these approaches we present the completion and extension of numerous 2.5D (and 3D) surface examples with relief ranging over natural, man-made, stochastic, regular and irregular forms. The results are evaluated both subjectively, within our definition of plausible completion, and quantitatively, by statistical analysis in the geometric and colour domains.
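The nonparametric synthesis idea the thesis extends to 3D relief is, in its classic 2D form (the Efros-Leung tradition), a greedy best-neighbourhood copy from known regions into unknown ones. A didactic 2D sketch of that propagation step follows; it is our simplified analogue, not the thesis's hierarchical 3D implementation, and it is quadratic-time rather than optimised:

```python
# Didactic 2D analogue of nonparametric synthesis: each unknown element is
# filled from the known patch whose neighbourhood best matches the target's
# already-known neighbours. Not the thesis's 3D method; a concept sketch.
import numpy as np

def synthesize_missing(image, known, window=5):
    """image: float 2D array (e.g. a relief/height map);
    known: boolean mask, False where values must be synthesized."""
    h, w = image.shape
    r = window // 2
    # Candidate sources: centres of fully-known windows.
    sources = [(i, j) for i in range(r, h - r) for j in range(r, w - r)
               if known[i - r:i + r + 1, j - r:j + r + 1].all()]
    while not known.all():
        # "Onion-peel" order: fill the unknown pixel with the most known
        # neighbours first, so detail propagates inward from the boundary.
        ys, xs = np.where(~known)
        counts = [known[max(0, y - r):y + r + 1,
                        max(0, x - r):x + r + 1].sum()
                  for y, x in zip(ys, xs)]
        k = int(np.argmax(counts))
        y, x = ys[k], xs[k]
        ty0, ty1 = max(0, y - r), min(h, y + r + 1)
        tx0, tx1 = max(0, x - r), min(w, x + r + 1)
        tgt, msk = image[ty0:ty1, tx0:tx1], known[ty0:ty1, tx0:tx1]
        # Sum of squared differences over the target's known pixels only.
        best = min(sources, key=lambda s: (
            ((image[s[0] - (y - ty0):s[0] + (ty1 - y),
                    s[1] - (x - tx0):s[1] + (tx1 - x)] - tgt) ** 2
             * msk).sum()))
        image[y, x] = image[best]
        known[y, x] = True
    return image
```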
40

High-level, part-based features for fine-grained visual categorization

Berg, Thomas January 2017 (has links)
Object recognition--"What is in this image?"--is one of the basic problems of computer vision. Most work in this area has been on finding basic-level object categories such as plant, car, and bird, but recently there has been an increasing amount of work in fine-grained visual categorization, in which the task is to recognize subcategories of a basic-level category, such as blue jay and bluebird. Experimental psychology has found that while basic-level categories are distinguished by the presence or absence of parts (a bird has a beak but a car does not), subcategories are more often distinguished by the characteristics of their parts (a starling has a narrow, yellow beak while a cardinal has a wide, red beak). In this thesis we tackle fine-grained visual categorization, guided by this observation. We develop alignment procedures that let us compare corresponding parts, build classifiers tailored to finding the inter-class differences at each part, and then combine the per-part classifiers into subcategory classifiers. Using this approach, we outperform previous work in several fine-grained categorization settings: bird species identification, face recognition, and face attribute classification. In addition, the construction of subcategory classifiers from part classifiers allows us to determine automatically which parts are most relevant when distinguishing between any two subcategories, which we use to generate illustrations of the differences between subcategories. To demonstrate this, we have built a digital field guide to North American birds that includes automatically generated images highlighting the key differences between visually similar species. This guide, "Birdsnap," also identifies bird species in users' uploaded photos using our subcategory classifiers. We have released Birdsnap as a web site and iPhone application.
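The combine-per-part-classifiers construction the abstract describes can be sketched in a few lines. Part alignment and feature extraction are assumed done upstream, more than two subcategories are assumed, and the names below are hypothetical illustrations, not Birdsnap's code:

```python
# Sketch of building subcategory classifiers from per-part classifiers:
# one linear classifier per aligned part, with decision scores summed into
# a subcategory score. Hypothetical names; assumes a multi-class problem.
import numpy as np
from sklearn.svm import LinearSVC

def train_part_classifiers(part_features, labels):
    """part_features: one (N, D) array per aligned part (e.g. beak, wing),
    rows corresponding to the same N training images; labels: (N,) species."""
    return [LinearSVC().fit(X, labels) for X in part_features]

def classify(part_classifiers, example_parts):
    """example_parts: one (D,) feature vector per part for a test image.
    Sum per-part decision scores and return the winning subcategory."""
    scores = sum(clf.decision_function(x.reshape(1, -1))[0]
                 for clf, x in zip(part_classifiers, example_parts))
    return part_classifiers[0].classes_[int(np.argmax(scores))]
```

Because each part contributes a separate score vector, comparing per-part scores between two subcategories directly indicates which parts drive the distinction, which is what enables the automatically generated difference illustrations.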
