1 |
Real-time Mosaic for Multi-Camera Videoconferencing / Klechenov, Anton; Gupta, Aditya Kumar; Wong, Weng Fai; Ng, Teck Khim; Leow, Wee Kheng. 01 1900 (has links)
This paper describes a system for high-resolution videoconferencing. A number of camcorders capture the video streams, which are then mosaiced to generate a wide-angle panoramic view. Furthermore, the system is made “real-time” by detecting changes and updating only the affected regions of the mosaic. The system can be deployed on a single machine or on a cluster for better performance; it is scalable and shows good real-time performance. The main application for this system is videoconferencing for distance learning, but it can be used for any high-resolution broadcasting. / Singapore-MIT Alliance (SMA)
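The abstract describes two phases: building the panoramic mosaic, then updating only the regions that change. A minimal sketch of that idea, using OpenCV as a stand-in (an assumption; the paper's own pipeline and function names are not given), might look like:

```python
# Hypothetical sketch of the two phases described above: build a panoramic
# mosaic from one frame per camera, then copy only changed pixels into it.
import cv2
import numpy as np

def build_mosaic(frames):
    """Stitch one frame per camera into a wide-angle panorama."""
    stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
    status, mosaic = stitcher.stitch(frames)
    if status != cv2.Stitcher_OK:
        raise RuntimeError(f"stitching failed with status {status}")
    return mosaic

def update_region(mosaic_patch, new_patch, threshold=25):
    """The 'real-time' step: overwrite only pixels whose change exceeds a threshold."""
    diff = cv2.absdiff(cv2.cvtColor(mosaic_patch, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(new_patch, cv2.COLOR_BGR2GRAY))
    mask = diff > threshold          # boolean change mask
    mosaic_patch[mask] = new_patch[mask]
    return mosaic_patch
```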
|
2 |
Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking And Event Recognition / Akman, Oytun. 01 August 2007 (has links) (PDF)
In this thesis, novel methods for background modeling, tracking, occlusion handling and event recognition via multi-camera configurations are presented. As the initial step, the building blocks of typical single-camera surveillance systems, namely moving-object detection, tracking and event recognition, are discussed, and various widely accepted methods for these building blocks are tested to assess their performance. Next, for multi-camera surveillance systems, background modeling, occlusion handling, tracking and event recognition for two-camera configurations are examined. Various foreground detection methods are discussed, and a background modeling algorithm based on a multivariate mixture of Gaussians is proposed. During the occlusion handling studies, a novel method for segmenting occluded objects is proposed, in which a top view of the scene, free of occlusions, is generated from multi-view data. The experiments indicate that the occlusion handling algorithm operates successfully on various test data. A novel tracking method using multi-camera configurations is also proposed. The main idea of employing multiple cameras is to fuse the 2D information coming from the cameras into 3D information for better occlusion handling and seamless tracking. The proposed algorithm is tested on different data sets and shows a clear improvement over a single-camera tracker. Finally, the multi-camera trajectories of objects are classified by the proposed multi-camera event recognition method, in which trajectories from different views are concatenated and used to train Gaussian-mixture hidden Markov models. The experimental results indicate an improvement in event recognition performance over single-camera event recognition.
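The thesis's background model is based on a multivariate mixture of Gaussians. As a hedged illustration of that family of techniques (not the author's implementation), OpenCV's MOG2 subtractor applies a per-pixel Gaussian-mixture test; file names below are assumptions:

```python
# Illustrative only: OpenCV's MOG2 subtractor as a stand-in for a
# mixture-of-Gaussians background model.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                varThreshold=16,
                                                detectShadows=True)

cap = cv2.VideoCapture("camera0.avi")   # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)   # per-pixel mixture-model foreground test
    fg_mask = cv2.medianBlur(fg_mask, 5)  # suppress isolated noise pixels
cap.release()
```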
|
3 |
Resource-constrained re-identification in camera networks / Tahir, Syed Fahad. January 2016 (has links)
In multi-camera surveillance, associating people detected in different camera views over time, known as person re-identification, is a fundamental task. Re-identification is a challenging problem because of changes in the appearance of people under varying camera conditions. Existing approaches focus on improving re-identification accuracy, while no specific effort has yet been put into efficiently utilising the resources that are normally limited in a camera network, such as storage, computation and communication capabilities. In this thesis, we aim to perform and improve the task of re-identification under constrained resources. More specifically, we reduce the data needed to represent the appearance of an object through a proposed feature-selection method and a difference-vector representation method. The proposed feature-selection method jointly considers the computational cost of feature extraction, the cost of storing the feature descriptor, and the feature's re-identification performance, in order to select the most cost-effective and well-performing features. This selection allows us to improve inter-camera re-identification while reducing storage and computation requirements within each camera. The selected features are ranked in order of effectiveness, which enables a further reduction: the least effective features can be dropped when application constraints require it. We also reduce the communication overhead in the camera network by transferring only a difference vector, obtained from the extracted features of an object and the reference features within a camera, as the object representation for association. To reduce the number of possible matches per association, we group the objects appearing within a defined time interval in uncalibrated camera pairs. Such grouping improves re-identification, since only those objects that appear within the same time interval in a camera pair need to be associated. For temporal alignment of cameras, we exploit the differences between the frame numbers of the detected objects in a camera pair. Finally, in contrast to the pairwise camera associations used in the literature, we propose a many-to-one camera association method for re-identification, where multiple cameras can be candidates for having generated the previous detections of an object. We obtain camera-invariant matching scores from the scores produced by the pairwise re-identification approaches; these scores measure the chance of a correct match between the objects detected in a group of cameras. Experimental results on publicly available and in-lab multi-camera image and video datasets show that the proposed methods successfully reduce storage, computation and communication requirements while improving the re-identification rate compared to existing re-identification approaches.
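The cost-aware selection idea can be illustrated with a toy greedy sketch: rank features by re-identification performance per unit of joint extraction-plus-storage cost, then keep features until a budget is exhausted. All names and numbers below are illustrative, not the thesis's actual criterion:

```python
# Toy sketch of cost-aware feature selection (illustrative, not the thesis method).
import numpy as np

def select_features(perf, cost, budget):
    """perf[i]: re-id performance contribution of feature i;
    cost[i]: joint extraction + storage cost; budget: total cost allowed."""
    order = np.argsort(perf / cost)[::-1]   # most cost-effective first
    chosen, spent = [], 0.0
    for i in order:
        if spent + cost[i] <= budget:
            chosen.append(i)
            spent += cost[i]
    return chosen   # ranked list: drop the tail under tighter constraints

perf = np.array([0.30, 0.25, 0.20, 0.10])
cost = np.array([4.0, 1.0, 2.0, 0.5])
print(select_features(perf, cost, budget=4.0))  # -> [1, 3, 2]
```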
|
4 |
Surveillance of Time-varying Geometry Objects using a Multi-camera Active-vision System / Mackay, Matthew Donald. 10 January 2012 (has links)
The recognition of time-varying geometry (TVG) objects (in particular, humans) and their actions is a complex task due to common real-world sensing challenges, such as obstacles and environmental variations, as well as issues specific to TVG objects, such as self-occlusion. Herein, it is proposed that a multi-camera active-vision system, which dynamically selects camera poses in real time, be used to improve TVG action-sensing performance by choosing camera views on-line for near-optimal sensing-task performance. Active vision for TVG objects requires an on-line sensor-planning strategy that incorporates information about the object itself, including its current action, and about the state of the environment, including obstacles, into the pose-selection process. Thus, the focus of this research is the development of a novel methodology for real-time sensing-system reconfiguration (active vision), designed specifically for the recognition of a single TVG object and its actions in a cluttered, dynamic environment that may contain multiple other dynamic (maneuvering) obstacles.
The proposed methodology was developed as a complete, customizable sensing-system framework which can be readily modified to suit a variety of specific TVG action-sensing tasks – a 10-stage pipeline real-time architecture. Sensor Agents capture and correct camera images, removing noise and lens distortion, and segment the images into regions of interest. A Synchronization Agent aligns multiple images from different cameras to a single ‘world-time.’ Point Tracking and De-Projection Agents detect, identify, and track points of interest in the resultant 2-D images, and form constraints in normalized camera coordinates using the tracked pixel coordinates. A 3-D Solver Agent combines all constraints to estimate world-coordinate positions for all visible features of the object-of-interest (OoI) 3-D articulated model. A Form-Recovery Agent uses an iterative process to combine model constraints, detected feature points, and other contextual information to produce an estimate of the OoI’s current form. This estimate is used by an Action-Recognition Agent to determine which action the OoI is performing, if any, from a library of known actions, using a feature-vector descriptor for identification. A Prediction Agent provides estimates of future OoI and obstacle poses, given past detected locations, and estimates of future OoI forms given the current action and past forms. Using all of the data accumulated in the pipeline, a Central Planning Agent implements a formal, mathematical optimization developed from the general sensing problem. The agent seeks to optimize a visibility metric, which is positively related to sensing-task performance, to select desirable, feasible, and achievable camera poses for the next sensing instant. Finally, a Referee Agent examines the complete set of chosen poses for consistency, enforces global rules not captured through the optimization, and maintains system functionality if a suitable solution cannot be determined.
In order to validate the proposed methodology, rigorous experiments are also presented herein. They confirm the basic assumptions of active vision for TVG objects, and characterize the gains in sensing-task performance. Simulated experiments provide a method for rapid evaluation of new sensing tasks. These experiments demonstrate a tangible increase in single-action recognition performance over the use of a static-camera sensing system. Furthermore, they illustrate the need for feedback in the pose-selection process, allowing the system to incorporate knowledge of the OoI’s form and action. Later real-world, multi-action and multi-level action experiments demonstrate the same tangible increase when sensing real-world objects that perform multiple actions which may occur simultaneously, or at differing levels of detail.
A final set of real-world experiments characterizes the real-time performance of the proposed methodology in relation to several important system design parameters, such as the number of obstacles in the environment, and the size of the action library. Overall, it is concluded that the proposed system tangibly increases TVG action-sensing performance, and can be generalized to a wide range of applications, including human-action sensing. Future research is proposed to develop similar methods to address deformable objects and multiple objects of interest.
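The Central Planning Agent's pose optimization can be caricatured as follows: for each camera, among the feasible candidate poses, choose the one with the highest visibility score. This sketch assumes precomputed per-pose scores; the thesis's actual metric (occlusion, viewing angle, task performance) is not reproduced here, and all names are illustrative:

```python
# Deliberately simplified stand-in for visibility-driven pose selection.
def plan_poses(candidate_poses):
    """candidate_poses: {camera_id: [(pose, feasible, visibility_score), ...]}
    Returns {camera_id: pose} for the next sensing instant."""
    plan = {}
    for cam, candidates in candidate_poses.items():
        feasible = [(pose, score) for pose, ok, score in candidates if ok]
        if not feasible:        # Referee-Agent-style fallback: no feasible pose
            plan[cam] = None
            continue
        plan[cam] = max(feasible, key=lambda ps: ps[1])[0]
    return plan

candidates = {
    "cam0": [((0.0, 1.5, 30.0), True, 0.82), ((0.5, 1.5, 45.0), True, 0.91)],
    "cam1": [((2.0, 1.0, 90.0), False, 0.95), ((2.0, 1.2, 75.0), True, 0.64)],
}
print(plan_poses(candidates))  # cam0 -> (0.5, 1.5, 45.0), cam1 -> (2.0, 1.2, 75.0)
```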
|
5 |
Multi-camera Human Tracking on Realtime 3D Immersive Surveillance System / Hsieh, Meng-da. 23 June 2010 (has links)
Conventional surveillance systems present video from multiple cameras on a single display. Such a display allows the user to observe different parts of the scene, or the same part of the scene from different viewpoints, with each video labeled by a fixed textual annotation displayed under the video segment to identify the image. As the number of surveillance cameras and the extent of the monitored area grow, this conventional split-screen approach cannot provide an intuitive correspondence between the acquired images and the areas under surveillance. Such a system has a number of inherent flaws: low correlation among the split videos, difficulty in tracking new activities, low resolution of the surveillance videos, and difficulty in achieving total surveillance. To remedy these defects, the “Immersive Surveillance for Total Situational Awareness” system uses computer-graphics techniques to construct 3D models of buildings on 2D satellite images; users build the floor platform by defining the information of each floor or building and the position of each camera. This information is combined to construct a 3D surveillance scene, and the images acquired by the surveillance cameras are pasted onto the constructed 3D model to provide an intuitive visual presentation. Users can also walk through the scene following a fixed-frequency, self-defined model to perform virtual surveillance.
Multi-camera human tracking on the real-time 3D immersive surveillance system builds on “Immersive Surveillance for Total Situational Awareness” in three parts. 1. Salient-object detection: the system converts videos to corresponding image sequences and analyzes the video provided by each camera. To filter out the foreground pixels, the background model of each image is computed by a pixel-stability-based background-update algorithm. 2. Nighttime image fusion: a fuzzy enhancement method brightens the dark areas of nighttime images while maintaining saturation information, and the salient-object detection algorithm then extracts salient objects from those dark areas. The system divides the fusion results into three parts (wall, ceiling, and floor) and pastes them as materials onto the corresponding parts of the 3D scene. 3. Multi-camera human tracking: connected-component labeling filters out small areas and saves each block's information. The RGB-weight percentage information in each block, together with a 5-state status (Enter, Leave, Match, Occlusion, Fraction), is used to draw the trajectory of each person in every camera's field of view on the 3D surveillance scene. Finally, all cameras are fused together to complete real-time multi-camera people tracking. With this system, every person can be tracked in the 3D immersive surveillance scene without watching each of thousands of camera views.
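The connected-component step in part 3 can be sketched with OpenCV: keep only foreground blobs above a minimum area and record each block's bounding box, centroid, and mean colour for matching across cameras. This illustrates the general technique, not the thesis code; names and thresholds are assumptions:

```python
# Illustrative blob extraction from an 8-bit foreground mask.
import cv2

def extract_blobs(fg_mask, frame, min_area=200):
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(fg_mask)
    blobs = []
    for i in range(1, n):                         # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] < min_area:
            continue                              # filter out small areas
        x, y, w, h = stats[i, :4]                 # bounding box of the block
        mean_colour = frame[labels == i].mean(axis=0)  # mean BGR of the block
        blobs.append({"bbox": (x, y, w, h),
                      "centroid": tuple(centroids[i]),
                      "mean_colour": mean_colour})
    return blobs
```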
|
6 |
3D Surface Reconstruction from Multi-Camera Stereo with Distributed Processing / Arora, Gorav. 03 1900 (has links)
In this thesis a system which extracts 3D surfaces of arbitrary scenes under natural illumination is constructed using low-cost, off-the-shelf components. The system is implemented over a network of workstations using standardized distributed software technology. The architecture of the system is highly influenced by the performance requirements of multimedia applications which require 3D computer vision. Visible scene surfaces are extracted using a passive multi-baseline stereo technique. The implementation efficiently supports any number of cameras in arbitrary positions through an effective rectification strategy. The distributed software components interact through CORBA and work cooperatively in parallel. Experiments are performed to assess the effects of various parameters on the performance of the system and to demonstrate the feasibility of this approach. / Thesis / Master of Engineering (ME)
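As a hedged illustration of the passive stereo step, OpenCV's block matcher computes a disparity map for one rectified camera pair; the thesis generalizes this to multiple baselines and distributes the computation over CORBA components. File names and parameters below are assumptions:

```python
# Not the thesis implementation: minimal block-matching stereo on one
# rectified pair, the building block of a multi-baseline system.
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical rectified pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left, right).astype("float32") / 16.0  # fixed-point output

# depth = focal_length * baseline / disparity, per rectified-camera geometry
```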
|
7 |
Calibrating Video Capture Systems To Aid Automated Analysis And Expert Rating Of Human Movement Performance / Yeshala, Sai krishna. 27 June 2022 (has links)
We propose a methodology for calibrating the activity space and the cameras involved in video capture systems for upper-extremity stroke rehabilitation. We discuss an in-home stroke rehabilitation system called the Semi-Automated Rehabilitation At Home System (SARAH) and a clinic-based system called the Action Research Arm Test (ARAT), developed by the Interactive Neuro-Rehabilitation Lab (INR) at Virginia Tech. We propose a calibration workflow for achieving invariant video capture across multiple therapy sessions. This ensures that the captured data is less noisy and that the computer vision algorithms analyzing the captured data have prior knowledge of the activity space and of the patient's location in the video frames. Such a standardized calibration approach improved machine-learning analysis of patient movements and yielded a higher rate of agreement across multiple therapists regarding the captured patient performance. We further propose a multi-camera calibration approach to perform stereo camera calibration in the SARAH and ARAT capture systems, enabling 3D reconstruction of the activity space from 2D videos. The importance of the proposed activity-space and camera-calibration workflows, including new research paths opened as a result of our approach, is discussed in this thesis. / Master of Science / In this thesis, I describe the workflows I developed to calibrate stroke rehabilitation activity spaces, including the cameras involved in video capture systems for analyzing patient movements in stroke rehabilitation practices. The proposed workflows are designed to let users conveniently calibrate the video capture systems so that they provide invariant, consistent video captures, including the extraction of fine-grained information from the camera calibration results, to therapists and to computer-vision-based automated systems for improved analysis of patient performance in stroke rehabilitation practices. The importance of human-in-the-loop systems, including future research paths to strengthen the symbiotic relationship between humans and artificial intelligence systems in stroke rehabilitation practices, is discussed. The quantitative and qualitative results from the workshops conducted to test and evaluate the calibration workflows align with stakeholders' needs in stroke rehabilitation systems.
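Calibration workflows like the one proposed here typically build on the standard checkerboard procedure; a condensed OpenCV sketch follows (board size and file paths are assumptions, and the SARAH/ARAT-specific workflow adds steps not shown):

```python
# Standard single-camera checkerboard calibration with OpenCV.
import glob
import cv2
import numpy as np

board = (9, 6)                                   # inner corners of the checkerboard
objp = np.zeros((board[0] * board[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):            # hypothetical calibration images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, board)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("reprojection error:", rms)                # intrinsics K, distortion dist
```

Stereo calibration for a camera pair follows the same pattern with cv2.stereoCalibrate, which additionally recovers the rotation and translation between the two cameras needed for 3D reconstruction.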
|
8 |
VOODIO: Proposal for an Online Video Content Creation Tool / Kirkland, Benjamin Renfroe. 22 January 2020 (has links)
Video content is a massive source of entertainment, education, and income for a large population of online users. As more reliance upon this medium enters the field of education, formal and informal, people need tools to enhance their ability to tell stories and engage an audience. A tool that easily adjusts without compromising the interaction, the storytelling, or the visual moment, while also capturing as much information as possible, might be of great benefit to all creators of video content. Allowing tutorial creators the ability to efficiently record multiple views of their content may better aid in presenting concepts while retaining the attention of the viewership. The opportunity to present information effectively may have impacts on fields including education as well as entertainment. This thesis aims to explore possible reasons why content can be made to retain the audience's attention and to create a tool utilizing these facets for far reaching possibilities. / Master of Science
|
9 |
Camera Motion Estimation for Multi-Camera Systems / Kim, Jae-Hak (Jae-Hak.Kim@anu.edu.au). January 2008 (has links)
The estimation of motion of multi-camera systems is one of the most important tasks in computer vision research. Recently, several issues have been raised about general camera models and multi-camera systems. The use of many cameras as a single camera has been studied [60], and the epipolar geometry constraints of general camera models have been theoretically derived. Methods for calibration, including a self-calibration method for general camera models, have also been studied [78, 62]. Multi-camera systems are a practically implementable example of general camera models, and they are widely used in many applications nowadays because of both the low cost of digital charge-coupled device (CCD) cameras and the high resolution of multiple images from wide fields of view. To our knowledge, no research has been conducted on the relative motion of multi-camera systems with non-overlapping views to obtain a geometrically optimal solution.
In this thesis, we solve the camera motion problem for multi-camera systems by using linear methods and convex optimization techniques, and we make five substantial and original contributions to the field of computer vision. First, we focus on the problem of translational motion of omnidirectional cameras, which are multi-camera systems, and present a constrained minimization method to obtain robust estimation results. Given known rotation, we show that bilinear and trilinear relations can be used to build a system of linear equations, and singular value decomposition (SVD) is used to solve the equations. Second, we present a linear method that estimates the relative motion of generalized cameras, in particular in the case of non-overlapping views. We also present four types of generalized cameras that are solvable using our proposed modified SVD method. This is the first study to find linear relations for certain types of generalized cameras and to perform experiments using the proposed linear method. Third, we present a linear 6-point method (5 points from the same camera and 1 point from another camera) that estimates the relative motion of multi-camera systems whose cameras have no overlapping views. In addition, we discuss the theoretical and geometric analyses of multi-camera systems, as well as certain critical configurations in which the scale of translation cannot be determined. Fourth, we develop a global solution under an L∞ norm error for the relative motion problem of multi-camera systems using second-order cone programming. Finally, we present a fast searching method to obtain a global solution under an L∞ norm error for the relative motion problem of multi-camera systems with non-overlapping views, using a branch-and-bound algorithm and linear programming (LP). By testing the feasibility of the LP at an early stage, we reduce the computation time of solving the LP.
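Several of these linear methods reduce motion estimation to a homogeneous system Ax = 0, solved via SVD by taking the right singular vector associated with the smallest singular value. A generic numpy sketch of that standard step (not the thesis code; the matrix below is an illustrative toy):

```python
# Solve A x = 0 in the least-squares sense via SVD.
import numpy as np

def solve_homogeneous(A):
    """Return the unit vector x minimizing ||A x|| subject to ||x|| = 1."""
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]            # row of V^T for the smallest singular value

# toy example: recover the null-space direction of a rank-deficient matrix
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
x = solve_homogeneous(A)
print(np.allclose(A @ x, 0, atol=1e-9))  # True
```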
We tested our proposed methods by performing experiments with synthetic and real data. The Ladybug2 camera, for example, was used in the experiment on estimation of the translation of omnidirectional cameras and in the estimation of the relative motion of non-overlapping multi-camera systems. These experiments showed that a global solution using L∞ to estimate the relative motion of multi-camera systems could be achieved.
|