1. Camera Planning and Fusion in a Heterogeneous Camera Network (Zhao, Jian, 01 January 2011)
Wide-area camera networks are becoming increasingly common, with a wide range of commercial and military applications from video surveillance to smart homes and from traffic monitoring to anti-terrorism. Designing such a network is challenging owing to the complexity of the environment, self- and mutual occlusion of moving objects, diverse sensor properties, and the myriad performance metrics of different applications. In this dissertation, I consider two such challenges: camera planning and camera fusion. Camera planning determines the optimal number and placement of cameras for a target cost function; camera fusion combines images collected by heterogeneous cameras in the network to extract information pertinent to a target application.
I tackle the camera planning problem by developing a new unified framework based on binary integer programming (BIP) that relates the network design parameters to the performance goals of a variety of camera network tasks. Most BIP formulations are NP-hard, and various approximation algorithms have been proposed in the literature. In this dissertation, I develop a comprehensive framework for comparing the entire spectrum of approximation algorithms, from greedy and Markov Chain Monte Carlo (MCMC) methods to various relaxation techniques. The key contribution is not only a generic formulation of the camera planning problem but also novel ways of adapting that formulation to powerful approximation schemes, including Simulated Annealing (SA) and Semi-Definite Programming (SDP). The accuracy, efficiency, and scalability of each technique are analyzed and compared in depth, and extensive experimental results illustrate the strengths and weaknesses of each method.
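As a hedged illustration of the greedy end of that algorithmic spectrum, the sketch below solves a toy coverage-style placement problem: each candidate camera pose covers some grid points, and poses are picked by marginal coverage gain until a budget is spent. The `visibility` matrix, the `greedy_camera_placement` name, and the budget are hypothetical illustrations, not the dissertation's actual BIP formulation or its SA/SDP solvers.

```python
import numpy as np

# Hypothetical sketch: greedy approximation for a coverage-style
# camera-placement BIP. visibility[i, j] = 1 if candidate pose i
# covers grid point j; poses are picked by marginal coverage gain.
def greedy_camera_placement(visibility: np.ndarray, budget: int) -> list:
    n_poses, n_points = visibility.shape
    covered = np.zeros(n_points, dtype=bool)
    chosen = []
    for _ in range(budget):
        # Marginal gain of each pose: grid points it would newly cover.
        gains = visibility[:, ~covered].sum(axis=1)
        gains[chosen] = -1            # never re-pick a chosen pose
        best = int(np.argmax(gains))
        if gains[best] <= 0:          # nothing left to gain
            break
        chosen.append(best)
        covered |= visibility[best].astype(bool)
    return chosen

# Toy usage: 6 candidate poses, 10 grid points, random visibility.
rng = np.random.default_rng(0)
vis = (rng.random((6, 10)) > 0.6).astype(int)
print(greedy_camera_placement(vis, budget=3))
```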
The second problem, heterogeneous camera fusion, is considerably more complex. Information can be fused at different levels, from pixels or voxels to semantic objects, with large variations in accuracy, communication, and computation costs. My focus is on the geometric transformation of shapes between objects observed on different camera planes. This so-called geometric fusion usually provides the most reliable results, at the expense of high computation and communication costs. To tame this complexity, I propose a hierarchy of camera models of varying complexity that balances the effectiveness and efficiency of camera network operation, together with calibration and registration methods for each camera model. Finally, I provide two specific examples to demonstrate the effectiveness of the model: 1) a fusion system that improves the segmentation of the human body in a camera network consisting of thermal and regular visible-light cameras, and 2) a view-dependent rendering system that combines information from depth and regular cameras to collect scene information and generate new views in real time.
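One concrete, if simplified, instance of geometric fusion is transferring points between camera planes through a homography when the observed scene is approximately planar. The sketch below is an assumption-laden stand-in for the dissertation's calibration and registration methods: the corresponding point pairs are invented, and a real thermal-to-visible registration would come from calibration.

```python
import numpy as np
import cv2

# Hypothetical sketch of planar geometric fusion: map points seen by a
# thermal camera onto a visible camera's image plane via a homography
# estimated from (invented) corresponding point pairs.
thermal_pts = np.array([[100, 200], [150, 210], [300, 400],
                        [320, 420], [90, 350]], dtype=np.float32)
visible_pts = np.array([[110, 190], [162, 205], [310, 395],
                        [333, 418], [101, 344]], dtype=np.float32)

# RANSAC-robust homography from the thermal plane to the visible plane.
H, _ = cv2.findHomography(thermal_pts, visible_pts, cv2.RANSAC)

# Transfer an arbitrary thermal-image point into visible-image coordinates.
pt = np.array([[[120.0, 205.0]]], dtype=np.float32)   # shape (1, 1, 2)
print(cv2.perspectiveTransform(pt, H).ravel())
```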
2. Segmentação de imagens de pessoas em tempo real para videoconferências [Real-time person-image segmentation for videoconferencing] (Parolin, Alessandro, 22 March 2011)
Object segmentation in images and video is a long-standing topic in image processing and computer vision. Recently, driven by the evolution of computing hardware and the popularization of the Internet, videoconferencing has become a prominent application of person segmentation in both academia and industry. It benefits fields such as telemedicine and distance learning, and above all business: many companies hold global meetings by videoconference and save considerable resources. However, videoconferences still do not provide the experience people have when they share a room. This work therefore proposes a speaker-segmentation system, tailored to videoconferencing, that enables later processing to increase the participants' sense of immersion, for example by replacing the image background with a standard one at every site. The proposed system is built around a dynamic programming algorithm guided by energies extracted from the image, combining edge, motion, and probability information. Extensive tests show results comparable to the state of the art, with real-time execution at 8 FPS even with unoptimized code. The main advantage over competing systems is that no prior training is required to perform the segmentation.
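To make the dynamic programming idea concrete, here is a minimal sketch, assuming a per-pixel energy map in which low values mark the likely person/background boundary; the actual system combines edge, motion, and probability energies, whereas the array below is a synthetic toy. The function name and the three-neighbour transition rule are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

# Minimal sketch: dynamic programming finds a least-cost vertical path
# through an energy map, yielding one boundary column per row.
# Low energy = likely person/background boundary.
def min_cost_path(energy: np.ndarray) -> np.ndarray:
    h, w = energy.shape
    cost = energy.astype(float).copy()
    for y in range(1, h):
        left = np.r_[np.inf, cost[y - 1, :-1]]
        up = cost[y - 1]
        right = np.r_[cost[y - 1, 1:], np.inf]
        cost[y] += np.minimum(np.minimum(left, up), right)
    # Backtrack from the cheapest endpoint on the last row.
    path = np.empty(h, dtype=int)
    path[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):
        x = path[y + 1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        path[y] = lo + int(np.argmin(cost[y, lo:hi]))
    return path  # path[y] = boundary column in row y

# Toy energy with a low-cost valley near column 5.
energy = np.abs(np.arange(12)[None, :] - 5.0).repeat(8, axis=0)
print(min_cost_path(energy))
```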
3. 3D real time object recognition (Amplianitis, Konstantinos, 01 March 2017)
Object recognition is a natural process of the human brain. It takes place in the visual cortex and relies on the binocular properties of the eyes, which allow a three-dimensional interpretation of the objects in a scene. Cameras imitate the human eye: images from two cameras in a stereo system serve as input for algorithms that interpret the three-dimensional layout of a scene automatically, an essential ingredient for simulating the complexity of human vision. Rapid advances in hardware and software are continuously bringing machine-based object recognition closer to the capabilities of the human visual system. The key in this field is the development of algorithms that achieve robust scene interpretation. Considerable effort has been successfully invested over the years in 2D object recognition, far less in 3D. This dissertation therefore aims to advance 3D object recognition: a better interpretation and understanding of visible reality and of the relationships between objects in a scene. Low-cost commodity sensors that have emerged in recent years, such as the Microsoft Kinect, deliver RGB and depth data of a scene that can be manipulated to generate human-like visual perception data. The goal is to show how this RGB and depth information can be used to develop a new class of 3D object recognition algorithms, analogous to the perception processed by the human brain.
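As a hedged sketch of how such RGB-D data yields 3D geometry, the code below back-projects a Kinect-style depth map into a point cloud with the pinhole camera model. The intrinsics are placeholder values rather than calibrated Kinect parameters, and the function is illustrative, not taken from the dissertation.

```python
import numpy as np

# Illustrative sketch: back-project a depth map into a 3D point cloud
# using the pinhole model. The intrinsics are placeholders, not
# calibrated Kinect parameters.
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)     # depth in metres
    x = (u - cx) * z / fx            # X = (u - cx) * Z / fx
    y = (v - cy) * z / fy            # Y = (v - cy) * Z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]        # drop invalid zero-depth pixels

# Toy usage with a flat synthetic depth map (all points at 1.5 m).
cloud = depth_to_point_cloud(np.full((480, 640), 1.5),
                             fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(cloud.shape)   # (307200, 3)
```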