
Use of projector-camera system for human-computer interaction.

Abstract (translated from the Chinese 摘要):

Replacing the traditional display with a projector allows a large display to be produced by a small device, compensating for the poor portability of conventional displays. Through imperceptible structured light, a projector-camera system can show video content while simultaneously acting as a 3D sensor, providing a good platform for natural human-computer interaction. Applying projector-camera systems to human-computer interaction involves four core issues: (1) simultaneous display and sensing, i.e., how to make an ordinary video projector serve as both a display device and a 3D sensor while disturbing the original projection as little as possible; (2) 3D information interpretation, i.e., how to compensate for the sparseness of the point cloud with additional cues so as to improve system performance; (3) segmentation, i.e., how to obtain accurate segmentation under constantly changing projected content; and (4) posture recognition, i.e., how to recover 3D posture from a single image. This thesis studies these four issues in depth and proposes improved solutions.

First, to resolve the conflict between the imperceptibility of the embedded codes and the robustness of code retrieval, this thesis proposes a method that is noise-tolerant at both the coding and decoding ends. Specially designed geometric primitives and a large Hamming distance are used in coding to strengthen tolerance to noise. At the decoding end, pre-trained primitive-shape detectors detect and identify the codes embedded in the image, overcoming the difficulty that, under noise interference, the segmentation used in conventional structured light methods can hardly extract the embedded codes.

Second, for 3D information interpretation, we propose a method that estimates 6-DOF head pose through imperceptible structured light. With a carefully designed projection strategy and camera-projector synchronization, pattern images and the corresponding texture images are captured under imperceptible structured illumination. Then 2D facial features are localized in the texture image with an active appearance model, a point cloud is computed from the pattern image by structured light, and the two are combined to obtain the 3D coordinates of the facial feature points. Finally, head orientation and translation are estimated by singular value decomposition of the correlation matrix formed from the 3D coordinates of corresponding feature points across frames.

For segmentation, we propose a coarse-to-fine hand segmentation method for projector-camera systems. The hand region is first segmented roughly by contrast saliency detection, a boundary-preserving smoothing step then enforces consistency within the segmented region, and the precise segmentation result is finally obtained through confidence analysis.

Finally, we explore how to turn the projected region on an ordinary tabletop into a touch screen using only a projector and a camera. A random binary code designed through statistical analysis is embedded into the ordinary projection content, giving the projector-camera system 3D sensing ability without the user perceiving it. Whether a finger touches the tabletop is then determined from the calibration of the projector-camera-table system, accurate hand segmentation and fingertip localization, the homography between the projector's projection plane and the camera's image plane, and the embedded projection codes.

Abstract (English):

The use of a projector in place of a traditional display device dissociates display size from device size, making portability much less of an issue. Paired with a camera, the projector-camera system allows simultaneous video display and 3D acquisition through imperceptible structured light sensing, providing a vivid and immersive platform for natural human-computer interaction. Key issues in this approach include: (1) Simultaneous Display and Acquisition: how to make a normal video projector not only a display device but also a 3D sensor, under the constraint of incurring minimum disturbance to the original projection; (2) 3D Information Interpretation: how to interpret the sparse depth information with the assistance of additional cues to enhance system performance; (3) Segmentation: how to acquire accurate segmentation in the presence of incessant variation of the projected video content; (4) Posture Recognition: how to infer 3D posture from a single image. This thesis aims at providing improved solutions to each of these issues.

To address the conflict between the imperceptibility of the embedded codes and the robustness of code retrieval, noise-tolerant schemes are introduced at both the coding and decoding stages. At the coding end, specially designed primitive shapes and a large Hamming distance are employed to enhance tolerance to noise. At the decoding end, pre-trained primitive shape detectors are used to detect and identify the embedded codes, a task difficult to achieve by the segmentation used in general structured light methods, because the weakly embedded information is contaminated by substantial noise.

On 3D information interpretation, a system that estimates 6-DOF head pose through imperceptible structured light sensing is proposed. First, through an elaborate pattern projection strategy and camera-projector synchronization, pattern-illuminated images and the corresponding scene-texture image are captured under imperceptible patterned illumination. Then, the 3D positions of the key facial feature points are derived by combining the 2D facial feature points localized in the scene-texture image by an Active Appearance Model (AAM) with the point cloud generated by structured light sensing. Eventually, head orientation and translation are estimated by SVD of a correlation matrix formed from the corresponding 3D feature point pairs across frames.
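The last step above, estimating head orientation and translation by SVD of a correlation matrix built from corresponding 3D feature points, is the classical rigid-alignment computation. The record gives no implementation details; the following is a minimal NumPy sketch of that computation, and the synthetic example data are purely illustrative.

    import numpy as np

    def rigid_transform_3d(P, Q):
        """Estimate rotation R and translation t such that Q ~ R @ P + t,
        given corresponding 3D points P, Q of shape (N, 3) (Kabsch-style SVD)."""
        P_mean, Q_mean = P.mean(axis=0), Q.mean(axis=0)
        # Correlation (cross-covariance) matrix of the centred point sets
        H = (P - P_mean).T @ (Q - Q_mean)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # guard against a reflection
            Vt[-1, :] *= -1
            R = Vt.T @ U.T
        t = Q_mean - R @ P_mean
        return R, t

    # Example: recover a known head rotation/translation from noiseless points
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        P = rng.normal(size=(6, 3))                      # feature points, frame k
        angle = np.deg2rad(10)
        R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                           [np.sin(angle),  np.cos(angle), 0],
                           [0, 0, 1]])
        Q = P @ R_true.T + np.array([0.02, -0.01, 0.3])  # frame k+1
        R, t = rigid_transform_3d(P, Q)
        print(np.allclose(R, R_true), t)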
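For the coding stage described earlier, the record names the ingredients (primitive shapes, a large Hamming distance) but not the actual codebook. As a hedged illustration of how such a codebook might be built, the sketch below greedily collects codewords over a small primitive-shape alphabet so that every pair is at least a chosen Hamming distance apart; the alphabet size, code length, and minimum distance are assumed values, not the thesis' parameters.

    from itertools import product

    def hamming(a, b):
        """Number of positions at which two codewords differ."""
        return sum(x != y for x, y in zip(a, b))

    def greedy_codebook(num_symbols=4, code_length=5, min_distance=3):
        """Greedily pick codewords over an alphabet of primitive-shape indices
        so that every pair is at least `min_distance` apart (illustrative only)."""
        codebook = []
        for candidate in product(range(num_symbols), repeat=code_length):
            if all(hamming(candidate, c) >= min_distance for c in codebook):
                codebook.append(candidate)
        return codebook

    if __name__ == "__main__":
        book = greedy_codebook()
        print(len(book), "codewords, e.g.", book[:3])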
On the segmentation issue, we describe a coarse-to-fine hand segmentation method for projector-camera systems. After rough segmentation by contrast saliency detection and mean shift-based, discontinuity-preserving smoothing, the refined result is confirmed through confidence evaluation.

Finally, we address how an HCI (Human-Computer Interface) with small device size, large display, and touch input facility can be made possible by a mere projector and camera. The realization is through a properly embedded structured light sensing scheme that enables a regular light-colored table surface to serve the dual roles of both a projection screen and a touch-sensitive display surface. A random binary pattern is employed to code structured light with pixel accuracy, embedded into the regular projection display in such a way that the user perceives only the regular display and not the structured pattern hidden in it. With the projection display on the table surface imaged by a camera, the observed image data, together with the known projection content, can probe the 3D world immediately above the table surface: deciding whether a finger is present, whether the finger touches the table surface, and if so at what position on the table surface the fingertip makes contact. All the decisions hinge upon a careful calibration of the projector-camera-table surface system, intelligent segmentation of the hand in the image data, and exploitation of the homography mapping between the projector's display panel and the camera's image plane.
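For the coarse-to-fine hand segmentation just described, the following Python/OpenCV sketch strings together the three named stages. The specific saliency formula, the mean-shift parameters, and the final fusion rule standing in for the thesis' confidence evaluation are assumptions for illustration, not the authors' implementation.

    import cv2
    import numpy as np

    def segment_hand(bgr):
        """Coarse-to-fine hand segmentation sketch: (1) contrast saliency,
        (2) mean-shift, discontinuity-preserving smoothing, (3) a simple fusion
        standing in for the thesis' confidence analysis (illustrative only)."""
        # 1. Coarse: contrast saliency as distance to the mean Lab colour
        lab = cv2.cvtColor(cv2.GaussianBlur(bgr, (5, 5), 0), cv2.COLOR_BGR2LAB)
        saliency = np.linalg.norm(lab.astype(np.float32) -
                                  lab.reshape(-1, 3).mean(axis=0), axis=2)
        coarse = saliency > saliency.mean() * 1.5          # rough hand mask

        # 2. Mean-shift smoothing flattens projected texture but keeps edges
        smoothed = cv2.pyrMeanShiftFiltering(bgr, sp=15, sr=30)

        # 3. Refine: keep coarse pixels whose smoothed colour matches the
        #    dominant colour inside the coarse mask (a stand-in confidence test)
        hand_colour = smoothed[coarse].reshape(-1, 3).mean(axis=0)
        dist = np.linalg.norm(smoothed.astype(np.float32) - hand_colour, axis=2)
        fine = coarse & (dist < 40.0)
        return fine.astype(np.uint8) * 255

    if __name__ == "__main__":
        frame = cv2.imread("frame.png")                    # hypothetical input
        if frame is not None:
            cv2.imwrite("hand_mask.png", segment_hand(frame))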
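How the binary pattern stays invisible to the viewer is not spelled out in this record. A common realization of imperceptible structured light, and a plausible reading of "the user perceives only regular display", is to project the content plus and minus a small pattern offset in successive, camera-synchronized frames so the temporal average equals the original content. A minimal sketch under that assumption (the embedding strength delta is an invented value):

    import numpy as np

    def embed_frames(content, pattern, delta=4):
        """Complementary embedding: project content+delta*pattern and
        content-delta*pattern in successive frames; their temporal average is
        the original content, while the difference of two synchronized camera
        captures reveals the pattern (illustrative scheme and values)."""
        signed = np.where(pattern > 0, delta, -delta).astype(np.int16)
        frame_a = np.clip(content.astype(np.int16) + signed, 0, 255).astype(np.uint8)
        frame_b = np.clip(content.astype(np.int16) - signed, 0, 255).astype(np.uint8)
        return frame_a, frame_b

    def recover_pattern(capture_a, capture_b):
        """Recover the embedded binary pattern from two synchronized captures."""
        diff = capture_a.astype(np.int16) - capture_b.astype(np.int16)
        return (diff > 0).astype(np.uint8)

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        content = rng.integers(30, 220, size=(8, 8), dtype=np.uint8)   # toy video frame
        pattern = rng.integers(0, 2, size=(8, 8), dtype=np.uint8)      # random binary code
        a, b = embed_frames(content, pattern)
        print(np.array_equal(recover_pattern(a, b), pattern))          # True (no noise)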
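The record states that the touch decision combines the projector-camera-table calibration, fingertip localization, the projector-camera homography, and the embedded codes, but not the exact rule. The sketch below is one plausible, hedged reading: map the fingertip through the table-plane homography and check that the code bit decoded at the fingertip agrees with the bit the projector embedded at the mapped location, which is consistent with the fingertip lying on the table plane. The function name, window size, and agreement test are assumptions.

    import numpy as np
    import cv2

    def touch_decision(fingertip_cam, H_cam2proj, embedded_pattern, decoded_bit,
                       window=3):
        """Illustrative homography-based touch test (not the thesis' exact rule)."""
        # H_cam2proj would typically be estimated once during calibration,
        # e.g. H, _ = cv2.findHomography(cam_pts, proj_pts)
        pt = np.array([[fingertip_cam]], dtype=np.float32)          # shape (1,1,2)
        px, py = cv2.perspectiveTransform(pt, H_cam2proj)[0, 0]
        px, py = int(round(px)), int(round(py))

        h, w = embedded_pattern.shape
        if not (0 <= px < w and 0 <= py < h):
            return False, (px, py)

        # Majority bit of the embedded pattern in a small window around (px, py)
        patch = embedded_pattern[max(0, py - window):py + window + 1,
                                 max(0, px - window):px + window + 1]
        expected_bit = int(patch.mean() > 0.5)
        return expected_bit == decoded_bit, (px, py)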
Dai, Jingwen.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2012.
Includes bibliographical references (leaves 155-182).
Abstract also in Chinese.

Abstract --- p.i
摘要 --- p.iv
Acknowledgement --- p.vi
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Motivation --- p.1
Chapter 1.2 --- Challenges --- p.2
Chapter 1.2.1 --- Simultaneous Display and Acquisition --- p.2
Chapter 1.2.2 --- 3D Information Interpretation --- p.3
Chapter 1.2.3 --- Segmentation --- p.4
Chapter 1.2.4 --- Posture Recognition --- p.4
Chapter 1.3 --- Objective --- p.5
Chapter 1.4 --- Organization of the Thesis --- p.5
Chapter 2 --- Background --- p.9
Chapter 2.1 --- Projector-Camera System --- p.9
Chapter 2.1.1 --- Projection Technologies --- p.10
Chapter 2.1.2 --- Researches in ProCams --- p.16
Chapter 2.2 --- Natural Human-Computer Interaction --- p.24
Chapter 2.2.1 --- Head Pose --- p.25
Chapter 2.2.2 --- Hand Gesture --- p.33
Chapter 3 --- Head Pose Estimation by ISL --- p.41
Chapter 3.1 --- Introduction --- p.42
Chapter 3.2 --- Previous Works --- p.44
Chapter 3.2.1 --- Head Pose Estimation --- p.44
Chapter 3.2.2 --- Imperceptible Structured Light --- p.46
Chapter 3.3 --- Method --- p.47
Chapter 3.3.1 --- Pattern Projection Strategy for Imperceptible Structured Light Sensing --- p.47
Chapter 3.3.2 --- Facial Feature Localization --- p.48
Chapter 3.3.3 --- 6 DOF Head Pose Estimation --- p.54
Chapter 3.4 --- Experiments --- p.57
Chapter 3.4.1 --- Overview of Experiment Setup --- p.57
Chapter 3.4.2 --- Test Dataset Collection --- p.58
Chapter 3.4.3 --- Results --- p.59
Chapter 3.5 --- Summary --- p.63
Chapter 4 --- Embedding Codes into Normal Projection --- p.65
Chapter 4.1 --- Introduction --- p.66
Chapter 4.2 --- Previous Works --- p.68
Chapter 4.3 --- Method --- p.70
Chapter 4.3.1 --- Principle of Embedding Imperceptible Codes --- p.70
Chapter 4.3.2 --- Design of Embedded Pattern --- p.73
Chapter 4.3.3 --- Primitive Shape Identification and Decoding --- p.76
Chapter 4.3.4 --- Codeword Retrieval --- p.77
Chapter 4.4 --- Experiments --- p.79
Chapter 4.4.1 --- Overview of Experiment Setup --- p.79
Chapter 4.4.2 --- Embedded Code Imperceptibility Evaluation --- p.81
Chapter 4.4.3 --- Primitive Shape Detection Accuracy Evaluation --- p.82
Chapter 4.5 --- Sensitivity Evaluation --- p.84
Chapter 4.5.1 --- Working Distance --- p.85
Chapter 4.5.2 --- Projection Surface Orientation --- p.87
Chapter 4.5.3 --- Projection Surface Shape --- p.88
Chapter 4.5.4 --- Projection Surface Texture --- p.91
Chapter 4.5.5 --- Projector-Camera System --- p.91
Chapter 4.6 --- Applications --- p.95
Chapter 4.6.1 --- 3D Reconstruction with Normal Video Projection --- p.95
Chapter 4.6.2 --- Sensing Surrounding Environment on Mobile Robot Platform --- p.97
Chapter 4.6.3 --- Natural Human-Computer Interaction --- p.99
Chapter 4.7 --- Summary --- p.99
Chapter 5 --- Hand Segmentation in PROCAMS --- p.102
Chapter 5.1 --- Previous Works --- p.103
Chapter 5.2 --- Method --- p.106
Chapter 5.2.1 --- Rough Segmentation by Contrast Saliency --- p.106
Chapter 5.2.2 --- Mean-Shift Region Smoothing --- p.108
Chapter 5.2.3 --- Precise Segmentation by Fusing --- p.110
Chapter 5.3 --- Experiments --- p.111
Chapter 5.4 --- Summary --- p.115
Chapter 6 --- Surface Touch-Sensitive Display --- p.116
Chapter 6.1 --- Introduction --- p.117
Chapter 6.2 --- Previous Works --- p.119
Chapter 6.3 --- Priors in Pro-Cam System --- p.122
Chapter 6.3.1 --- Homography Estimation --- p.123
Chapter 6.3.2 --- Radiometric Prediction --- p.124
Chapter 6.4 --- Embedding Codes into Video Projection --- p.125
Chapter 6.4.1 --- Imperceptible Structured Light --- p.125
Chapter 6.4.2 --- Embedded Pattern Design Strategy and Statistical Analysis --- p.126
Chapter 6.5 --- Touch Detection using Homography and Embedded Code --- p.129
Chapter 6.5.1 --- Hand Segmentation --- p.130
Chapter 6.5.2 --- Fingertip Detection --- p.130
Chapter 6.5.3 --- Touch Detection Through Homography --- p.131
Chapter 6.5.4 --- From Resistive Touching to Capacitive Touching --- p.133
Chapter 6.6 --- Experiments --- p.135
Chapter 6.6.1 --- System Initialization --- p.137
Chapter 6.6.2 --- Display Quality Evaluation --- p.139
Chapter 6.6.3 --- Touch Accuracy Evaluation --- p.141
Chapter 6.6.4 --- Trajectory Tracking Evaluation --- p.145
Chapter 6.6.5 --- Multiple-Touch Evaluation --- p.145
Chapter 6.6.6 --- Efficiency Evaluation --- p.147
Chapter 6.7 --- Summary --- p.149
Chapter 7 --- Conclusion and Future Work --- p.150
Chapter 7.1 --- Conclusion and Contributions --- p.150
Chapter 7.2 --- Related Publications --- p.152
Chapter 7.3 --- Future Work --- p.153
Bibliography --- p.155

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_328404
Date: January 2012
Contributors: Dai, Jingwen; Chinese University of Hong Kong Graduate School. Division of Mechanical and Automation Engineering.
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, bibliography
Format: electronic resource, remote, 1 online resource (xvii, 182 leaves) : ill. (chiefly col.)
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
