11 |
Modeling and rendering from multiple views. / CUHK electronic theses & dissertations collectionJanuary 2006 (has links)
The first approach, described in the first part of this thesis, studies 3D face modeling from multiple views. Today, human face modeling and animation techniques are widely used to generate virtual characters and models. Such characters and models are used in movies, computer games, advertising, news broadcasting and other activities. We propose an efficient method to estimate the poses, the global shape and the local structures of a human head recorded in multiple face images or a video sequence by using a generic wireframe face model. Based on this newly proposed method, we have successfully developed a pose-invariant face recognition system and a pose-invariant face contour extraction method. / The objective of this thesis is to model and render complex scenes or objects from multiple images taken from different viewpoints. Two approaches to achieving this objective were investigated in this thesis. The first one is for known objects with prior geometrical models, which can be deformed to match the objects recorded in multiple input images. The second one is for general scenes or objects without prior geometrical models. / The proposed algorithms in this thesis were tested on a variety of real and synthetic data sets. The experimental results illustrate their efficiency and limitations. / The second approach, described in the second part of this thesis, investigates 3D modeling and rendering for general complex scenes. The entertainment industry touches hundreds of millions of people every day, and synthetic pictures and 3D reconstructions of real scenes, often mixed with actual film footage, are now commonplace in computer games, sports broadcasting, TV advertising and feature films. A series of techniques has been developed to complete this task. First, a new view-ordering algorithm was proposed to organize and order an unorganized image database. Second, a novel and efficient multiview feature matching approach was developed to calibrate and track all views. 
Finally, both match-propagation-based and Bayesian-based methods were developed to produce 3D scene models for rendering. / Yao Jian. / "September 2006." / Adviser: Wai-Kuen Chan. / Source: Dissertation Abstracts International, Volume: 68-03, Section: B, page: 1849. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2006. / Includes bibliographical references (p. 170-181). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307.
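The pipeline above calibrates and tracks views by matching features across images. As background only (the abstract does not specify the thesis's matcher), a minimal nearest-neighbour descriptor matcher with Lowe's ratio test might look like this; the function name and threshold are illustrative assumptions:

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.8):
    """Nearest-neighbour matching with the ratio test.

    Each descriptor in desc1 is matched to its nearest neighbour in
    desc2; the match is kept only when the nearest distance is clearly
    smaller than the second-nearest, discarding ambiguous matches.
    """
    matches = []
    for i, d in enumerate(desc1):
        dist = np.linalg.norm(desc2 - d, axis=1)
        nearest, second = np.argsort(dist)[:2]
        if dist[nearest] < ratio * dist[second]:
            matches.append((i, int(nearest)))
    return matches
```

Such pairwise matches are the raw input that view-ordering and multiview calibration then organize across the whole image database.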
12 |
Motion and shape from apparent flow.January 2013 (has links)
Determination of the general camera motion and reconstruction of the depth map of the imaged scene from a captured video is important for computer vision and various robotics tasks, including visual control 
and autonomous navigation. A camera (or a cluster of cameras) is usually mounted on the end-effector of a robot arm when performing the above tasks. The determination of the relative geometry between the camera frame and the end-effector frame, commonly referred to as hand-eye calibration, is essential to proper operation in visual control. Similarly, determining the relative geometry of multiple cameras is also important to various applications requiring the use of a multi-camera rig. / The relative motion between an observer and the imaged scene generally induces apparent flow in the video. The difficulty of the problem lies mainly in that the flow pattern directly observable in the video is generally not the full flow field induced by the motion, but only partial information of it: the component orthogonal to the iso-brightness contour of the spatial image intensity profile. This partial flow field is known as the normal flow field. This thesis addresses several important problems in computer vision: determination of camera motion, recovery of the depth map, and hand-eye calibration, directly from the apparent flow (normal flow) pattern in the video data rather than from a full flow field interpolated from it. This approach has a number of significant contributions. It does not require interpolating the flow field and in turn does not demand the imaged scene to be smooth. In contrast to optical flow, no sophisticated optimization procedures that account for flow discontinuities are required; such techniques are generally computationally expensive. It also breaks the classical chicken-and-egg problem between scene depth and camera motion: no prior knowledge about the locations of the discontinuities is required for motion determination. In this thesis, several direct methods are proposed to determine camera motion using three different types of imaging systems, namely a monocular camera, a stereo camera, and a multi-camera rig. 
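The normal flow described above — the component of the apparent flow along the spatial intensity gradient — follows directly from the brightness constancy equation. A minimal sketch of extracting it from two frames (illustrative only, not the thesis's implementation):

```python
import numpy as np

def normal_flow(frame1, frame2, eps=1e-8):
    """Per-pixel normal flow from brightness constancy.

    Brightness constancy gives Ix*u + Iy*v + It = 0, which only pins
    down the flow component along the gradient (the aperture problem);
    that component, -It * grad(I) / |grad(I)|^2, is the normal flow.
    Where the gradient vanishes, no flow is observable at all.
    """
    I1 = frame1.astype(float)
    I2 = frame2.astype(float)
    Iy, Ix = np.gradient(I1)               # spatial gradient
    It = I2 - I1                           # temporal derivative
    g2 = np.maximum(Ix**2 + Iy**2, eps)
    scale = -It / g2
    return scale * Ix, scale * Iy          # (u_n, v_n) components
```

On a linear intensity ramp shifted one pixel to the right, the recovered normal flow is exactly (1, 0) everywhere, since the gradient happens to be aligned with the motion.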
/ This thesis begins with the Apparent Flow Positive Depth (AFPD) constraint, which determines the motion parameters using all observable normal flows from a monocular camera. The constraint presents itself as an optimization problem over the motion parameters. An iterative process in a constrained dual coarse-to-fine voting framework on the motion parameter space is used to exploit the constraint. / Due to the finite video sampling rate, the extracted normal flow field is generally more accurate in its direction component than in its magnitude. This thesis proposes two constraints to determine motion: one related to the direction component of the normal flow field - the Apparent Flow Direction (AFD) constraint - and the other to the magnitude component of the field - the Apparent Flow Magnitude (AFM) constraint. The first constraint presents itself as a system of linear inequalities that bounds the direction of the motion parameters; the second uses the globality of the rotational magnitude across all image positions to constrain the motion parameters further. A two-stage iterative process in a coarse-to-fine framework on the motion parameter space is used to exploit the two constraints. / Without the interpolation step, however, normal flow is only raw information extracted locally, which generally suffers from extraction error arising from the finite image resolution and video sampling rate. This thesis explores a remedy to the problem: increasing the visual field of the imaging system by fixing a number of cameras together to form an approximate spherical eye. With a substantially widened visual field, the normal flow data points are far more numerous, which can be used to combat the local flow extraction error at each image point. 
More importantly, the directions of the translation and rotation components of a general motion can be separately estimated with the use of the novel Apparent Flow Separation (AFS) and Extended Apparent Flow Separation (EAFS) constraints. / Instead of using a monocular camera or a spherical imaging system, stereo vision contributes another visual clue to determine the magnitude of translation and the depth map without the arbitrary scaling of the magnitude. The conventional approach in stereo vision is to determine feature correspondences across the two input images. However, correspondence establishment is often difficult. This thesis explores two direct methods to recover the complete camera motion from the stereo system without explicit point-to-point correspondence matching. The first method extends the AFD and AFM constraints to a stereo camera, and provides a robust geometrical method to determine the translation magnitude. The second method, which requires the stereo image pair to have a large overlapping field of view, provides a closed-form solution requiring no iterative computation. Once the motion parameters are determined, the depth map can be reconstructed without any difficulty. The depth map recovered from normal flows is generally sparse in nature. We can interpolate the depth map and then utilize it as an initial estimate in a conventional TV-L₁ framework. The result is not only better reconstruction performance, but also faster computation time. / Calibration of the hand-eye geometry is usually based on feature correspondences. This thesis presents an alternative method that uses normal flows generated from an active camera system to perform self-calibration. In order to make the method more robust to noise, the strategy is to use the direction component of the flow field, which is more noise-immune, to recover the direction part of the hand-eye geometry first. 
Outliers are then detected using some intrinsic properties of the flow field together with the partially recovered hand-eye geometry. The final solution is refined using a robust method. The method can also be used to determine the relative geometry of multiple cameras without demanding overlap in their visual fields. / Detailed summary in vernacular field only. / Hui, Tak Wai. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 159-165). / Abstracts in English and Chinese. / Acknowledgements --- p.i / Abstract --- p.ii / Lists of Figures --- p.xiii / Lists of Tables --- p.xix / Chapter Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Background --- p.1 / Chapter 1.2 --- Motivation --- p.4 / Chapter 1.3 --- Research Objectives --- p.6 / Chapter 1.4 --- Thesis Outline --- p.7 / Chapter Chapter 2 --- Literature Review --- p.10 / Chapter 2.1 --- Introduction --- p.10 / Chapter 2.2 --- Recovery of Optical Flows --- p.10 / Chapter 2.3 --- Egomotion Estimation Based on Optical Flow Field --- p.14 / Chapter 2.3.1 --- Bilinear Constraint --- p.14 / Chapter 2.3.2 --- Subspace Method --- p.15 / Chapter 2.3.3 --- Partial Search Method --- p.16 / Chapter 2.3.4 --- Fixation --- p.17 / Chapter 2.3.5 --- Region Alignment --- p.17 / Chapter 2.3.6 --- Linearity and Divergence Properties of Optical Flows --- p.18 / Chapter 2.3.7 --- Constraint Lines and Collinear Points --- p.18 / Chapter 2.3.8 --- Multi-Camera Rig --- p.19 / Chapter 2.3.9 --- Discussion --- p.21 / Chapter 2.4 --- Determining Egomotion Using Direct Methods --- p.22 / Chapter 2.4.1 --- Introduction --- p.22 / Chapter 2.4.2 --- Classical Methods --- p.23 / Chapter 2.4.3 --- Pattern Matching --- p.24 
/ Chapter 2.4.4 --- Search Subspace Method --- p.25 / Chapter 2.4.5 --- Histogram-Based Method --- p.26 / Chapter 2.4.6 --- Multi-Camera Rig --- p.26 / Chapter 2.4.7 --- Discussion --- p.27 / Chapter 2.5 --- Determining Egomotion Using Feature Correspondences --- p.28 / Chapter 2.6 --- Hand-Eye Calibration --- p.30 / Chapter 2.7 --- Summary --- p.31 / Chapter Chapter 3 --- Determining Motion from Monocular Camera Using Merely the Positive Depth Constraint --- p.32 / Chapter 3.1 --- Introduction --- p.32 / Chapter 3.2 --- Related Works --- p.33 / Chapter 3.3 --- Background --- p.34 / Chapter 3.3 --- Apparent Flow Positive Depth (AFPD) Constraint --- p.39 / Chapter 3.4 --- Numerical Solution to AFPD Constraint --- p.40 / Chapter 3.5 --- Constrained Coarse-to-Fine Searching --- p.40 / Chapter 3.6 --- Experimental Results --- p.43 / Chapter 3.7 --- Conclusion --- p.47 / Chapter Chapter 4 --- Determining Motion from Monocular Camera Using Direction and Magnitude of Normal Flows Separately --- p.48 / Chapter 4.1 --- Introduction --- p.48 / Chapter 4.2 --- Related Works --- p.50 / Chapter 4.3 --- Apparent Flow Direction (AFD) Constraint --- p.51 / Chapter 4.3.1 --- The Special Case: Pure Translation --- p.51 / Chapter 4.3.1.1 --- Locus of Translation Using Full Flow as a Constraint --- p.51 / Chapter 4.3.1.2 --- Locus of Translation Using Normal Flow as a Constraint --- p.53 / Chapter 4.3.2 --- The Special Case: Pure Rotation --- p.54 / Chapter 4.3.2.1 --- Locus of Rotation Using Full Flow as a Constraint --- p.54 / Chapter 4.3.2.2 --- Locus of Rotation Using Normal Flow as a Constraint --- p.54 / Chapter 4.3.3 --- Solving the System of Linear Inequalities for the Two Special Cases --- p.55 / Chapter 4.3.5 --- Ambiguities of AFD Constraint --- p.59 / Chapter 4.4 --- Apparent Flow Magnitude (AFM) Constraint --- p.60 / Chapter 4.5 --- Putting the Two Constraints Together --- p.63 / Chapter 4.6 --- Experimental Results --- p.65 / Chapter 4.6.1 --- Simulation --- p.65 / 
Chapter 4.6.2 --- Video Data --- p.67 / Chapter 4.6.2.1 --- Pure Translation --- p.67 / Chapter 4.6.2.2 --- General Motion --- p.68 / Chapter 4.7 --- Conclusion --- p.72 / Chapter Chapter 5 --- Determining Motion from Multi-Cameras with Non-Overlapping Visual Fields --- p.73 / Chapter 5.1 --- Introduction --- p.73 / Chapter 5.2 --- Related Works --- p.75 / Chapter 5.3 --- Background --- p.76 / Chapter 5.3.1 --- Image Sphere --- p.77 / Chapter 5.3.2 --- Planar Case --- p.78 / Chapter 5.3.3 --- Projective Transformation --- p.79 / Chapter 5.4 --- Constraint from Normal Flows --- p.80 / Chapter 5.5 --- Approximation of Spherical Eye by Multiple Cameras --- p.81 / Chapter 5.6 --- Recovery of Motion Parameters --- p.83 / Chapter 5.6.1 --- Classification of a Pair of Normal Flows --- p.84 / Chapter 5.6.2 --- Classification of a Triplet of Normal Flows --- p.86 / Chapter 5.6.3 --- Apparent Flow Separation (AFS) Constraint --- p.87 / Chapter 5.6.3.1 --- Constraint to Direction of Translation --- p.87 / Chapter 5.6.3.2 --- Constraint to Direction of Rotation --- p.88 / Chapter 5.6.3.3 --- Remarks about the AFS Constraint --- p.88 / Chapter 5.6.4 --- Extension of Apparent Flow Separation Constraint (EAFS) --- p.89 / Chapter 5.6.4.1 --- Constraint to Direction of Translation --- p.90 / Chapter 5.6.4.2 --- Constraint to Direction of Rotation --- p.92 / Chapter 5.6.5 --- Solution to the AFS and EAFS Constraints --- p.94 / Chapter 5.6.6 --- Apparent Flow Magnitude (AFM) Constraint --- p.96 / Chapter 5.7 --- Experimental Results --- p.98 / Chapter 5.7.1 --- Simulation --- p.98 / Chapter 5.7.2 --- Real Video --- p.103 / Chapter 5.7.2.1 --- Using Feature Correspondences --- p.108 / Chapter 5.7.2.2 --- Using Optical Flows --- p.108 / Chapter 5.7.2.3 --- Using Direct Methods --- p.109 / Chapter 5.8 --- Conclusion --- p.111 / Chapter Chapter 6 --- Motion and Shape from Binocular Camera System: An Extension of AFD and AFM Constraints --- p.112 / Chapter 6.1 --- Introduction --- p.112 / 
Chapter 6.2 --- Related Works --- p.112 / Chapter 6.3 --- Recovery of Camera Motion Using Search Subspaces --- p.113 / Chapter 6.4 --- Correspondence-Free Stereo Vision --- p.114 / Chapter 6.4.1 --- Determination of Full Translation Using Two 3D Lines --- p.114 / Chapter 6.4.2 --- Determination of Full Translation Using All Normal Flows --- p.115 / Chapter 6.4.3 --- Determination of Full Translation Using a Geometrical Method --- p.117 / Chapter 6.5 --- Experimental Results --- p.119 / Chapter 6.5.1 --- Synthetic Image Data --- p.119 / Chapter 6.5.2 --- Real Scene --- p.120 / Chapter 6.6 --- Conclusion --- p.122 / Chapter Chapter 7 --- Motion and Shape from Binocular Camera System: A Closed-Form Solution for Motion Determination --- p.123 / Chapter 7.1 --- Introduction --- p.123 / Chapter 7.2 --- Related Works --- p.124 / Chapter 7.3 --- Background --- p.125 / Chapter 7.4 --- Recovery of Camera Motion Using a Linear Method --- p.126 / Chapter 7.4.1 --- Region-Correspondence Stereo Vision --- p.126 / Chapter 7.3.2 --- Combined with Epipolar Constraints --- p.127 / Chapter 7.4 --- Refinement of Scene Depth --- p.131 / Chapter 7.4.1 --- Using Spatial and Temporal Constraints --- p.131 / Chapter 7.4.2 --- Using Stereo Image Pairs --- p.134 / Chapter 7.5 --- Experiments --- p.136 / Chapter 7.5.1 --- Synthetic Data --- p.136 / Chapter 7.5.2 --- Real Image Sequences --- p.137 / Chapter 7.6 --- Conclusion --- p.143 / Chapter Chapter 8 --- Hand-Eye Calibration Using Normal Flows --- p.144 / Chapter 8.1 --- Introduction --- p.144 / Chapter 8.2 --- Related Works --- p.144 / Chapter 8.3 --- Problem Formulation --- p.145 / Chapter 8.3 --- Model-Based Brightness Constraint --- p.146 / Chapter 8.4 --- Hand-Eye Calibration --- p.147 / Chapter 8.4.1 --- Determining the Rotation Matrix R --- p.148 / Chapter 8.4.2 --- Determining the Direction of Position Vector T --- p.149 / Chapter 8.4.3 --- Determining the Complete Position Vector T --- p.150 / Chapter 8.4.4 --- Extrinsic 
Calibration of a Multi-Camera Rig --- p.151 / Chapter 8.5 --- Experimental Results --- p.151 / Chapter 8.5.1 --- Synthetic Data --- p.151 / Chapter 8.5.2 --- Real Image Data --- p.152 / Chapter 8.6 --- Conclusion --- p.153 / Chapter Chapter 9 --- Conclusion and Future Work --- p.154 / Related Publications --- p.158 / Bibliography --- p.159 / Appendix --- p.166 / Chapter A --- Apparent Flow Direction Constraint --- p.166 / Chapter B --- Ambiguity of AFD Constraint --- p.168 / Chapter C --- Relationship between the Angle Subtended by any two Flow Vectors in Image Plane and the Associated Flow Vectors in Image Sphere --- p.169
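The AFPD constraint of Chapter 3 admits a simple reading: a candidate motion is plausible only if the depth it implies at every normal-flow sample is positive. The sketch below assumes a calibrated pinhole flow model with unit focal length; the matrices A and B follow one common sign convention, not necessarily the thesis's, and the function names are illustrative. It counts positive-depth violations for a candidate motion; a coarse-to-fine search would keep the candidate with the fewest violations:

```python
import numpy as np

def flow_matrices(x, y, f=1.0):
    # Full image flow under egomotion: u = (1/Z) * A @ t + B @ w
    # (one common convention for a calibrated pinhole camera).
    A = np.array([[-f, 0.0, x],
                  [0.0, -f, y]])
    B = np.array([[x * y / f, -(f + x * x / f), y],
                  [f + y * y / f, -x * y / f, -x]])
    return A, B

def afpd_violations(points, n_mag, n_dir, t, w):
    """Count samples whose implied depth under (t, w) would be negative.

    points: (N, 2) image coordinates; n_mag: (N,) normal flow magnitudes;
    n_dir: (N, 2) unit gradient directions; t, w: candidate translation
    and rotation parameters.
    """
    bad = 0
    for (x, y), m, g in zip(points, n_mag, n_dir):
        A, B = flow_matrices(x, y)
        trans = g @ (A @ t)   # translational flow along the gradient
        rot = g @ (B @ w)     # rotational flow along the gradient
        # m = trans / Z + rot, so 1/Z = (m - rot) / trans must be > 0
        if trans != 0.0 and (m - rot) / trans < 0.0:
            bad += 1
    return bad
```

On synthetic normal flows generated with positive depths, the true motion incurs zero violations, while the reversed translation violates the constraint at essentially every sample.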
13 |
Three dimensional motion tracking using micro inertial measurement unit and monocular visual system. / 應用微慣性測量單元和單目視覺系統進行三維運動跟踪 / Ying yong wei guan xing ce liang dan yuan he dan mu shi jue xi tong jin xing san wei yun dong gen zongJanuary 2011 (has links)
Lam, Kin Kwok. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 99-103). / Abstracts in English and Chinese. / Abstract --- p.ii / 摘要 --- p.iii / Acknowledgements --- p.iv / Table of Contents --- p.v / List of Figures --- p.viii / List of Tables --- p.xi / Chapter Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Intrinsic Problem of Today's Pose Estimation Systems --- p.1 / Chapter 1.2 --- Multi-sensors Data Fusion --- p.2 / Chapter 1.3 --- Objectives and Contributions --- p.3 / Chapter 1.4 --- Organization of the dissertation --- p.4 / Chapter Chapter 2 --- Architecture of Sensing System --- p.5 / Chapter 2.1 --- Hardware for Pose Estimation System --- p.5 / Chapter 2.2 --- Software for Pose Estimation System --- p.6 / Chapter Chapter 3 --- Inertial Measurement System --- p.7 / Chapter 3.1 --- Basic knowledge of Inertial Measurement System --- p.7 / Chapter 3.2 --- Strapdown Inertial Navigation --- p.8 / Chapter 3.2.1 --- Tracking Orientation --- p.9 / Chapter 3.2.2 --- Discussion of Attitude Representations --- p.14 / Chapter 3.2.3 --- Tracking Position --- p.16 / Chapter 3.3 --- Summary of Strapdown Inertial Navigation --- p.16 / Chapter Chapter 4 --- Visual Tracking System --- p.17 / Chapter 4.1 --- Background of Visual Tracking System --- p.17 / Chapter 4.2 --- Basic knowledge of Camera Calibration and Model --- p.18 / Chapter 4.2.1 --- Related Coordinate Frames --- p.18 / Chapter 4.2.2 --- Pinhole Camera Model --- p.20 / Chapter 4.2.3 --- Calibration for Nonlinear Model --- p.21 / Chapter 4.3 --- Implementation of Process to Calibrate Camera --- p.22 / Chapter 4.3.1 --- Image Capture and Corners Extraction --- p.22 / Chapter 4.3.2 --- Camera Calibration --- p.23 / Chapter 4.4 --- Perspective-n-Point Problem --- p.25 / Chapter 4.5 --- Camera Pose Estimation Algorithms --- p.26 / Chapter 4.5.1 --- Pose Estimation Using Quadrangular Targets --- p.27 / Chapter 4.5.2 --- Efficient Perspective-n-Point 
Camera Pose Estimation --- p.31 / Chapter 4.5.3 --- Linear N-Point Camera Pose Determination --- p.33 / Chapter 4.5.4 --- Pose Estimation from Orthography and Scaling with Iterations --- p.36 / Chapter 4.6 --- Experimental Results of Camera Pose Estimation Algorithms --- p.40 / Chapter 4.6.1 --- Simulation Test --- p.40 / Chapter 4.6.2 --- Real Images Test --- p.43 / Chapter 4.6.3 --- Summary --- p.46 / Chapter Chapter 5 --- Kalman Filter --- p.47 / Chapter 5.1 --- Linear Dynamic System Model --- p.48 / Chapter 5.2 --- Time Update --- p.48 / Chapter 5.3 --- Measurement Update --- p.49 / Chapter 5.3.1 --- Maximum a Posterior Probability --- p.49 / Chapter 5.3.2 --- Batch Least-Square Estimation --- p.51 / Chapter 5.3.3 --- Measurement Update in Kalman Filter --- p.54 / Chapter 5.4 --- Summary of Kalman Filter --- p.56 / Chapter Chapter 6 --- Extended Kalman Filter --- p.58 / Chapter 6.1 --- Linearization of Nonlinear Systems --- p.58 / Chapter 6.2 --- Extended Kalman Filter --- p.59 / Chapter Chapter 7 --- Unscented Kalman Filter --- p.61 / Chapter 7.1 --- Least-square Estimator Structure --- p.61 / Chapter 7.2 --- Unscented Transform --- p.62 / Chapter 7.3 --- Unscented Kalman Filter --- p.64 / Chapter Chapter 8 --- Data Fusion Algorithm --- p.68 / Chapter 8.1 --- Traditional Multi-Sensor Data Fusion --- p.69 / Chapter 8.1.1 --- Measurement Fusion --- p.69 / Chapter 8.1.2 --- Track-to-Track Fusion --- p.71 / Chapter 8.2 --- Multi-Sensor Data Fusion using Extended Kalman Filter --- p.72 / Chapter 8.2.1 --- Time Update Model --- p.73 / Chapter 8.2.2 --- Measurement Update Model --- p.74 / Chapter 8.3 --- Multi-Sensor Data Fusion using Unscented Kalman Filter --- p.75 / Chapter 8.3.1 --- Time Update Model --- p.75 / Chapter 8.3.2 --- Measurement Update Model --- p.76 / Chapter 8.4 --- Simulation Test --- p.76 / Chapter 8.5 --- Experimental Test --- p.80 / Chapter 8.5.1 --- Rotational Test --- p.81 / Chapter 8.5.2 --- Translational Test --- p.86 / Chapter Chapter 9 --- 
Future Work --- p.93 / Chapter 9.1 --- Zero Velocity Compensation --- p.93 / Chapter 9.1.1 --- Stroke Segmentation --- p.93 / Chapter 9.1.2 --- Zero Velocity Compensation (ZVC) --- p.94 / Chapter 9.1.3 --- Experimental Results --- p.94 / Chapter 9.2 --- Random Sample Consensus Algorithm (RANSAC) --- p.96 / Chapter Chapter 10 --- Conclusion --- p.97 / Bibliography --- p.99
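The chapters listed above walk through the standard Kalman time and measurement updates before extending them to the EKF, the UKF, and sensor fusion. A minimal linear sketch of the two updates (illustrative only, not the thesis's code):

```python
import numpy as np

def kalman_step(x, P, z, F, Q, H, R):
    """One predict/correct cycle of a linear Kalman filter."""
    # Time update (predict): propagate the state and its covariance.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Measurement update (correct): blend in the observation z.
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

In this framing, fusing the IMU and the visual pose estimates amounts either to stacking both sensors' observations into z (measurement fusion) or to running one filter per sensor and combining the resulting tracks (track-to-track fusion), as Chapter 8 contrasts.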
14 |
Segmentation based variational model for accurate optical flow estimation.January 2009 (has links)
Chen, Jianing. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves 47-54). / Abstract also in Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Background --- p.1 / Chapter 1.2 --- Related Work --- p.3 / Chapter 1.3 --- Thesis Organization --- p.5 / Chapter 2 --- Review on Optical Flow Estimation --- p.6 / Chapter 2.1 --- Variational Model --- p.6 / Chapter 2.1.1 --- Basic Assumptions and Constraints --- p.6 / Chapter 2.1.2 --- More General Energy Functional --- p.9 / Chapter 2.2 --- Discontinuity Preserving Techniques --- p.9 / Chapter 2.2.1 --- Data Term Robustification --- p.10 / Chapter 2.2.2 --- Diffusion Based Regularization --- p.11 / Chapter 2.2.3 --- Segmentation --- p.15 / Chapter 2.3 --- Chapter Summary --- p.15 / Chapter 3 --- Segmentation Based Optical Flow Estimation --- p.17 / Chapter 3.1 --- Initial Flow --- p.17 / Chapter 3.2 --- Color-Motion Segmentation --- p.19 / Chapter 3.3 --- Parametric Flow Estimating Incorporating Segmentation --- p.21 / Chapter 3.4 --- Confidence Map Construction --- p.24 / Chapter 3.4.1 --- Occlusion detection --- p.24 / Chapter 3.4.2 --- Pixel-wise motion coherence --- p.24 / Chapter 3.4.3 --- Segment-wise model confidence --- p.26 / Chapter 3.5 --- Final Combined Variational Model --- p.28 / Chapter 3.6 --- Chapter Summary --- p.28 / Chapter 4 --- Experiment Results --- p.30 / Chapter 4.1 --- Quantitative Evaluation --- p.30 / Chapter 4.2 --- Warping Results --- p.34 / Chapter 4.3 --- Chapter Summary --- p.35 / Chapter 5 --- Application - Single Image Animation --- p.37 / Chapter 5.1 --- Introduction --- p.37 / Chapter 5.2 --- Approach --- p.38 / Chapter 5.2.1 --- Pre-Process Stage --- p.39 / Chapter 5.2.2 --- Coordinate Transform --- p.39 / Chapter 5.2.3 --- Motion Field Transfer --- p.41 / Chapter 5.2.4 --- Motion Editing and Apply --- p.41 / Chapter 5.2.5 --- Gradient-domain composition --- p.42 / Chapter 5.3 --- Experiments --- p.43 / Chapter 5.3.1 
--- Active Motion Transfer --- p.43 / Chapter 5.3.2 --- Animate Stationary Temporal Dynamics --- p.44 / Chapter 5.4 --- Chapter Summary --- p.45 / Chapter 6 --- Conclusion --- p.46 / Bibliography --- p.47
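The variational model reviewed in Chapter 2 combines a brightness-constancy data term with a smoothness (diffusion) regularizer. As background only, here is the classic Horn-Schunck iteration from that family — not the segmentation-based model of this thesis:

```python
import numpy as np

def horn_schunck(I1, I2, alpha=1.0, n_iter=100):
    """Dense flow from brightness constancy + quadratic smoothness."""
    Iy, Ix = np.gradient(I1.astype(float))   # spatial gradients
    It = I2.astype(float) - I1.astype(float)  # temporal derivative
    u = np.zeros_like(It)
    v = np.zeros_like(It)
    for _ in range(n_iter):
        # 4-neighbour mean as the local flow average (wrap-around borders)
        u_bar = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                 np.roll(u, 1, 1) + np.roll(u, -1, 1)) / 4.0
        v_bar = (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
                 np.roll(v, 1, 1) + np.roll(v, -1, 1)) / 4.0
        # closed-form update of the Euler-Lagrange equations
        num = Ix * u_bar + Iy * v_bar + It
        den = alpha**2 + Ix**2 + Iy**2
        u = u_bar - Ix * num / den
        v = v_bar - Iy * num / den
    return u, v
```

The quadratic regularizer is what over-smooths motion boundaries; the data-term robustification, diffusion-based regularization and segmentation techniques surveyed in Chapter 2 all exist to counter exactly that.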
15 |
Development of an algorithmic method for the recognition of biological objectsBernier, Thomas. January 1997 (has links)
An algorithmic method for the recognition of fungal spore cells in microscopic images, as well as its development and its origin, are described and demonstrated. The process is designed for a machine vision project which automatically identifies fungal spores within field samples for epidemiological simulation models. The method consists of a three-pass system that successfully recognizes spores in any position and which is tolerant of occlusion. / The algorithm, as implemented, demonstrated an accuracy of $ pm$5.3% on low quality images which is less than the assumed error of humans performing the same task. The processing speed also compared favorably with the performance of humans. / The method developed presents a framework of description that, through the first two passes, highlights certain distinctive aspects within an image. Those highlighted aspects are then recognized by the third pass. The system is loosely based on biological vision, is extremely versatile and could be adapted for the recognition of virtually any object in a digitized image.
16 |
Learning Structured Representations for Understanding Visual and Multimedia DataZareian, Alireza January 2021 (has links)
Recent advances in Deep Learning (DL) have achieved impressive performance in a variety of Computer Vision (CV) tasks, leading to an exciting wave of academic and industrial efforts to develop Artificial Intelligence (AI) facilities for every aspect of human life. Nevertheless, there are inherent limitations in the understanding ability of DL models, which limit the potential of AI in real-world applications, especially in the face of complex, multimedia input. Despite tremendous progress in solving basic CV tasks, such as object detection and action recognition, state-of-the-art CV models can merely extract a partial summary of visual content, which lacks a comprehensive understanding of what happens in the scene. This is partly due to the oversimplified definition of CV tasks, which often ignore the compositional nature of semantics and scene structure. It is even less studied how to understand the content of multiple modalities, which requires processing visual and textual information in a holistic and coordinated manner, and extracting interconnected structures despite the semantic gap between the two modalities.
In this thesis, we argue that a key to improve the understanding capacity of DL models in visual and multimedia domains is to use structured, graph-based representations, to extract and convey semantic information more comprehensively. To this end, we explore a variety of ideas to define more realistic DL tasks in both visual and multimedia domains, and propose novel methods to solve those tasks by addressing several fundamental challenges, such as weak supervision, discovery and incorporation of commonsense knowledge, and scaling up vocabulary. More specifically, inspired by the rich literature of semantic graphs in Natural Language Processing (NLP), we explore innovative scene understanding tasks and methods that describe images using semantic graphs, which reflect the scene structure and interactions between objects. In the first part of this thesis, we present progress towards such graph-based scene understanding solutions, which are more accurate, need less supervision, and have more human-like common sense compared to the state of the art.
In the second part of this thesis, we extend our results on graph-based scene understanding to the multimedia domain, by incorporating the recent advances in NLP and CV, and developing a new task and method from the ground up, specialized for joint information extraction in the multimedia domain. We address the inherent semantic gap between visual content and text by creating high-level graph-based representations of images, and developing a multitask learning framework to establish a common, structured semantic space for representing both modalities. In the third part of this thesis, we explore another extension of our scene understanding methodology, to open-vocabulary settings, in order to make scene understanding methods more scalable and versatile. We develop visually grounded language models that use naturally supervised data to learn the meaning of all words, and transfer that knowledge to CV tasks such as object detection with little supervision. Collectively, the proposed solutions and empirical results set a new state of the art for the semantic comprehension of visual and multimedia content in a structured way, in terms of accuracy, efficiency, scalability, and robustness.
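A scene graph of the kind argued for above can be captured by a small structure: object nodes plus (subject, predicate, object) triples. A toy sketch — the class and field names are illustrative, not from the thesis:

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    """Objects detected in an image plus their pairwise relations."""
    objects: list = field(default_factory=list)  # node labels
    triples: list = field(default_factory=list)  # (subj_idx, predicate, obj_idx)

    def add_relation(self, subj, predicate, obj):
        self.triples.append((subj, predicate, obj))

    def relations_of(self, idx):
        # All relations touching the object at `idx`, as readable triples.
        return [(self.objects[s], p, self.objects[o])
                for s, p, o in self.triples if idx in (s, o)]
```

For example, an image of a man riding a horse while wearing a hat yields nodes `["man", "horse", "hat"]` and triples `(0, "riding", 1)` and `(0, "wearing", 2)` — the compositional structure that a flat list of detections cannot express.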
17 |
Development of an algorithmic method for the recognition of biological objectsBernier, Thomas. January 1997 (has links)
No description available.
18 |
A General Framework for Model Adaptation to Meet Practical Constraints in Computer VisionHuang, Shiyuan January 2024 (has links)
Recent advances in deep learning models have shown impressive capabilities in various computer vision tasks, which encourages the integration of these models into real-world vision systems such as smart devices. This integration presents new challenges as models need to meet complex real-world requirements. This thesis is dedicated to building practical deep learning models, where we focus on two main challenges in vision systems: data efficiency and variability. We address these issues by providing a general model adaptation framework that extends models with practical capabilities.
In the first part of the thesis, we explore model adaptation approaches for efficient representation. We illustrate the benefits of different types of efficient data representations, including compressed video modalities from video codecs, low-bit features and sparsified frames and texts. By using such efficient representation, the system complexity such as data storage, processing and computation can be greatly reduced. We systematically study various methods to extract, learn and utilize these representations, presenting new methods to adapt machine learning models for them. The proposed methods include a compressed-domain video recognition model with coarse-to-fine distillation training strategy, a task-specific feature compression framework for low-bit video-and-language understanding, and a learnable token sparsification approach for sparsifying human-interpretable video inputs. We demonstrate new perspectives of representing vision data in a more practical and efficient way in various applications.
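The token-sparsification idea mentioned above — keep only the most relevant tokens of a video or text sequence — reduces to a top-k selection once per-token relevance scores are available. In the learnable variant those scores come from a trained module; in this hedged sketch they are simply given:

```python
import numpy as np

def sparsify_tokens(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens, preserving temporal order."""
    kept = np.sort(np.argsort(scores)[-keep:])
    return tokens[kept], kept
```

Downstream computation then runs on the kept subset only, which is where the storage and compute savings discussed above come from.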
The second part of the thesis focuses on open-environment challenges, where we explore model adaptation for new, unseen classes and domains. We examine the practical limitations of current recognition models and introduce various methods to equip models for open recognition scenarios. These include a negative envisioning framework for managing new classes and outliers, and a multi-domain translation approach for handling unseen domain data. Our study charts a promising path toward models that can navigate diverse data environments in real-world applications.
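The open-recognition setting above can be illustrated with a generic nearest-prototype baseline that rejects far-away samples as unknown. This is a standard open-set baseline sketched for illustration only; it is not the negative envisioning framework proposed in the thesis.

```python
import numpy as np

def open_set_predict(x, prototypes, threshold):
    """Classify x against known class prototypes, or reject it as unknown.

    prototypes: dict mapping class label -> (D,) mean feature vector.
    A sample farther than `threshold` from every prototype is labeled
    "unknown" -- a generic open-set baseline, not the thesis's method.
    """
    dists = {label: np.linalg.norm(x - p) for label, p in prototypes.items()}
    label = min(dists, key=dists.get)
    return label if dists[label] <= threshold else "unknown"

protos = {"cat": np.array([0.0, 0.0]), "dog": np.array([5.0, 5.0])}
near = open_set_predict(np.array([0.2, -0.1]), protos, threshold=1.0)   # "cat"
far = open_set_predict(np.array([10.0, -9.0]), protos, threshold=1.0)  # "unknown"
```

The key design point any open-recognition method must address is visible even in this toy version: the closed-set classifier alone would confidently assign the far sample to its nearest class, so an explicit rejection mechanism is needed.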
|
19 |
Calculating degenerate structures via convex optimization with applications in computer vision and pattern recognition. / CUHK electronic theses & dissertations collectionJanuary 2012 (has links)
In a wide range of computer vision and pattern recognition problems, the captured images and videos often live in high-dimensional observation spaces. Directly computing them may suffer from computational infeasibility and numerical instability. On the other hand, data in the real world are often generated by a limited number of physical causes, and thus embed degenerate structures by nature. For instance, they can be modeled by a low-dimensional subspace, a union of subspaces, a manifold, or even a manifold stratification. Discovering and harnessing such intrinsic structures not only brings semantic insight into the problems at hand, but also provides critical information to overcome challenges encountered in practice. / Recent years have witnessed great development in both the theory and application of convex optimization. Efficient and elegant solutions have been found for NP-hard problems such as low-rank matrix recovery and sparse representation. In this thesis, we study the problem of discovering degenerate structures of high-dimensional inputs using these techniques. 
In particular, we focus on low-dimensional subspaces and their unions, and address their application in overcoming the challenges encountered in three practical scenarios: face image alignment, background subtraction, and automatic plant identification. / In face image alignment, we propose a method that jointly brings multiple images of an unseen face into alignment with a pre-trained generic appearance model despite different poses, expressions, and illumination conditions of the face in the images. The idea is to pursue an intrinsic affine subspace of the target face that is low-dimensional while lying close to the generic subspace. Compared with conventional appearance-based methods that rely on accurate appearance models, ours works well with only a generic one and performs much better on unseen faces even if they differ significantly from those used to train the generic model. The result is approximately as good as that in an idealistic case where a specific model for the target face is provided. / For background subtraction, we propose a background model that captures the changes caused by the background switching among a few configurations, such as traffic light statuses. The background is modeled as a union of low-dimensional subspaces, each characterizing one configuration of the background, and the proposed algorithm automatically switches among them and identifies violating elements as foreground pixels. Moreover, we propose a robust learning approach that can work with foreground-present training samples at the background modeling stage: it builds a correct background model with outlying foreground pixels automatically pruned out. This is practically important when foreground-free training samples are difficult to obtain, as in scenarios such as traffic monitoring. / For automatic plant identification, we propose a novel and practical method that recognizes plants based on leaf shapes extracted from photographs. 
Different from existing studies that mostly focus on simple leaves, the proposed method is designed to recognize both simple and compound leaves. The key is that, instead of either measuring geometric features or matching shape features as in conventional methods, we describe leaves by counting the occurrences of certain shape patterns on them. The patterns are learned so that they form a degenerate polytope (a special union of affine subspaces) in the feature space, and can simulate, to some extent, the "keys" used by botanists: each pattern reflects a common feature of several different species, and all the patterns together form a discriminative rule for recognition. Experiments conducted on a variety of datasets show that our algorithm significantly outperforms state-of-the-art methods in terms of recognition accuracy, efficiency, and storage, and thus holds good promise for practical use. / In conclusion, our studies show that: 1) visual data with semantic meaning are often not random; although they can be high-dimensional, they typically embed degenerate structures in the observation space. 2) With appropriate assumptions and suitable computational tools, these structures can be calculated efficiently and stably. 3) Employing these intrinsic structures helps overcome practical challenges and is critical for computer vision and pattern recognition algorithms to achieve good performance. / Detailed summary in vernacular field only. 
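The "representation by counts" idea above can be sketched as a simple bag-of-patterns descriptor: each local shape feature sampled along the leaf contour votes for its nearest learned pattern, and the vector of counts describes the leaf. This is an illustrative stand-in (hard nearest-pattern assignment) for the thesis's degenerate-polytope formulation; the function and variable names are hypothetical.

```python
import numpy as np

def pattern_count_descriptor(local_features, patterns):
    """Represent a leaf by counting its local shape features per pattern.

    local_features: (N, D) descriptors sampled along the leaf contour.
    patterns: (K, D) dictionary of learned shape patterns.
    Each local feature is assigned to its nearest pattern; the K counts
    form the leaf's descriptor, which can then be fed to a nearest-
    neighbor classifier.
    """
    # Squared Euclidean distance from every feature to every pattern.
    d2 = ((local_features[:, None, :] - patterns[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)
    return np.bincount(nearest, minlength=len(patterns))

# Toy example: 3 patterns in 2-D, 4 contour features.
patterns = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
feats = np.array([[0.1, 0.0], [0.9, 1.1], [1.1, 0.9], [2.2, 0.1]])
counts = pattern_count_descriptor(feats, patterns)
```

Because the descriptor is just K integers per leaf, both the storage footprint and the cost of the final nearest-neighbor comparison stay small, which is consistent with the efficiency and storage advantages claimed in the abstract.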
/ Zhao, Cong. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2012. / Includes bibliographical references (leaves 107-121). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. 
Dedication --- p.i
Acknowledgements --- p.ii
Abstract --- p.v
Abstract (in Chinese) --- p.viii
Publication List --- p.xi
Nomenclature --- p.xii
Contents --- p.xiv
List of Figures --- p.xviii
Chapter 1 Introduction --- p.1
  1.1 Motivation --- p.1
  1.2 Background --- p.2
    1.2.1 Subspaces --- p.3
    1.2.2 Unions of Subspaces --- p.6
    1.2.3 Manifolds and Stratifications --- p.8
  1.3 Thesis Outline --- p.10
Chapter 2 Joint Face Image Alignment --- p.13
  2.1 Introduction --- p.14
  2.2 Related Works --- p.16
  2.3 Background --- p.18
    2.3.1 Active Appearance Model --- p.18
    2.3.2 Multi-Image Alignment using AAM --- p.20
    2.3.3 Limitations in Practice --- p.21
  2.4 The Proposed Method --- p.23
    2.4.1 Two Important Assumptions --- p.23
    2.4.2 The Subspace Pursuit Problem --- p.27
    2.4.3 Reformulation --- p.27
    2.4.4 Efficient Solution --- p.30
    2.4.5 Discussions --- p.32
  2.5 Experiments --- p.34
    2.5.1 Settings --- p.34
    2.5.2 Results and Discussions --- p.36
  2.6 Summary --- p.38
Chapter 3 Background Subtraction --- p.40
  3.1 Introduction --- p.41
  3.2 Related Works --- p.43
  3.3 The Proposed Method --- p.48
    3.3.1 Background Modeling --- p.48
    3.3.2 Background Subtraction --- p.49
    3.3.3 Foreground Object Detection --- p.52
    3.3.4 Background Modeling by Dictionary Learning --- p.53
  3.4 Robust Dictionary Learning --- p.54
    3.4.1 Robust Sparse Coding --- p.56
    3.4.2 Robust Dictionary Update --- p.57
  3.5 Experimentation --- p.59
    3.5.1 Local and Sudden Changes --- p.59
    3.5.2 Non-structured High-frequency Changes --- p.62
    3.5.3 Discussions --- p.65
  3.6 Summary --- p.66
Chapter 4 Plant Identification using Leaves --- p.67
  4.1 Introduction --- p.68
  4.2 Related Works --- p.70
  4.3 Review of IDSC Feature --- p.71
  4.4 The Proposed Method --- p.73
    4.4.1 Independent-IDSC Feature --- p.75
    4.4.2 Common Shape Patterns --- p.77
    4.4.3 Leaf Representation by Counts --- p.80
    4.4.4 Leaf Recognition by NN Classifier --- p.82
  4.5 Experiments --- p.82
    4.5.1 Settings --- p.82
    4.5.2 Performance --- p.83
    4.5.3 Shared Dictionaries vs. Shared Features --- p.88
    4.5.4 Pooling --- p.89
  4.6 Discussions --- p.90
    4.6.1 Time Complexity --- p.90
    4.6.2 Space Complexity --- p.91
    4.6.3 System Description --- p.92
  4.7 Summary --- p.92
  4.8 Acknowledgement --- p.94
Chapter 5 Conclusion and Future Work --- p.95
  5.1 Thesis Contributions --- p.95
  5.2 Future Work --- p.97
    5.2.1 Theory Side --- p.98
    5.2.2 Practice Side --- p.98
Appendix-I Joint Face Alignment Results --- p.100
Bibliography --- p.107
|
20 |
Feature based object rendering from sparse views. / CUHK electronic theses & dissertations collectionJanuary 2011 (has links)
The first part of this thesis presents a convenient and flexible calibration method to estimate the relative rotations and translations among multiple cameras. A simple planar pattern is used for accurate calibration and is not required to be simultaneously observed by all cameras, so the method is especially suitable for widely spaced camera arrays. To fairly evaluate calibration results for different camera setups, a novel accuracy metric is introduced based on the deflection angles of projection rays, which is insensitive to a number of setup factors. / The objective of this thesis is to develop a multiview system that can synthesize photorealistic novel views of a scene captured by sparse cameras distributed over a wide area. The system cost is largely reduced because of the small number of required cameras, and image capture is greatly facilitated because the cameras may be widely spaced and flexibly placed. The key techniques to achieve this goal are investigated in this thesis. / Cui, Chunhui. / "November 2010." / Adviser: Ngan King Ngi. / Source: Dissertation Abstracts International, Volume: 73-04, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 140-155). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
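A deflection-angle accuracy measure of the kind mentioned above can be sketched as the angle between two projection rays sharing the camera center: the ray through an observed image point and the ray reproduced with the estimated calibration. This is one plausible reading of such a metric, shown for illustration; the thesis's exact definition may differ.

```python
import numpy as np

def deflection_angle_deg(ray_observed, ray_reprojected):
    """Angle in degrees between an observed projection ray and the
    corresponding ray under the estimated calibration.

    Both rays are direction vectors from the camera center; a smaller
    deflection angle indicates a more accurate calibration. Unlike a
    pixel reprojection error, an angle does not depend on image
    resolution, which is one way such a metric can be made insensitive
    to setup factors.
    """
    a = ray_observed / np.linalg.norm(ray_observed)
    b = ray_reprojected / np.linalg.norm(ray_reprojected)
    cos = np.clip(np.dot(a, b), -1.0, 1.0)  # guard against rounding
    return np.degrees(np.arccos(cos))

# Toy example: rays differing by a 45-degree tilt about the x-axis.
angle = deflection_angle_deg(np.array([0.0, 0.0, 1.0]),
                             np.array([0.0, 1.0, 1.0]))
```

Averaging this angle over many point correspondences and cameras would yield a single setup-independent accuracy score per calibration.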
|