661 |
Orientation and recognition of both noisy and partially occluded 3-D objects from single 2-D images. Illing, Diane Patricia. January 1990.
This work is concerned with the problem of 3-D object recognition and orientation determination from single 2-D image frames in which objects may be noisy, partially occluded or both. Global descriptors of shape such as moments and Fourier descriptors rely on the whole shape being present: if part of a shape is missing, all of the descriptors are affected. Consequently, such approaches are not suitable when objects are partially occluded, as results presented here show. Local methods of describing shape, where distortion of part of the object affects only the descriptors associated with that particular region, are more likely to provide a successful solution to the problem. One such method is to locate points of maximum curvature on object boundaries, commonly believed to be the most perceptually significant points on digital curves. However, results presented in this thesis show that estimators of point curvature become highly unreliable in the presence of noise. Rather than attempting to locate such high-curvature points directly, an approach is presented which searches for boundary segments that exhibit significant linearity; curvature discontinuities are then assigned to the junctions between boundary segments. The resulting object descriptions are more stable in the presence of noise. Object orientation and recognition are achieved through a directed search and comparison against a database of similar 2-D model descriptions stored at various object orientations. Each comparison of sensed and model data is realised through a 2-D pose-clustering procedure, solving for the coordinate transformation which maps model features onto image features. Object features are used both to control the amount of computation and to direct the search of the database. Under noise and occlusion, objects can be recognised and their orientation determined to within 7 degrees of arc on average.
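As an illustration of the pose-clustering step described above, here is a minimal Python sketch (illustrative only, not code from the thesis; the rigid 2-D transform, the bin sizes, and the function names are assumptions). Given hypothesized correspondences between model junctions and image junctions, each pair of correspondences votes for the transform it implies, and mutually consistent votes accumulate in one cell:

    import itertools
    import numpy as np

    def rigid_from_pairs(m1, m2, s1, s2):
        # Rotation angle and translation mapping model pair (m1, m2)
        # onto scene pair (s1, s2) under a rigid 2-D transform.
        vm, vs = m2 - m1, s2 - s1
        theta = np.arctan2(vs[1], vs[0]) - np.arctan2(vm[1], vm[0])
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        return theta, s1 - R @ m1

    def pose_cluster(model_pts, scene_pts, angle_bin=np.deg2rad(5), trans_bin=10.0):
        # Coarse accumulator over (rotation, translation); correct
        # correspondences pile up in the same bin even when many
        # junctions are missing because of occlusion.
        votes = {}
        for i, j in itertools.combinations(range(len(model_pts)), 2):
            theta, t = rigid_from_pairs(model_pts[i], model_pts[j],
                                        scene_pts[i], scene_pts[j])
            key = (round(theta / angle_bin),
                   round(t[0] / trans_bin), round(t[1] / trans_bin))
            votes[key] = votes.get(key, 0) + 1
        best = max(votes, key=votes.get)
        return best, votes[best]

A dominant cell whose vote count stands well above the rest indicates both that the object is present and what its pose is, which is why clustering tolerates missing junctions.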
|
662 |
Use of multiple views for human pose estimation. / CUHK electronic theses & dissertations collection. January 2012.
A human pose estimation system determines the pose of the human body in space from video images. The main difficulties are: the space of body poses is high-dimensional; the depth of the limbs is ambiguous; people wear a wide variety of clothing; and the body frequently occludes itself. A multi-camera system observes more data about the same pose and can therefore effectively resolve these ambiguities. In this research we study multi-camera human pose estimation through a variety of approaches and propose a framework that fuses multiple constraints.
In a multi-camera system the available constraints include: (1) image evidence: the estimated body pose, projected into the images, must agree with the observations in all views; (2) pose feasibility: body parts must respect the body's articulation and the estimated pose must be physically plausible; (3) 3-D rigidity: the body observed from different viewpoints must be spatially consistent; and (4) action context: the pose should agree with prior knowledge of the activity. The goal of this research is a multi-camera system that exploits all of these constraints at once, integrates multiple cameras seamlessly, and estimates 3-D pose robustly and efficiently. The thesis studies monocular 3-D pose estimation and proposes a new human model for pose inference based on constraints (1) and (2); it proposes an affine stereo projection model to fuse observations from multiple views so that pose estimation is supported by constraints (1), (2) and (3) simultaneously; it shows how a multi-view activity manifold library applies all four constraints and estimates 3-D pose efficiently; and it finally proposes a manifold-library-based partial-input Gaussian process to handle occlusion in pose estimation.
The contributions are: (1) an affine stereo projection model is proposed, for the first time, to express the 3-D rigidity constraint, so that this constraint integrates conveniently into a bottom-up pose estimation framework; (2) the pose feasibility and 3-D rigidity constraints are embedded together in a multi-view manifold library, which maps multi-view observations directly to the pose space even in a multi-activity setting; (3) by jointly analyzing data from multiple views, the system effectively overcomes self-occlusion; (4) the system is scalable: both the affine-stereo-based and the manifold-library-based methods extend to systems with more than three cameras.
A human pose estimation system determines the full human body pose in space from video data alone. Key difficulties of this problem include: full body kinematics is of high dimensionality, limb depths are ambiguous, people wear various clothes, and there are often self-occlusions. The use of multiple views can enhance robustness of the solution toward these uncertainties, as more data are collected about the same pose. In this research, we study multi-view based human pose estimation by exploring a variety of approaches and propose a framework that integrates multiple constraints.
In a multiple view system, the constraints that could be applied for human pose estimation include: (1) image evidence: the projection of the estimated 3D human body should satisfy the 2D observations in all views; (2) feasible human pose: neighboring body parts should be connected according to the body articulation and all joint angles should stay feasible; (3) 3D object rigidity: the corresponding parts over all views should satisfy multi-view consistency; and (4) action context: the detected results should be in line with prior knowledge about the possible activities. The objective of this research is to develop a multiple view system that embeds all the above constraints in a natural way while integrating more cameras seamlessly to enhance robustness. Specifically, we investigate a part-based monocular 3D estimation algorithm and develop a novel human model to assist pose inference based on constraints (1) and (2); we propose an affine stereo model to associate data from multiple views so that body pose inference is supported by constraints (1), (2) and (3) simultaneously; we present how a multi-view activity manifold library associates multiple views and estimates human pose in 3D efficiently so that all four constraints are integrated into one framework; and we finally propose a partial-input Gaussian process to handle the body occlusion problem within the manifold library framework.
The thesis has four contributions: (1) an affine stereo approach is developed to efficiently exploit object rigidity, and this constraint is integrated smoothly into a bottom-up framework; (2) a multi-view visual manifold library is proposed to capture the human body articulation and rigidity in the multi-activity context, simplifying pose estimation into a direct mapping from multi-view image evidence to 3D pose; (3) the multi-view system efficiently solves the self-occlusion problem by analyzing data across views;
(4) the multi-view system is designed to be scalable: both the affine stereo based approach and the multi-view visual manifold library based approach can be applied to systems with more than three cameras.
Wang, Zibin. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2012. / Includes bibliographical references (leaves 144-150). / Abstract also in Chinese.
Table of contents:
Front matter: Abstract (English and Chinese), Acknowledgements, Table of Contents, List of Figures, List of Tables
Chapter One: Introduction
  1.1 Background
  1.2 Goals
  1.3 Challenges
    1.3.1 High Dimensional State Space
    1.3.2 Observations
    1.3.3 Multiple Views Integration
  1.4 Summary of the Approach
  1.5 Thesis Overview
Chapter Two: Background
  2.1 Top-down Framework
    2.1.1 Background Subtraction
    2.1.2 Deterministic Approach
    2.1.3 Sampling based Approach
    2.1.4 Regression based Method
  2.2 Bottom-up Framework
    2.2.1 Efficient Pictorial Structure
    2.2.2 Discriminative Part Detector
    2.2.3 Sampling based Inference
    2.2.4 Temporal Information
  2.3 Human Pose Estimation using Range Sensor
  2.4 Conclusion
Chapter Three: Pose Estimation from Single View
  3.1 Related Works
  3.2 The 3D Human Model
  3.3 Acquiring the Appearance Facet
    3.3.1 2D Appearance Extraction from Each Training Image
    3.3.2 Acquiring 3D Appearance
  3.4 Data Driven Belief Propagation for Pose Estimation
    3.4.1 A Bayesian Formulation
    3.4.2 Belief Propagation
    3.4.3 Importance Function Sampling
  3.5 Experimental Results
  3.6 Conclusion
Chapter Four: Integrating Multiple Views using Affine Stereo Model
  4.1 Related Works
  4.2 Human Model and Problem Formulation
  4.3 Associating Multiple Image Streams
    4.3.1 Linear Relation of Multiple Views
    4.3.2 Rank Constraint
  4.4 Human Pose Estimation System using Multi-view and Other Constraints
    4.4.1 Body Part Candidates from Discriminative Body Part Detector
    4.4.2 From Body Part Candidates to Body Candidates in Each View
    4.4.3 Associating Body Candidates across Views
  4.5 Experimental Results
    4.5.1 Evaluation of the Multi-view Linear Relationship
    4.5.2 Performance over the HumanEva Dataset
  4.6 Conclusion
Chapter Five: Integrating Multiple Views using Activity Manifold Library
  5.1 Related Works
  5.2 Multi-view Manifold Library
    5.2.1 Body Representation in Space and Views
    5.2.2 Human-orientation-dependent Multi-view Visual Manifold
  5.3 Human Pose Estimation in 3D via Multi-view Manifold
    5.3.1 Find Multi-view Body Hypothesis in 2D
    5.3.2 Mutual Selection between Multi-view Body Hypotheses and Manifolds
  5.4 Experimental Results
    5.4.1 Synthetic Data Test
    5.4.2 Real Image Evaluation
    5.4.3 Qualitative Test for Generalization Capability
    5.4.4 Calculation Speed
  5.5 Conclusion
Chapter Six: Partial-Input Gaussian Process for Inferring Occluded Human Pose
  6.1 Related Works
  6.2 Human-orientation-invariant Multi-view Visual Manifold
  6.3 Human Pose Estimation in 3D via Multi-view Manifold
    6.3.1 2D Pre-processing
    6.3.2 Mutual Selection between Multi-view Body Hypotheses and Manifolds
    6.3.3 Occlusion Detection and Partial-input Gaussian Process
  6.4 Experimental Results
    6.4.1 Multi-view Manifolds and Evaluations for Different Views
    6.4.2 Evaluation for Occlusion Data
    6.4.3 Evaluation for Gavrila's Dataset
    6.4.4 Qualitative Test for Generalization Capability
  6.5 Conclusion
Chapter Seven: Conclusions and Future Works
  7.1 Conclusion
  7.2 Limitation
  7.3 Future Directions
Bibliography
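As background for the rank constraint listed under Section 4.3, here is a small Python sketch of the classical affine multi-view rank property (Tomasi-Kanade style factorization), offered as a generic stand-in rather than the thesis's exact affine stereo model: corresponding points from affine views, once centered, stack into a measurement matrix of rank at most three, so a cross-view correspondence hypothesis can be scored by its energy beyond rank three:

    import numpy as np

    def rank3_residual(points):
        # points: (V, P, 2) array of P corresponding image points in V views.
        V, P, _ = points.shape
        W = points.transpose(0, 2, 1).reshape(2 * V, P)  # x/y rows per view
        W = W - W.mean(axis=1, keepdims=True)            # remove centroids
        s = np.linalg.svd(W, compute_uv=False)
        # Exact affine projection of a rigid scene makes s[3:] zero; the
        # relative energy beyond rank 3 measures violation of rigidity.
        return np.sqrt((s[3:] ** 2).sum() / (s ** 2).sum())

In a bottom-up pipeline, body-part candidates associated across views can be rejected when this residual is large, which is the role the 3D rigidity constraint plays in the framework above.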
|
663 |
Binocular geometry and camera motion directly from normal flows. / CUHK electronic theses & dissertations collection. January 2009.
Active vision systems are mobile platforms equipped with one or more cameras. They perceive what happens in their surroundings from the image streams the cameras grab. Such systems have a few fundamental tasks to tackle: they need to determine from time to time what their motion in space is, and, if they have multiple cameras, they need to know how the cameras are positioned relative to one another so that visual information collected by the respective cameras can be related. In the simplest form, the tasks are about finding the motion of a camera, and finding the relative geometry of every two cameras, from the image streams the cameras collect.
The relative motion between a camera and the imaged environment generally induces a flow field in the image stream captured by the camera. The flow field, which describes the motion correspondences of the various image positions over the image frames, is referred to in the literature as optical flow. If the full optical flow field of every camera were available, the motion of a camera could be readily determined, and so could the relative geometry of two cameras. However, owing to the well-known aperture problem, what is directly observable at any image position is generally not the full optical flow but only its component normal to the iso-brightness contour of the intensity profile at that position, widely referred to as the normal flow. The full flow field can be inferred from the normal flow field, but only under specific assumptions about the imaged scene, such as smoothness almost everywhere.
This thesis explores how the above two fundamental tasks can be tackled by operating on the normal flow field directly. The objective is that, with no full flow inferred explicitly and hence no specific assumptions made about the imaged scene, the developed methods are applicable to a wider set of scenes. The thesis consists of two parts. The first part addresses how the inter-camera geometry of two cameras can be determined from the two monocular normal flow fields. The second part addresses how a camera's ego-motion can be determined by examining only the normal flows the camera observes.
On determining the relative geometry of two cameras, a number of calibration techniques already exist in the literature. They rely on the presence of either specific calibration objects in the imaged scene, or a portion of the scene that is observable by both cameras. In active vision, however, because of the "active" nature of the cameras, a camera pair may share little or nothing in their visual fields. In the first part of this thesis, we propose a new solution method. It demands image data captured under a rigid motion of the camera pair, but unlike the existing motion correspondence-based calibration methods it does not estimate the optical flows or motion correspondences explicitly; instead, it estimates the inter-camera geometry from the monocular normal flows. Moreover, we propose a strategy for selecting optimal groups of normal flow vectors to improve the accuracy and efficiency of the estimation.
On determining the ego-motion of a camera, there have also been many previous works. Most, however, require tracking distinct features in the image stream or inferring the full optical flow field from the normal flow field. Departing from these traditional approaches, and using neither motion correspondences nor the epipolar geometry, we develop a new method that again operates directly on the normal flow data. The method has a number of features. It can employ every normal flow data point, thus requiring less texture in the imaged scene. A novel formulation of what the normal flow direction at an image position tells about the camera motion is given, and this formulation allows a locus of the possible camera motions to be outlined from every data point. With enough data points over the image domain, a simple voting scheme lets the various loci intersect and pinpoint the camera motion.
We have tested the methods on both synthetic image data and real image sequences. Experimental results show that the developed methods are effective in determining inter-camera geometry and camera motion from normal flow fields.
Yuan, Ding. / Adviser: Ronald Chung. / Source: Dissertation Abstracts International, Volume: 70-09, Section: B. / Thesis submitted in: October 2008. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves 121-131). / Abstracts in English and Chinese.
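For concreteness, here is a minimal Python sketch of how a normal flow field is computed from spatio-temporal image derivatives; this is the standard brightness-constancy algebra the abstract alludes to, not code from the thesis, and the validity threshold is an illustrative assumption:

    import numpy as np

    def normal_flow(frame0, frame1, eps=1e-6):
        # frames: 2-D float arrays of equal shape (consecutive grayscale images).
        Iy, Ix = np.gradient(frame0)          # spatial derivatives (rows, cols)
        It = frame1 - frame0                  # temporal derivative
        mag = np.sqrt(Ix ** 2 + Iy ** 2)
        # Brightness constancy: Ix*u + Iy*v + It = 0. Only the flow component
        # along the gradient is observable (the aperture problem):
        n = -It / np.maximum(mag, eps)        # signed magnitude of normal flow
        u = n * Ix / np.maximum(mag, eps)     # x-component
        v = n * Iy / np.maximum(mag, eps)     # y-component
        valid = mag > 10 * eps                # keep only textured positions
        return u, v, valid

Each valid (u, v) vector constrains the camera motion without any correspondence search, which is what the voting schemes described above exploit.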
|
664 |
Robust stereo motion and structure estimation scheme. / CUHK electronic theses & dissertations collection. January 2006.
Structure from motion (SFM), the problem of estimating 3D structure from 2D images, is one of the most popular and well-studied problems within computer vision. This thesis is a study within the area of SFM. The main objective of this work is to improve the robustness of SFM algorithms so that they can tolerate a great number of outliers in the correspondences. To improve robustness, a stereo image sequence is processed, so that random sampling algorithms can be employed in the structure and motion estimation. With this strategy, we employ Random Sample Consensus (RANSAC) in motion and structure estimation to exclude outliers. Since the RANSAC method needs prior information about the scale of the inliers, we propose an auto-scale RANSAC algorithm which determines the inliers by analyzing the probability density of the residuals. The experimental results demonstrate that SFM by the proposed auto-scale RANSAC is more robust and accurate than that by RANSAC.
Another important contribution of this thesis is a novel and highly robust estimator: Kernel Density Estimation Sample Consensus (KDESAC), which combines the Random Sample Consensus algorithm with Kernel Density Estimation (KDE). The main advantage of KDESAC is that no prior information and no scale estimator are required in the estimation of the parameters. The computational load of KDESAC is much lower than that of robust algorithms which estimate the scale in every sampling loop. Experiments on synthetic data show that the proposed method is more robust to heavily corrupted data than other algorithms: KDESAC can tolerate more than 80% outliers and multiple structures. Although Adaptive Scale Sample Consensus (ASSC) achieves performance as good as KDESAC's, ASSC is much slower. KDESAC is also applied to the SFM problem and to multi-motion estimation with real data; the experiments demonstrate that KDESAC is robust and efficient.
Chan Tai. / "September 2006." / Adviser: Yun Hui Liu. / Source: Dissertation Abstracts International, Volume: 68-03, Section: B, page: 1716. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2006. / Includes bibliographical references (p. 113-120). / Abstracts in English and Chinese.
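As a toy illustration of the sample-consensus idea, here is a hedged Python sketch of RANSAC-style line fitting in which a Gaussian kernel density score over the residuals replaces a fixed inlier threshold, echoing (but not reproducing) the way auto-scale RANSAC and KDESAC analyze the residual density without a prior inlier scale; the bandwidth and iteration count are illustrative assumptions:

    import numpy as np

    def kde_score(residuals, bandwidth=1.0):
        # Height of a Gaussian kernel density estimate at residual 0:
        # models with many near-zero residuals (dense inliers) score high,
        # without needing a prior inlier scale.
        r = residuals / bandwidth
        return np.mean(np.exp(-0.5 * r ** 2)) / (bandwidth * np.sqrt(2 * np.pi))

    def ransac_line(points, iters=500, rng=None):
        # points: (N, 2) array; fits y = a*x + b by random sampling.
        rng = np.random.default_rng(rng)
        best, best_score = None, -np.inf
        for _ in range(iters):
            i, j = rng.choice(len(points), size=2, replace=False)
            (x1, y1), (x2, y2) = points[i], points[j]
            if np.isclose(x1, x2):
                continue
            a = (y2 - y1) / (x2 - x1)
            b = y1 - a * x1
            score = kde_score(points[:, 1] - (a * points[:, 0] + b))
            if score > best_score:
                best, best_score = (a, b), score
        return best

Because the score is a density rather than an inlier count at a fixed threshold, the same machinery degrades gracefully when outliers dominate or when several structures are present.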
|
665 |
An active vision system for tracking and mosaicking on UAV. January 2011.
Lin, Kai Wun. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 120-127). / Abstracts in English and Chinese.
Table of contents:
Front matter: Abstract, Acknowledgement
Chapter 1: Introduction
  1.1 Overview of the UAV Project
  1.2 Challenges on Vision System for UAV
  1.3 Contributions of this Work
  1.4 Organization of Thesis
Chapter 2: Image Sensor Selection and Evaluation
  2.1 Image Sensor Overview
    2.1.1 Comparing Sensor Features and Performance
    2.1.2 Rolling Shutter vs Global Shutter
  2.2 Sensor Evaluation through USB Peripheral
    2.2.1 Interfacing Image Sensor and USB Controller
    2.2.2 Image Sensor Configuration
  2.3 Image Data Transmitting and Processing
    2.3.1 Data Transfer Mode and Buffering on USB Controller
    2.3.2 Demosaicking of Bayer Image Data
  2.4 Splitting Images and Exposure Problem
    2.4.1 Buffer Overflow on USB Controller
    2.4.2 Image Luminance and Exposure Adjustment
Chapter 3: Embedded System for Vision Processing
  3.1 Overview of the Embedded System
    3.1.1 TI OMAP3530 Processor
    3.1.2 Gumstix Overo Fire Computer-on-Module
  3.2 Interfacing Camera Module to the Embedded System
    3.2.1 Image Signal Processing Subsystem
    3.2.2 Camera Module Adapting Board
    3.2.3 Image Sensor Driver and Program Development
  3.3 View-stabilizing Biaxial Camera Platform
    3.3.1 The New Camera System
    3.3.2 View-stabilizing Pan-tilt Platform
  3.4 Overall System Architecture and UAV Integration
Chapter 4: Target Tracking and Geo-locating
  4.1 Camera Calibration
    4.1.1 The Perspective Camera Model
    4.1.2 Camera Lens Distortions
    4.1.3 Calibration Toolbox and Results
  4.2 Selection of Object Features and Trackers
    4.2.1 Harris Corner Detection
    4.2.2 Color Histogram
    4.2.3 KLT and Mean-shift Tracker
  4.3 Target Auto-centering
    4.3.1 Formulation of the PID Controller
    4.3.2 Control Gain Settings and Tuning
  4.4 Geo-locating of Tracked Target
    4.4.1 Coordinate Frame Transformation
    4.4.2 Depth Estimation and Target Locating
  4.5 Results and Discussion
Chapter 5: Real-time Aerial Mosaic Building
  5.1 Motion Model Selection
    5.1.1 Planar Perspective Motion Model
  5.2 Feature-based Image Alignment
    5.2.1 Image Preprocessing
    5.2.2 Feature Extraction and Matching
    5.2.3 Image Alignment using RANSAC Algorithm
  5.3 Image Composition
    5.3.1 Image Blending with Distance Map
    5.3.2 Overall Stitching Process
  5.4 Mosaic Simulation using Google Earth
  5.5 Results and Discussion
Chapter 6: Conclusion and Further Work
Appendix A: System Schematics
Appendix B: Image Sensor Sensitivity
Bibliography
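Section 4.3 of this thesis covers PID-based target auto-centering; as a generic illustration (textbook PID assumed here, not the thesis's gains, axes, or actuator conventions), a sketch that converts the tracked target's pixel offset from the image center into pan/tilt rate commands:

    class PID:
        # Textbook discrete PID; gains and units are illustrative only.
        def __init__(self, kp, ki, kd):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral = 0.0
            self.prev_err = None

        def step(self, err, dt):
            self.integral += err * dt
            deriv = 0.0 if self.prev_err is None else (err - self.prev_err) / dt
            self.prev_err = err
            return self.kp * err + self.ki * self.integral + self.kd * deriv

    def auto_center(target_xy, image_size, pan_pid, tilt_pid, dt):
        # Error = target offset from the image center, in pixels;
        # output = pan/tilt rate commands driving the error to zero.
        cx, cy = image_size[0] / 2.0, image_size[1] / 2.0
        return (pan_pid.step(target_xy[0] - cx, dt),
                tilt_pid.step(target_xy[1] - cy, dt))

Driving this loop at the tracker's frame rate keeps the target centered, which is what makes the view-stabilizing pan-tilt platform of Chapter 3 usable for geo-locating.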
|
666 |
Transferring a generic pedestrian detector towards specific scenes. January 2012.
In recent years, methods that train generic pedestrian detectors on publicly available, large-scale, manually labeled datasets have made significant progress. However, when a generic pedestrian detector is applied to a specific, previously unseen scene, its performance falls short of expectations. This is caused by the mismatch between the data to be detected (target samples) and the training data (source samples), and by variations in viewpoint, illumination, resolution and background noise in the new scene.
In this thesis we propose a new framework that automatically adapts a generic pedestrian detector to a specific scene. The framework has two phases. In the first phase, we explore scene-specific cues available in surveillance video; using these cues, positive and negative samples are selected from the target scene and the detector is retrained, iterating until convergence. In the second phase, we propose a new learning framework that incorporates both the label and the confidence of each sample; source and target samples are re-weighted according to these confidences to optimize the final classifier. Both methods are semi-supervised and require very little human intervention.
The proposed methods significantly improve the accuracy of the generic pedestrian detector. Experiments show that the resulting detector is comparable to one trained on a large amount of manually labeled data from the target scene, and it outperforms many existing methods that address similar problems.
The work in this thesis was published at the IEEE Conference on Computer Vision and Pattern Recognition in 2011 and 2012, respectively.
In recent years, significant progress has been made in learning generic pedestrian detectors from publicly available, manually labeled, large-scale training datasets. However, when a generic pedestrian detector is applied to a specific, previously unseen scene, where the testing data (target examples) do not match the training data (source examples) because of variations in viewpoint, resolution, illumination and background, its accuracy may decrease greatly.
In this thesis, a new framework is proposed that automatically adapts a pre-trained generic pedestrian detector to a specific traffic scene. The framework is two-phased. In the first phase, scene-specific cues in the video surveillance sequence are explored; utilizing this multi-cue information, both confident positive and confident negative examples from the target scene are selected to re-train the detector iteratively. In the second phase, a new machine learning framework is proposed, incorporating not only example labels but also example confidences; source and target examples are re-weighted according to their confidences, optimizing the performance of the final classifier. Both methods belong to semi-supervised learning and require very little human intervention.
The proposed approaches significantly improve the accuracy of the generic pedestrian detector. Their results are comparable with those of a detector trained using a large number of manually labeled frames from the target scene. Comparison with other existing approaches tackling similar problems shows that the proposed approaches outperform many contemporary methods.
The works were published at the IEEE Conference on Computer Vision and Pattern Recognition in 2011 and 2012, respectively.
Wang, Meng. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2012. / Includes bibliographical references (leaves 42-45). / Abstracts also in Chinese.
Table of contents:
Chapter 1: Introduction
  1.1 Pedestrian Detection
    1.1.1 Overview
    1.1.2 Statistical Learning
    1.1.3 Object Representation
    1.1.4 Supervised Statistical Learning in Object Detection
  1.2 Pedestrian Detection in Video Surveillance
    1.2.1 Problem Setting
    1.2.2 Challenges
    1.2.3 Motivations and Contributions
  1.3 Related Work
  1.4 Organizations of Chapters
Chapter 2: Label Inferring by Multi-Cues
  2.1 Data Set
  2.2 Method
    2.2.1 Confident Positive Examples of Pedestrians
    2.2.2 Confident Negative Examples from the Background
    2.2.3 Confident Negative Examples from Vehicles
    2.2.4 Final Scene-Specific Pedestrian Detector
  2.3 Experiment Results
Chapter 3: Transferring a Detector by Confidence Propagation
  3.1 Method
    3.1.1 Overview
    3.1.2 Initial Estimation of Confidence Scores
    3.1.3 Re-weighting Source Samples
    3.1.4 Confidence-Encoded SVM
  3.2 Experiments
    3.2.1 Datasets
    3.2.2 Parameter Setting
    3.2.3 Results
Chapter 4: Conclusions and Future Work
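The confidence-encoded SVM of Section 3.1.4 is specified in the thesis itself; as a simplified stand-in, per-sample confidence re-weighting can be prototyped with the sample_weight argument of a standard SVM. A sketch under that assumption (the weighting scheme and parameters are illustrative, not the thesis's formulation):

    import numpy as np
    from sklearn.svm import SVC

    def train_weighted_detector(X_src, y_src, X_tgt, y_tgt, tgt_conf):
        # Source samples get unit weight; target samples are weighted by the
        # confidence of their automatically inferred labels, so unreliable
        # scene-specific examples influence the classifier less.
        X = np.vstack([X_src, X_tgt])
        y = np.concatenate([y_src, y_tgt])
        w = np.concatenate([np.ones(len(y_src)), np.asarray(tgt_conf)])
        clf = SVC(kernel="linear", C=1.0)
        clf.fit(X, y, sample_weight=w)
        return clf

Iterating this with re-estimated confidences mirrors, in spirit, the propagation loop described in the abstract.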
|
667 |
Hardware acceleration for a projector-camera system. January 2012.
Projector-camera (ProCam) systems have become popular in recent years, mainly because they can display images flexibly and give users greater freedom of interaction. Mobile projector technology has developed rapidly and matured over the past few years, and well-known consumer electronics manufacturers have begun shipping mobile phones and camcorders with built-in pico projectors. Meanwhile, the computing power of mobile phones has increased dramatically, and they are equipped with a variety of powerful peripherals.
This thesis proposes and discusses an FPGA-based special-purpose processor for embedded systems. The processor operates directly on the camera's data stream and applies a series of pixel-level image operations, such as image gradients and Gaussian blur, to extract object edges, thereby offloading computation from the microcontroller. Experimental results show that the processor can be implemented on a low-end FPGA and run alongside a common microcontroller.
The second topic of this thesis is a line tracker based on a multiple-model Kalman filter, with several such trackers used together to track the projection board. Thanks to the low computational cost of the Kalman filter, the line tracker reaches 200 frames per second on our test embedded system. The multiple-model Kalman filter performs satisfactorily in the experiments and outperforms both a single Kalman filter and an extended Kalman filter.
Projector-camera (ProCam) systems are getting very popular since the user can change the display area dynamically and enjoy more freedom in handling the device. In recent years, mobile projector technology has matured, and manufacturers are shipping mobile phones and digital cameras with built-in projectors. Meanwhile, the computation power of a cell phone has increased dramatically, and cell phones are accompanied by a large number of powerful peripherals.
In this thesis, the possibility of building an embedded projector-camera (ProCam) system is investigated. A ProCam system developed previously by our research group was designed for desktop personal computers (PCs). The system uses computer vision techniques to detect a white cardboard as the projection screen and uses a particle filter to track the screen in subsequent frames. The system demands large computation power; unfortunately, low-cost embedded systems are still not powerful enough to run it. Therefore, specially designed hardware and computationally efficient algorithms are required to implement the ProCam system on an embedded platform.
An FPGA-based special processor that shares the workload of the microcontroller in the embedded system is proposed and tested. This special processor takes the camera's data stream as input and applies pixel-wise image operators, such as image gradient and Gaussian blur, to extract edge pixels. As a result, the workload of the microcontroller in the embedded system is reduced. The experiments show that the design can be implemented on a low-end FPGA with a simple microcontroller.
A line tracker using a multiple model Kalman filter is also proposed in this thesis. The aim of this tracker is to reduce the time spent tracking the board. Benefiting from the low computation requirement of the Kalman filter, the proposed line tracker runs at 200 fps on our test embedded system. The experiments also show that the robustness of the multiple model Kalman filter is satisfactory and that it outperforms line trackers using a single Kalman filter or an extended Kalman filter alone.
Fung, Hung Kwan. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2012. / Includes bibliographical references (leaves 115-124). / Abstracts also in Chinese.
Table of contents:
Front matter: Abstract, Acknowledgement
Chapter 1: Introduction
  1.1 Motivation and Objective
  1.2 Contributions
  1.3 Thesis Organization
Chapter 2: Background
  2.1 Introduction
  2.2 Projector-Camera System
    2.2.1 Static Projector-Screen
    2.2.2 Dynamic Projector-Screen
  2.3 Embedded Vision
  2.4 Summary
Chapter 3: System Overview
  3.1 System Design
  3.2 Our Approach
    3.2.1 Projector-camera System
    3.2.2 Smart Camera
    3.2.3 Quadrangle Detection and Tracking Module
    3.2.4 Projection Module
  3.3 Extension
Chapter 4: Smart Camera
  4.1 Introduction
  4.2 Hardware Overview
  4.3 Image Acquisition
  4.4 Image Processing
    4.4.1 RGB-to-Gray Conversion Module
    4.4.2 Image Smoothing Module
    4.4.3 Image Gradient Module
    4.4.4 Non-maximum Suppression and Hysteresis Thresholding
  4.5 Summary
Chapter 5: Quadrangle Detection and Tracking
  5.1 Introduction
  5.2 Line Feature Extraction
  5.3 Automatic Quadrangle Detection
  5.4 Real-time Quadrangle Tracking
    5.4.1 Line Tracker
  5.5 Tracking Lose Strategy
  5.6 Recover from Tracking Failure
  5.7 Summary
Chapter 6: Implementation and Experiment Result
  6.1 Introduction
  6.2 Smart Camera
  6.3 Line Tracking
Chapter 7: Limitation and Discussion
  7.1 Introduction
  7.2 Limitation
  7.3 Summary
Chapter 8: Application
  8.1 Introduction
  8.2 Portable Projector-Camera System
  8.3 Summary
Chapter 9: Conclusion
Bibliography
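As a generic illustration of why a Kalman line tracker (Section 5.4.1) is cheap enough to reach 200 fps, here is a minimal constant-velocity Kalman filter over line parameters (rho, theta); the state layout and noise levels are assumptions for the sketch, not values from the thesis:

    import numpy as np

    class LineKalman:
        # State: [rho, theta, d_rho, d_theta]; constant-velocity model.
        def __init__(self, rho, theta, q=1e-3, r=1e-1):
            self.x = np.array([rho, theta, 0.0, 0.0])
            self.P = np.eye(4)
            self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = 1.0
            self.H = np.eye(2, 4)            # we measure rho and theta only
            self.Q = q * np.eye(4)
            self.R = r * np.eye(2)

        def step(self, z):
            # Predict, then correct with the measured line (rho, theta).
            self.x = self.F @ self.x
            self.P = self.F @ self.P @ self.F.T + self.Q
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)
            self.x = self.x + K @ (np.asarray(z) - self.H @ self.x)
            self.P = (np.eye(4) - K @ self.H) @ self.P
            return self.x[:2]                # filtered (rho, theta)

A multiple-model version runs several such filters with different motion models in parallel and mixes their estimates by measurement likelihood, trading a constant factor of work for the robustness reported above.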
|
668 |
Generalized surface geometry estimation in photometric stereo and two-view stereo matching. January 2011.
Hung, Chun Ho. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (p. 58-63). / Abstracts in English and Chinese.
Table of contents:
Chapter 1: Introduction
Chapter 2: Generalized Photometric Stereo
  2.1 Problem Description
  2.2 Related Work
  2.3 Photometric Stereo with Environment Lighting
  2.4 Estimating Surface Normals
    2.4.1 Surface Normal and Albedo Estimation
  2.5 Data Acquisition Configuration
  2.6 Issues
  2.7 Outlier Removal
  2.8 Experimental Results
Chapter 3: Generalized Stereo Matching
  3.1 Problem Description
  3.2 Related Work
  3.3 Our Approach
    3.3.1 Notations and Problem Introduction
    3.3.2 Depth and Motion Initialization
    3.3.3 Volume-based Structure Prior
    3.3.4 Objective Function with Volume-based Priors
    3.3.5 Numerical Solution
  3.4 Results
Chapter 4: Conclusion
Bibliography
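Section 2.4.1 concerns surface normal and albedo estimation; for background, here is a minimal sketch of the classical Lambertian least-squares step that photometric stereo builds on (standard textbook method, offered as context only; the thesis's generalized environment-lighting formulation in Section 2.3 is its own):

    import numpy as np

    def lambertian_normals(I, L):
        # I: (K, N) intensities of N pixels under K known directional lights.
        # L: (K, 3) unit lighting directions. Lambertian model: I = L @ (albedo * n).
        G, *_ = np.linalg.lstsq(L, I, rcond=None)   # (3, N) scaled normals
        albedo = np.linalg.norm(G, axis=0)
        normals = G / np.maximum(albedo, 1e-12)
        return normals, albedo

With three or more lights the system is overdetermined per pixel, which is also what makes the outlier removal of Section 2.7 possible: pixels violating the model (shadows, highlights) show large residuals.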
|
669 |
Learning based person re-identification across camera views. January 2013.
The main task of person re-identification is to match pedestrians observed by non-overlapping surveillance cameras. With surveillance cameras now ubiquitous, this is an important task in its own right, and it is a key sub-task of many others, such as inter-camera tracking. Its difficulty lies in the large variation in appearance of the same person across cameras, caused by differences in viewpoint and illumination, changes in pose, and so on. In this thesis we revisit and address the problem from the following angles.
First, we observe that matching becomes much harder as the candidate set grows. In practice, temporal reasoning can prune the candidate set and thereby simplify re-identification. Existing learning-based approaches generally assume a single, globally fixed metric. Our approach starts from the new viewpoint that different candidate sets call for different metrics, so we reformulate the problem in a transfer learning framework: given a large training set, training samples are re-weighted by their similarity to the current query and candidate set, and a weighted maximum-margin metric is learned that is more specific than a global metric shared across the whole training set.
We further observe that the appearance transformation between two camera views is hard to capture with a single model. To address this, we propose a mixture-of-experts model that partitions the image space. Our algorithm jointly partitions the image space and, within each partition, learns a cross-view transform that aligns the features. At test time, a new sample is matched against the learned experts to select the appropriate transform; a sparsity-inducing regularizer and a minimal-information-loss regularizer constrain the learning.
In the analyses above, feature extraction and model training are always carried out separately. A better approach is to learn them jointly, and for this we use a convolutional neural network. With a carefully designed architecture, the lower layers describe local image information through pairs of corresponding feature responses, which suits matching the color and texture characteristics of people; the higher layers learn spatial displacements to judge whether local shifts are consistent with how a person moves across camera views. From this information the model decides whether two images show the same person.
In all three parts we compare with state-of-the-art metric learning and feature-design-based re-identification methods, and obtain superior results on several datasets. We have further built a large-scale dataset containing more views, more identities, and more images per identity per view.
Person re-identification matches persons observed in non-overlapping camera views using visual features. This is an important task in video surveillance by itself and serves as a sub-task of other problems such as inter-camera tracking. Challenges lie in the dramatic intra-person variation introduced by viewpoint change, illumination change, pose variation, etc. In this thesis, we tackle the problem in the following aspects.
Firstly, we observe that the ambiguity increases with the number of candidates to be distinguished. In real-world scenarios, temporal reasoning is available and can simplify the problem by pruning the candidate set to be matched. Existing approaches adopt a fixed metric for matching all subjects. Our approach is motivated by the insight that different visual metrics should be optimally learned for different candidate sets. The problem is further formulated under a transfer learning framework: given a large training set, the training samples are selected and re-weighted according to their visual similarities with the query sample and its candidate set. A weighted maximum margin metric is learned, transferring a generic metric to a candidate-set-specific one.
Secondly, we observe that the transformations between two camera views may be too complex to be uni-modal. To tackle this, we partition the image space and formulate the problem in a mixture-of-experts framework. Our algorithm jointly partitions the image spaces of the two camera views into different configurations according to the similarity of the cross-view transforms. The visual features of an image pair from different views are locally aligned by being projected to a common feature space and then matched with softly assigned, locally optimized metrics. The features optimal for recognizing identities differ from those for clustering cross-view transforms; they are jointly learned using a sparsity-inducing norm and information-theoretic regularization.
In all the above analyses, feature extraction and the learned models are designed separately. A better idea is to learn features directly from training samples and apply them to train a discriminative model. We propose a new model in which feature extraction is learned jointly with a discriminative convolutional neural network. Local filters at the bottom layer extract the information useful for matching persons across camera views, such as color and texture. Higher layers capture the spatial shift of those local patches. Finally, the network tests whether the shift patterns of those local patches conform to the cross-camera variation of the same person.
In all three parts, comparisons with state-of-the-art metric learning algorithms and person re-identification methods are carried out, and our approach shows superior performance on public benchmark datasets. Furthermore, we are building a much larger dataset that addresses the real-world scenario, with many more camera views, identities, and images per view.
Li, Wei. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 63-68). / Abstracts also in Chinese.
Table of contents:
Front matter: Acknowledgments, Abstract, Contents, List of Figures, List of Tables
Chapter 1: Introduction
  1.1 Person Re-Identification
  1.2 Challenge in Person Re-Identification
  1.3 Literature Review
    1.3.1 Feature Based Person Re-Identification
    1.3.2 Learning Based Person Re-Identification
  1.4 Thesis Organization
Chapter 2: Transferred Metric Learning for Person Re-Identification
  2.1 Introduction
  2.2 Related Work
    2.2.1 Transfer Learning
  2.3 Our Method
    2.3.1 Visual Features
    2.3.2 Searching and Weighting Training Samples
    2.3.3 Learning Adaptive Metrics by Maximizing Weighted Margins
  2.4 Experimental Results
    2.4.1 Dataset Description
    2.4.2 Generic Metric Learning
    2.4.3 Transferred Metric Learning
  2.5 Conclusions and Discussions
Chapter 3: Locally Aligned Feature Transforms for Person Re-Identification
  3.1 Introduction
  3.2 Related Work
    3.2.1 Localized Methods
  3.3 Model
  3.4 Learning
    3.4.1 Priors
    3.4.2 Objective Function
    3.4.3 Training Model
    3.4.4 Multi-Shot Extension
    3.4.5 Discriminative Metric Learning
  3.5 Experiment
    3.5.1 Identification with Two Fixed Camera Views
    3.5.2 More General Camera Settings
  3.6 Conclusions
Chapter 4: Deep Neural Network for Person Re-identification
  4.1 Introduction
  4.2 Related Work
  4.3 Introduction of the New Dataset
  4.4 Model
    4.4.1 Architecture Overview
    4.4.2 Convolutional and Max-Pooling Layer
    4.4.3 Patch Matching Layer
    4.4.4 Maxout Grouping Layer
    4.4.5 Part Displacement
    4.4.6 Softmax Layer
  4.5 Training Strategies
    4.5.1 Data Augmentation and Balancing
    4.5.2 Bootstrapping
  4.6 Experiment
    4.6.1 Model Specification
    4.6.2 Validation on Single Pair of Cameras
  4.7 Conclusion
Chapter 5: Conclusion
  5.1 Conclusion
  5.2 Future Work
Bibliography
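As a simplified stand-in for the transferred metric learning of Chapter 2 (whose exact weighted maximum-margin objective is in the thesis), here is a sketch of estimating a candidate-set-specific Mahalanobis metric from re-weighted training pairs in a KISSME-style fashion; the weighting scheme and the metric form are assumptions for illustration:

    import numpy as np

    def weighted_metric(diffs_same, diffs_diff, w_same, w_diff, reg=1e-6):
        # diffs_*: (N, d) feature differences of same-identity / different-
        # identity pairs; w_*: per-pair weights, e.g. visual similarity of the
        # pair to the current query and its candidate set.
        def wcov(D, w):
            w = w / w.sum()
            return (D * w[:, None]).T @ D + reg * np.eye(D.shape[1])
        # KISSME-style metric: difference of inverse covariances
        # (often projected to the positive semidefinite cone in practice).
        return (np.linalg.inv(wcov(diffs_same, w_same))
                - np.linalg.inv(wcov(diffs_diff, w_diff)))

    def mahalanobis(M, x, y):
        d = x - y
        return float(d @ M @ d)          # smaller = more likely same identity

Re-estimating the weights, and hence the metric, per query-candidate pairing is what makes the learned metric candidate-set-specific rather than global.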
|
670 |
Learning mid-level representations for scene understanding. January 2013.
This thesis reviews state-of-the-art scene classification frameworks and investigates the problem of learning mid-level feature representations for natural scenes.
Current scene classification frameworks consist of feature extraction, feature encoding, spatial aggregation, and classifier learning. Among these steps, feature extraction is the foundation of image understanding. Local feature representations are regarded as key to the practical success of computer vision, but in recent years mid-level representations have attracted much attention in the field. This thesis views mid-level features from two perspectives: aggregation of local low-level cues, and embedding of semantic information. Our work covers both.
From the statistics of natural images we find that correlations among low-level responses encode local structure. Based on this observation we build a two-layer model: the first layer consists of edge-like low-level responses, and the second is an over-complete covariance-feature layer, the mid-level representation referred to in this thesis. From the perspective of aggregating local low-level cues, our method takes a further step in this direction. We apply the mid-level features to scene classification with good results; notably, in contrast to hand-designed features, ours are learned entirely automatically. The effectiveness of our covariance features suggests a new direction for feature learning: studying the relations among low-level responses can help build more expressive features.
To inject semantic information into mid-level feature learning, we define the term "informative components": structured patterns that describe one scene class while distinguishing it from others. Under a fixed-rank generative model assumption, we design an optimization that jointly learns the generative model and a discriminative classifier. Scene classification experiments confirm the effectiveness of the learned informative components, and we find that concatenating them with low-level representations further improves accuracy. This points to future work on combining multi-layer representations to improve classification.
This thesis contains a review of state-of-the-art scene classification frameworks and a study of learning mid-level representations for scene understanding.
The current scene classification pipeline consists of feature extraction, feature encoding, spatial aggregation, and classifier learning. Among these steps, feature extraction is the most fundamental for scene understanding. Beyond low-level features, obtaining effective mid-level representations has attracted much attention in the scene understanding field in recent years. We interpret mid-level representations from two perspectives: aggregation from low-level cues, and embedding of semantic information. The work in this thesis harvests both "aggregation" and "semantics".
Given the observation from natural image statistics that correlations among patch-level responses contain strong structure information, we build a two-layer model. The first layer is the patch-level response with edge-let appearance, and the second layer contains sparse covariance patterns, which we consider the mid-level representation. From the view of aggregation from low-level cues, our work moves one step further in this direction. We use the learned covariance patterns in scene classification, where they show promising performance even compared with human-designed features. The efficiency of our covariance patterns gives a new clue for feature learning: correlations among lower-layer responses can help build more powerful feature representations.
Motivated by coupling semantic information into the mid-level representation, we define a new term, "informative components": regions that are descriptive within one class and also distinctive among different classes. Based on the generative assumption that descriptive regions fit a fixed-rank model, we provide an integrated optimization framework that combines generative modeling and discriminative learning. Experiments on scene classification bear out the effectiveness of our informative components. We also find that simply concatenating informative components with low-level responses further improves classification performance, which throws light on a future direction: improving representation power via the combination of multi-layer representations.
Wang, Liwei. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 62-72). / Abstracts also in Chinese.
Table of contents:
Chapter 1: Introduction
  1.1 Scene Classification Pipeline
  1.2 Learning Mid-Level Representations
  1.3 Contributions and Organization
Chapter 2: Background
  2.1 Mid-level Representations
    2.1.1 Aggregation From Low Level Cues
    2.1.2 Embedding Semantic Information
  2.2 Scene Data Sets Description
Chapter 3: Learning Sparse Covariance Patterns
  3.1 Introduction
  3.2 Model
  3.3 Learning and Inference
    3.3.1 Inference
    3.3.2 Learning
  3.4 Experiments
    3.4.1 Structure Mapping
    3.4.2 15-Scene Classification
    3.4.3 Indoor Scene Recognition
  3.5 Summary
Chapter 4: Learning Informative Components
  4.1 Introduction
  4.2 Related Work
  4.3 Our Model
    4.3.1 Component Level Representation
    4.3.2 Fixed Rank Modeling
    4.3.3 Informative Component Learning
  4.4 Experiments
    4.4.1 Informative Components Learning
    4.4.2 Scene Classification
  4.5 Summary
Chapter 5: Conclusion
Bibliography
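To make the covariance idea concrete, here is a sketch of a generic region-covariance computation, a common mid-level descriptor offered for background only; the feature map and region parameterization are illustrative assumptions, and the thesis's learned sparse covariance patterns (Chapter 3) are its own construction:

    import numpy as np

    def region_covariance(image, top, left, h, w):
        # Covariance, over a region, of per-pixel low-level responses.
        # Feature map assumed here: (x, y, intensity, |Ix|, |Iy|).
        Iy, Ix = np.gradient(image.astype(float))
        ys, xs = np.mgrid[top:top + h, left:left + w]
        F = np.stack([xs.ravel(), ys.ravel(),
                      image[top:top + h, left:left + w].ravel(),
                      np.abs(Ix[top:top + h, left:left + w]).ravel(),
                      np.abs(Iy[top:top + h, left:left + w]).ravel()], axis=0)
        return np.cov(F)     # 5x5 covariance of responses in the region

The off-diagonal entries are exactly the correlations among low-level responses that, per the abstract above, carry the local structure information a mid-level representation aggregates.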
|