About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
641

Local deformation modelling for non-rigid structure from motion

Kavamoto Fayad, João Renato January 2013 (has links)
Reconstructing the 3D geometry of scenes from monocular image sequences is a long-standing problem in computer vision. Structure from motion (SfM) aims at a data-driven approach that requires no a priori models of the scene. When the scene is rigid, SfM is a well-understood problem with solutions widely used in industry. However, if the scene is non-rigid, monocular reconstruction without additional information is an ill-posed problem and no satisfactory solution has yet been found. Current non-rigid SfM (NRSfM) methods typically aim at modelling deformable motion globally. Moreover, most of these methods focus on cases where deformable motion appears as small variations from a mean shape. As a result, these methods fail at reconstructing highly deformable objects such as a flag waving in the wind, and their reconstructions typically consist of low-detail, sparse point-cloud representations of objects. In this thesis we aim at reconstructing highly deformable surfaces by modelling them locally. In line with a recent trend in NRSfM, we propose a piecewise approach which reconstructs local overlapping regions independently. These reconstructions are merged into a global object by imposing 3D consistency of the overlapping regions. We propose our own local model, the Quadratic Deformation model, and show how patch division and reconstruction can be formulated in a principled approach by alternately minimizing a single geometric cost: the image re-projection error of the reconstruction. Moreover, we extend our approach to dense NRSfM, where reconstructions are performed at the pixel level, improving the detail of state-of-the-art reconstructions. Finally we show how our principled approach can be used to perform simultaneous segmentation and reconstruction of articulated motion, recovering meaningful segments which provide a coarse 3D skeleton of the object.
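The single geometric cost minimized above, the image re-projection error, can be sketched in a few lines. This is a hedged illustration, not the thesis's implementation: it assumes an orthographic camera (a 2×3 projection matrix, as is common in NRSfM formulations), and the function and variable names are invented for the example.

```python
import math

def reprojection_error(points_3d, points_2d, camera):
    """Mean image re-projection error under a simple orthographic camera.

    `camera` is a 2x3 matrix (e.g. the first two rows of a rotation);
    `points_3d` and `points_2d` are equal-length lists of tuples.
    """
    total = 0.0
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        pu = camera[0][0] * X + camera[0][1] * Y + camera[0][2] * Z
        pv = camera[1][0] * X + camera[1][1] * Y + camera[1][2] * Z
        total += math.hypot(pu - u, pv - v)  # per-point pixel error
    return total / len(points_3d)

# Identity-like camera: projection keeps (X, Y) and drops Z.
cam = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
pts3d = [(0.0, 0.0, 5.0), (1.0, 2.0, 3.0)]
pts2d = [(0.0, 0.0), (1.0, 2.0)]
print(reprojection_error(pts3d, pts2d, cam))  # 0.0
```

In a piecewise scheme, a cost of this form would be evaluated per overlapping patch and summed, with the 3D points of shared regions constrained to agree.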
642

Reasoning scene geometry from single images

Liu, Yixian January 2014 (has links)
Holistic scene understanding is one of the major goals of recent research in computer vision. Most popular recognition algorithms focus on semantic understanding and are incapable of providing the global depth information of the scene structure from the 2D projection of the world. Yet it is clear that recovery of scene surface layout could help many practical 3D-based applications, including 2D-to-3D movie re-production, robotic navigation, view synthesis, etc. Therefore, we identify scene geometric reasoning as the key problem of scene understanding. This PhD work contributes to the problem of reconstructing the 3D shape of scenes from monocular images. We propose an approach to recognise and reconstruct the geometric structure of the scene from a single image. We have investigated several typical scene geometries and built corresponding reference models in a hierarchical order for scene representation. The framework is set up based on the analysis of image statistical features and scene geometric features, and correlation is introduced to theoretically integrate these two types of features. Firstly, an image is categorized into one of the reference geometric models using spatial pattern classification. Then, we estimate the depth profile of the specific scene with an algorithm for adaptive automatic scene reconstruction, which employs specifically developed reconstruction approaches for different geometric models. The theory and algorithms are instantiated in a system for scene classification and visualization. The system is able to find the best-fit model for most of the images from several benchmark datasets. Our experiments show that uncalibrated low-quality monocular images can be efficiently and realistically reconstructed in simulated 3D space.
With our approach, computers can interpret a single still image directly in terms of its underlying geometry, avoiding the usual problems of object occlusion, semantic overlap and deficiency.
643

Stereo vision without the scene-smoothness assumption: the homography-based approach.

January 1998 (has links)
by Andrew L. Arengo. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. / Includes bibliographical references (leaves 65-66). / Abstract also in Chinese. / Acknowledgments --- p.ii / List Of Figures --- p.v / Abstract --- p.vii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Motivation and Objective --- p.2 / Chapter 1.2 --- Approach of This Thesis and Contributions --- p.3 / Chapter 1.3 --- Organization of This Thesis --- p.4 / Chapter 2 --- Previous Work --- p.6 / Chapter 2.1 --- Using Grouped Features --- p.6 / Chapter 2.2 --- Applying Additional Heuristics --- p.7 / Chapter 2.3 --- Homography and Related Works --- p.9 / Chapter 3 --- Theory and Problem Formulation --- p.10 / Chapter 3.1 --- Overview of the Problems --- p.10 / Chapter 3.1.1 --- Preprocessing --- p.10 / Chapter 3.1.2 --- Establishing Correspondences --- p.11 / Chapter 3.1.3 --- Recovering 3D Depth --- p.14 / Chapter 3.2 --- Solving the Correspondence Problem --- p.15 / Chapter 3.2.1 --- Epipolar Constraint --- p.15 / Chapter 3.2.2 --- Surface-Continuity and Feature-Ordering Heuristics --- p.16 / Chapter 3.2.3 --- Using the Concept of Homography --- p.18 / Chapter 3.3 --- Concept of Homography --- p.20 / Chapter 3.3.1 --- Barycentric Coordinate System --- p.20 / Chapter 3.3.2 --- Image to Image Mapping of the Same Plane --- p.22 / Chapter 3.4 --- Problem Formulation --- p.23 / Chapter 3.4.1 --- Preliminaries --- p.23 / Chapter 3.4.2 --- Case of Single Planar Surface --- p.24 / Chapter 3.4.3 --- Case of Multiple Planar Surfaces --- p.28 / Chapter 3.5 --- Subspace Clustering --- p.28 / Chapter 3.6 --- Overview of the Approach --- p.30 / Chapter 4 --- Experimental Results --- p.33 / Chapter 4.1 --- Synthetic Images --- p.33 / Chapter 4.2 --- Aerial Images --- p.36 / Chapter 4.2.1 --- T-shape building --- p.38 / Chapter 4.2.2 --- Rectangular Building --- p.39 / Chapter 4.2.3 --- 3-layers Building --- p.40 / Chapter 4.2.4 --- Pentagon --- p.44 / Chapter 4.3 --- Indoor Scenes --- p.52 / 
Chapter 4.3.1 --- Stereo Motion Pair --- p.53 / Chapter 4.3.2 --- Hallway Scene --- p.56 / Chapter 5 --- Summary and Conclusions --- p.63
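The homography concept at the heart of this thesis (Chapter 3.3, "Image to Image Mapping of the Same Plane") can be sketched concretely: points on a single planar surface map between the two stereo images through one 3×3 matrix. This is a generic illustration of that mapping, not code from the thesis; the matrix and point values are invented for the example.

```python
def apply_homography(H, x, y):
    """Map image point (x, y) through a 3x3 plane-to-plane homography H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w  # divide out the homogeneous scale

# A pure translation by (2, 3) expressed as a homography.
H = [[1, 0, 2], [0, 1, 3], [0, 0, 1]]
print(apply_homography(H, 1, 1))  # (3.0, 4.0)
```

Candidate correspondences consistent with one such H lie on the same plane, which is how a homography-based approach can sidestep the scene-smoothness assumption: each planar surface gets its own H, and points are clustered by the mapping they satisfy.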
644

Stereo vision and motion analysis in complement.

January 1998 (has links)
by Ho Pui-Kuen, Patrick. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. / Includes bibliographical references (leaves 57-59). / Abstract also in Chinese. / Acknowledgments --- p.ii / List Of Figures --- p.v / List Of Tables --- p.vi / Abstract --- p.vii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Motivation of Problem --- p.1 / Chapter 1.2 --- Our Approach and Summary of Contributions --- p.3 / Chapter 1.3 --- Organization of this Thesis --- p.4 / Chapter 2 --- Previous Work --- p.5 / Chapter 3 --- Structure Recovery from Stereo-Motion Images --- p.7 / Chapter 3.1 --- Motion Model --- p.8 / Chapter 3.2 --- Stereo-Motion Model --- p.10 / Chapter 3.3 --- Inferring Stereo Correspondences --- p.13 / Chapter 3.4 --- Determining 3D Structure from One Stereo Pair --- p.17 / Chapter 3.5 --- Computational Complexity of Inference Process --- p.18 / Chapter 4 --- Experimental Results --- p.19 / Chapter 4.1 --- Synthetic Images and Statistical Results --- p.19 / Chapter 4.2 --- Real Image Sequences --- p.21 / Chapter 4.2.1 --- 'House Model' Image Sequences --- p.22 / Chapter 4.2.2 --- 'Oscilloscope and Soda Can' Image Sequences --- p.23 / Chapter 4.2.3 --- 'Bowl' Image Sequences --- p.24 / Chapter 4.2.4 --- 'Building' Image Sequences --- p.27 / Chapter 4.3 --- Computational Time of Experiments --- p.28 / Chapter 5 --- Determining Motion and Structure from All Stereo Pairs --- p.30 / Chapter 5.1 --- Determining Motion and Structure --- p.31 / Chapter 5.2 --- Identifying Incorrect Motion Correspondences --- p.33 / Chapter 6 --- More Experiments --- p.34 / Chapter 6.1 --- 'Synthetic Cube' Images --- p.34 / Chapter 6.2 --- 'Snack Bag' Image Sequences --- p.35 / Chapter 6.3 --- Comparison with Structure Recovered from One Stereo Pair --- p.37 / Chapter 7 --- Conclusion --- p.41 / Chapter A --- Basic Concepts in Computer Vision --- p.43 / Chapter A.1 --- Camera Projection Model --- p.43 / Chapter A.2 --- Epipolar Constraint in Stereo Vision --- p.47 / Chapter B ---
Inferring Stereo Correspondences with Matrices of Rank < 4 --- p.49 / Chapter C --- Generating Image Reprojection --- p.51 / Chapter D --- Singular Value Decomposition --- p.53 / Chapter E --- Quaternion --- p.55
645

Deep neural networks in computer vision and biomedical image analysis

Xie, Weidi January 2017 (has links)
This thesis proposes different models for a variety of applications, such as semantic segmentation, in-the-wild face recognition, microscopy cell counting and detection, standardized re-orientation of 3D ultrasound fetal brain scans, and Magnetic Resonance (MR) cardiac video segmentation. Our approach is to employ large-scale machine learning models, in particular deep neural networks. Expert knowledge is either mathematically modelled as a differentiable hidden layer in the neural network, or used to break a complex task into several small, easy-to-solve tasks. Multi-scale contextual information plays an important role in pixel-wise prediction, e.g. semantic segmentation. To capture spatial contextual information, we present a new block that learns receptive fields adaptively through within-layer recurrence. Interleaved with the convolutional layers, receptive fields are effectively enlarged, reaching across the entire feature map or image. The new block can be initialized as the identity and inserted into any pre-trained network, thereby benefiting from the "pre-train and fine-tune" paradigm. Current face recognition systems are mostly driven by the success of image classification, where models are trained by identity classification. We propose a multi-column deep comparator network for face recognition. The architecture takes two sets of images or frames (each containing an arbitrary number of faces) as inputs; facial part-based (e.g. eyes, nose) representations of each set are pooled, dynamically calibrated based on the quality of the input images, and then compared with local "experts" in a pairwise way. Unlike in typical computer vision applications, collecting data and annotations is usually expensive in biomedical image analysis. Therefore, models that can be trained with less data and weaker annotations are of great importance.
We approach microscopy cell counting and detection via density estimation, where only central dot annotations are needed. The proposed fully convolutional regression networks are first trained on a synthetic dataset of cell nuclei, then fine-tuned and shown to generalize to real data. In 3D fetal ultrasound neurosonography, establishing a coordinate system over the fetal brain serves as a precursor for subsequent tasks, e.g. localization of anatomical landmarks and extraction of standard clinical planes for biometric assessment of fetal growth. To align brain volumes into a common reference coordinate system, we decompose the complex transformation into several simple ones, each easily tackled with convolutional neural networks. The model is designed to leverage these closely related tasks by sharing low-level features, and the task-specific predictions are then combined to reproduce the transformation matrix as the desired output. Finally, we address MR cardiac video analysis, where we are interested in assisting clinical diagnosis based on fine-grained segmentation. To facilitate segmentation, we present an end-to-end trainable model that achieves multi-view structure detection, alignment (standardized re-orientation), and fine-grained segmentation simultaneously. This is motivated by the fact that CNNs are in essence neither rotation equivariant nor invariant; adding pre-alignment into the end-to-end trainable pipeline effectively decreases the complexity of segmentation for the later stages of the model.
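The density-estimation framing of cell counting has a simple core idea worth making concrete: each annotated cell contributes unit mass to a predicted density map, so the count is the map's integral. The sketch below assumes that convention; the map values are invented toy data, not output of the thesis's regression network.

```python
def count_from_density(density_map):
    """Estimate object count by integrating a predicted density map.

    Each annotated cell contributes total mass 1.0 to the map, so the
    (possibly fractional) count is simply the sum over all pixels.
    """
    return sum(sum(row) for row in density_map)

# Toy map with two blobs, each integrating to roughly 1.0.
density = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.2, 0.2, 0.1, 0.2],
    [0.0, 0.0, 0.1, 0.2, 0.4],
]
print(round(count_from_density(density), 2))  # 2.0
```

This is why only central dot annotations are needed for training: the ground-truth density map is built by placing a small Gaussian of unit mass at each dot.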
646

Synthesis of view invariance for high-level object features. / CUHK electronic theses & dissertations collection

January 2013 (has links)
Hui, Ka Yu. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 101-106). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts also in Chinese.
647

Intelligent surveillance system employing object detection, recognition, segmentation, and object-based coding. / CUHK electronic theses & dissertations collection

January 2013 (has links)
Surveillance is the process of monitoring the behaviour, activities, or changing information, usually of people for the purpose of managing, directing or protecting by means
of electronic equipment, such as closed-circuit television (CCTV) cameras, or by interception of electronically transmitted information from a distance, such as Internet traffic or phone calls. Potential surveillance applications include homeland security, anti-crime, traffic control, and monitoring children, the elderly and patients at a distance. Surveillance technology provides a shield against terrorism and abnormal events, and cheap modern electronics make it possible to implement with CCTV cameras. But unless the feeds from those cameras are constantly monitored, they provide only an illusion of security. Finding enough observers to watch thousands of screens is simply impractical, yet modern automated systems can solve the problem with a surprising degree of intelligence. / Surveillance with intelligence is necessary and important to accurately manage the information from millions of sensors around the clock. Generally, intelligent surveillance includes: 1. information acquisition, e.g. from a single camera or the collaboration of multiple cameras, thermal or depth cameras; 2. video analysis, e.g. object detection, recognition, tracking, re-identification and segmentation; 3. storage and transmission, e.g. coding, classification, and footage management. In this thesis, we build an intelligent surveillance system in which three cameras work collaboratively to estimate the position of the object of interest (OOI) in 3D space, investigate it and track it. In order to identify the OOI, a Cascade Head-Shoulder Detector is proposed to find the face region for recognition. The object can be segmented out and compressed by arbitrarily shaped object coding (ASOC). / In the first part, we discuss how to make multiple cameras work together. In our system, two stationary cameras, like human eyes, focus on the whole scene of the surveillance region to observe abnormal events.
If an alarm is triggered by an abnormal instance, a PTZ camera is assigned to deal with it, for example by tracking or investigating the object. With calibrated cameras, the 3D information of the object can be estimated and communicated among the three cameras. / In the second part, a cascade head-shoulder detector (CHSD) is proposed to detect the frontal head-shoulder region in surveillance videos. High-level object analysis, e.g. recognition and abnormal-behaviour analysis, is performed on the detected region. In the detector, we propose a cascading structure that fuses two powerful features: the Haar-like feature and the HOG feature, which have been used to detect faces and pedestrians efficiently. With the Haar-like feature, CHSD can reject most non-head-shoulder regions in the earlier stages with limited computation. The detected region can be used for recognition and segmentation. / In the third part, the face region is extracted from the detected head-shoulder region by training a body model. Continuously adaptive mean shift (CAMshift) is used to refine the face region. Face recognition is very challenging in a surveillance environment because the face image suffers from the concurrence of multiple factors, such as a variant pose with out-of-focus blurring under non-uniform lighting conditions. Based on these observations, we propose a face recognition method using the overlapping local phase (OLPF) feature and an adaptive Gaussian mixture model (AGMM). The OLPF feature is not only invariant to blurring but also robust to pose variations, and AGMM can robustly model various faces. Experiments conducted on standard datasets and real data demonstrate that the proposed method consistently outperforms state-of-the-art face recognition methods. / In the fourth part, we propose an automatic human body segmentation system. We first initialize graph cut using the detected face/body and optimize the graph by max-flow/min-cut.
A coarse-to-fine segmentation strategy is then employed to deal with imperfectly detected objects. Background contrast removal (BCR) and self-adaptive initialization level set (SAILS) are proposed to solve tough problems that exist in the general graph cut model, such as errors at object boundaries with high contrast and similar colors shared by the object and background. Experimental results demonstrate that our body segmentation system works very well on live videos and standard sequences with complex backgrounds. / In the last part, we concentrate on how to intelligently compress the video content. In recent decades, video coding research has achieved great progress, with standards such as H.264/AVC and the next-generation HEVC, whose compression performance exceeds previous standards by more than 50%. But compared with MPEG-4, the capability of coding arbitrarily shaped objects is absent from the later standards. Despite the provision of slice group structures and flexible macroblock ordering (FMO) in the current H.264/AVC, it cannot deal with arbitrarily shaped regions accurately and efficiently. To address this limitation of H.264/AVC, we propose arbitrarily shaped object coding (ASOC) based on the H.264/AVC framework, which includes binary alpha coding, motion compensation and texture coding. In our ASOC, we adopt (1) an improved binary alpha coding with a novel motion estimation to facilitate binary alpha block prediction, (2) an arbitrarily shaped integer transform derived from the 4×4 ICT in H.264/AVC to code texture and (3) associated coding techniques to make ASOC more compatible with the new framework. We extend ASOC to HD video and evaluate it objectively and subjectively. Experimental results prove that our ASOC significantly outperforms previous object-coding methods and performs close to H.264/AVC.
/ Liu, Qiang. / "November 2012." / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 123-135). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. / Abstracts in English and Chinese. / Dedication --- p.ii / Acknowledgments --- p.iii / Abstract --- p.vii / Publications --- p.x / Nomenclature --- p.xii / Contents --- p.xviii / List of Figures --- p.xxii / List of Tables --- p.xxiii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Motivation and objectives --- p.1 / Chapter 1.2 --- A brief review of camera calibration --- p.2 / Chapter 1.3 --- Object detection --- p.5 / Chapter 1.3.1 --- Face detection --- p.5 / Chapter 1.3.2 --- Pedestrian detection --- p.7 / Chapter 1.4 --- Recognition --- p.8 / Chapter 1.5 --- Segmentation --- p.10 / Chapter 1.5.1 --- Thresholding-based methods --- p.11 / Chapter 1.5.2 --- Clustering-based methods --- p.11 / Chapter 1.5.3 --- Histogram-based methods --- p.12 / Chapter 1.5.4 --- Region-growing methods --- p.12 / Chapter 1.5.5 --- Level set methods --- p.13 / Chapter 1.5.6 --- Graph cut methods --- p.13 / Chapter 1.5.7 --- Neural network-based methods --- p.14 / Chapter 1.6 --- Object-based video coding --- p.14 / Chapter 1.7 --- Organization of thesis --- p.16 / Chapter 2 --- Cameras Calibration --- p.18 / Chapter 2.1 --- Introduction --- p.18 / Chapter 2.2 --- Basic Equations --- p.21 / Chapter 2.2.1 --- Parameters of Camera Model --- p.22 / Chapter 2.2.2 --- Two-view homography induced by a Plane --- p.22 / Chapter 2.3 --- Pair-wise pose estimation --- p.23 / Chapter 2.3.1 --- Homography estimation --- p.24 / Chapter 2.3.2 ---
Calculation of n and λ --- p.24 / Chapter 2.3.3 --- (R,t) Estimation --- p.25 / Chapter 2.4 --- Distortion analysis and correction --- p.27 / Chapter 2.5 --- Feature detection and matching --- p.28 / Chapter 2.6 --- 3D point estimation and evaluation --- p.30 / Chapter 2.7 --- Conclusion --- p.34 / Chapter 3 --- Cascade Head-Shoulder Detector --- p.35 / Chapter 3.1 --- Introduction --- p.35 / Chapter 3.2 --- Cascade head-shoulder detection --- p.36 / Chapter 3.2.1 --- Initial feature rejecter --- p.37 / Chapter 3.2.2 --- Haar-like rejecter --- p.39 / Chapter 3.2.3 --- HOG feature classifier --- p.40 / Chapter 3.2.4 --- Cascade of classifiers --- p.45 / Chapter 3.3 --- Experimental results and analysis --- p.46 / Chapter 3.3.1 --- CHSD training --- p.46 / Chapter 3.4 --- Conclusion --- p.49 / Chapter 4 --- A Robust Face Recognition in Surveillance --- p.50 / Chapter 4.1 --- Introduction --- p.50 / Chapter 4.2 --- Cascade head-shoulder detection --- p.53 / Chapter 4.2.1 --- Body model training --- p.53 / Chapter 4.2.2 --- Face region refinement --- p.54 / Chapter 4.3 --- Face recognition --- p.56 / Chapter 4.3.1 --- Overlapping local phase feature (OLPF) --- p.56 / Chapter 4.3.2 --- Fixed Gaussian Mixture Model (FGMM) --- p.59 / Chapter 4.3.3 --- Adaptive Gaussian mixture model --- p.61 / Chapter 4.4 --- Experimental verification --- p.62 / Chapter 4.4.1 --- Preprocessing --- p.62 / Chapter 4.4.2 --- Face recognition --- p.63 / Chapter 4.5 --- Conclusion --- p.66 / Chapter 5 --- Human Body Segmentation --- p.68 / Chapter 5.1 --- Introduction --- p.68 / Chapter 5.2 --- Proposed automatic human body segmentation system --- p.70 / Chapter 5.2.1 --- Automatic human body detection --- p.71 / Chapter 5.2.2 --- Object Segmentation --- p.73 / Chapter 5.2.3 --- Self-adaptive initialization level set --- p.79 / Chapter 5.2.4 --- Object Updating --- p.86 / Chapter 5.3 --- Experimental results --- p.87 / Chapter 5.3.1 --- Evaluation using real-time videos and standard sequences 
--- p.87 / Chapter 5.3.2 --- Comparison with Other Methods --- p.87 / Chapter 5.3.3 --- Computational complexity analysis --- p.91 / Chapter 5.3.4 --- Extensions --- p.93 / Chapter 5.4 --- Conclusion --- p.93 / Chapter 6 --- Arbitrarily Shaped Object Coding --- p.94 / Chapter 6.1 --- Introduction --- p.94 / Chapter 6.2 --- Arbitrarily shaped object coding --- p.97 / Chapter 6.2.1 --- Shape coding --- p.97 / Chapter 6.2.2 --- Lossy alpha coding --- p.99 / Chapter 6.2.3 --- Motion compensation --- p.102 / Chapter 6.2.4 --- Texture coding --- p.105 / Chapter 6.3 --- Performance evaluation --- p.108 / Chapter 6.3.1 --- Objective evaluations --- p.108 / Chapter 6.3.2 --- Extension on HD sequences --- p.112 / Chapter 6.3.3 --- Subjective evaluations --- p.115 / Chapter 6.4 --- Conclusions --- p.119 / Chapter 7 --- Conclusions and future work --- p.120 / Chapter 7.1 --- Contributions --- p.120 / Chapter 7.1.1 --- 3D object positioning --- p.120 / Chapter 7.1.2 --- Automatic human body detection --- p.120 / Chapter 7.1.3 --- Human face recognition --- p.121 / Chapter 7.1.4 --- Automatic human body segmentation --- p.121 / Chapter 7.1.5 --- Arbitrarily shaped object coding --- p.121 / Chapter 7.2 --- Future work --- p.122 / Bibliography --- p.123
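The texture-coding part of ASOC derives its transform from the 4×4 integer core transform (ICT) of H.264/AVC. That base transform is small enough to sketch exactly; this is the standard ICT, not the thesis's arbitrarily shaped extension, and the post-scaling that H.264 folds into quantization is omitted here.

```python
# Forward 4x4 integer core transform of H.264/AVC: Y = Cf * X * Cf^T.
CF = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def forward_ict(block):
    """Integer-only 4x4 core transform; scaling is left to quantization."""
    return matmul(matmul(CF, block), transpose(CF))

flat = [[1] * 4 for _ in range(4)]
print(forward_ict(flat)[0][0])  # 16: a flat block yields only a DC term
```

An "arbitrarily shaped" derivative must additionally handle boundary blocks that are only partially covered by the object mask, which is where ASOC departs from this fixed 4×4 case.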
648

Object Detection Using Convolutional Neural Network Trained on Synthetic Images

Vi, Margareta January 2018 (has links)
Training data is the bottleneck for training convolutional neural networks: a larger dataset gives better accuracy, but also requires longer training time. It is shown that fine-tuning neural networks on synthetic rendered images increases the mean average precision. This method was applied to two different datasets with five distinctive objects in each. The first dataset consisted of random objects with different geometric shapes; the second contained objects used to assemble IKEA furniture. The best-performing neural network, trained on 5400 images, achieved a mean average precision of 0.81 on a test set sampled from a video sequence. The impact of dataset size, batch size, number of training epochs, and network architecture was also analysed. Using synthetic images to train CNNs is a promising path for object detection when large amounts of annotated image data are hard to come by.
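The headline metric here, mean average precision, is the mean over classes of per-class average precision on a confidence-ranked detection list. As a hedged reference for what that per-class number measures (the standard ranked-list formulation, not necessarily the exact variant used in the thesis):

```python
def average_precision(ranked_relevance):
    """Average precision of a ranked detection list, where each entry is
    1 (true positive) or 0 (false positive), sorted by confidence.

    AP is the mean of the precision values at each true-positive rank;
    mAP averages this over all object classes.
    """
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at this hit
    return sum(precisions) / len(precisions) if precisions else 0.0

# Detections sorted by confidence: TP, FP, FP, TP.
print(average_precision([1, 0, 0, 1]))  # 0.75
```

Ranking matters: the same detections with the true positives pushed down the list score lower, which is why confidence calibration affects mAP as much as raw detection counts do.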
649

A random forest approach to segmenting and classifying gestures

Joshi, Ajjen Das 12 March 2016 (has links)
This thesis investigates a gesture segmentation and recognition scheme that employs a random forest classification model. A complete gesture recognition system should localize and classify each gesture from a given gesture vocabulary, within a continuous video stream. Thus, the system must determine the start and end points of each gesture in time, as well as accurately recognize the class label of each gesture. We propose a unified approach that performs the tasks of temporal segmentation and classification simultaneously. Our method trains a random forest classification model to recognize gestures from a given vocabulary, as presented in a training dataset of video plus 3D body joint locations, as well as out-of-vocabulary (non-gesture) instances. Given an input video stream, our trained model is applied to candidate gestures using sliding windows at multiple temporal scales. The class label with the highest classifier confidence is selected, and its corresponding scale is used to determine the segmentation boundaries in time. We evaluated our formulation in segmenting and recognizing gestures from two different benchmark datasets: the NATOPS dataset of 9,600 gesture instances from a vocabulary of 24 aircraft handling signals, and the CHALEARN dataset of 7,754 gesture instances from a vocabulary of 20 Italian communication gestures. The performance of our method compares favorably with state-of-the-art methods that employ Hidden Markov Models or Hidden Conditional Random Fields on the NATOPS dataset. We conclude with a discussion of the advantages of using our model.
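The unified segment-and-classify step described above can be sketched as a multi-scale sliding-window scan: score every candidate window, then take the label and window of highest classifier confidence. This is an illustrative skeleton under stated assumptions: `classify` stands in for the trained random forest (here a hypothetical threshold stub), and the frame values, scales, and names are invented.

```python
def best_gesture_window(frames, classify, scales=(4, 8)):
    """Scan a frame sequence with sliding windows at several temporal
    scales; return (start, end, label) for the most confident window.

    `classify(window) -> (label, confidence)` abstracts the classifier.
    """
    best = (None, None, None, -1.0)
    for size in scales:
        for start in range(len(frames) - size + 1):
            label, conf = classify(frames[start:start + size])
            if conf > best[3]:
                best = (start, start + size, label, conf)
    return best[:3]  # the winning scale fixes the segment boundaries

# Toy stand-in classifier: "wave" if the window mean exceeds 0.5,
# scored by how decisively the mean clears the threshold.
def toy_classify(window):
    m = sum(window) / len(window)
    return ("wave", m) if m > 0.5 else ("non-gesture", 1.0 - m)

frames = [0.2] * 4 + [1.0] * 4  # the gesture occupies the last 4 frames
print(best_gesture_window(frames, toy_classify))  # (4, 8, 'wave')
```

Selecting the maximizing scale is what performs temporal segmentation for free: no separate boundary detector is needed, matching the thesis's unified formulation.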
650

Understanding Human Activities at Large Scale

Caba Heilbron, Fabian David 03 1900 (has links)
With the growth of online media, surveillance, and mobile cameras, the number and size of video databases are increasing at an incredible pace. For example, YouTube reported that over 400 hours of video are uploaded every minute to their servers. Arguably, people are the most important and interesting subjects of such videos. The computer vision community has embraced this observation to validate the crucial role that human action recognition plays in building smarter surveillance systems, semantically aware video indexes and more natural human-computer interfaces. However, despite the explosion of video data, the ability to automatically recognize and understand human activities is still somewhat limited. In this work, I address four different challenges in scaling up action understanding. First, I tackle existing dataset limitations with a flexible framework that allows continuous acquisition, crowdsourced annotation, and segmentation of online videos, culminating in a large-scale, rich, and easy-to-use activity dataset known as ActivityNet. Second, I develop an action proposal model that takes a video and directly generates temporal segments that are likely to contain human actions. The model has two appealing properties: (a) it retrieves temporal locations of activities with high recall, and (b) it produces these proposals quickly. Third, I introduce a model that exploits action-object and action-scene relationships to improve the localization quality of a fast generic action proposal method and to quickly prune out irrelevant activities in cascade fashion. These two features lead to an efficient and accurate cascade pipeline for temporal activity localization. Lastly, I introduce a novel active learning framework for temporal localization that aims to mitigate the data dependency issue of contemporary action detectors.
By creating a large-scale video benchmark, designing efficient action scanning methods, enriching approaches with high-level semantics for activity localization, and devising an effective strategy to build action detectors with limited data, this thesis takes a step closer towards general video understanding.
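When temporal action proposals are scored for recall, the standard match criterion is temporal intersection-over-union between a proposed segment and a ground-truth one. A minimal sketch of that criterion (the usual definition, offered as background rather than the thesis's exact evaluation code):

```python
def temporal_iou(seg_a, seg_b):
    """Intersection-over-union of two temporal segments (start, end),
    the usual criterion for matching proposals to ground truth."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0

# A 10 s proposal overlapping a 15 s ground-truth action by 5 s.
print(temporal_iou((0.0, 10.0), (5.0, 20.0)))  # 0.25
```

A proposal method's recall is then the fraction of ground-truth actions matched by at least one proposal above a chosen IoU threshold, which is why "high recall with few proposals" is the quality axis emphasized above.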
