
Intelligent surveillance system employing object detection, recognition, segmentation, and object-based coding. / CUHK electronic theses & dissertations collection

Surveillance is the monitoring of behaviour, activities, or changing information, usually of people, for the purpose of managing, directing, or protecting them. It is carried out with electronic equipment such as closed-circuit television (CCTV) cameras, or by intercepting information transmitted electronically over a distance, such as Internet traffic or phone calls. Potential applications include homeland security, crime prevention, traffic control, and remote monitoring of children, the elderly, and patients. Surveillance technology provides a shield against terrorism and abnormal events, and cheap modern electronics make it practical to deploy CCTV cameras. But unless the feeds from those cameras are constantly monitored, they provide only an illusion of security. Finding enough observers to watch thousands of screens is impractical, yet modern automated systems can fill this gap with a surprising degree of intelligence.

Intelligent surveillance is necessary to accurately manage the information from millions of sensors around the clock. In general, intelligent surveillance includes: 1. information acquisition, using a single camera, several collaborating cameras, or thermal or depth cameras; 2. video analysis, such as object detection, recognition, tracking, re-identification, and segmentation; 3. storage and transmission, such as coding, classification, and footage. In this thesis, we build an intelligent surveillance system in which three cameras work collaboratively to estimate the position of the object of interest (OOI) in 3D space and to investigate and track it. To identify the OOI, a cascade head-shoulder detector is proposed to find the face region for recognition. The object can then be segmented out and compressed by arbitrarily shaped object coding (ASOC).

In the first part, we discuss how to make multiple cameras work together. In our system, two stationary cameras, like human eyes, watch the whole scene of the surveillance region for abnormal events. If an alarm is triggered by an abnormal event, a PTZ camera is assigned to deal with it, for example by tracking or investigating the object. With calibrated cameras, the 3D information of the object can be estimated and communicated among the three cameras.

In the second part, a cascade head-shoulder detector (CHSD) is proposed to detect the frontal head-shoulder region in surveillance videos. High-level object analysis, e.g., recognition and abnormal behaviour analysis, is then performed on the detected region. In the detector, we propose a cascade structure that fuses two powerful features, the Haar-like feature and the HOG feature, which have been used to detect faces and pedestrians efficiently. With the Haar-like feature, CHSD can reject most non-head-shoulder regions in the early stages with limited computation. The detected region can be used for recognition and segmentation.

In the third part, the face region is extracted from the detected head-shoulder region using a trained body model. Continuously adaptive mean shift (CAMshift) is used to refine the face region. Face recognition is very challenging in a surveillance environment because the face image suffers from several factors at once, such as a varying pose with out-of-focus blur under non-uniform lighting. Based on these observations, we propose a face recognition method using the overlapping local phase feature (OLPF) and an adaptive Gaussian mixture model (AGMM). OLPF is not only invariant to blurring but also robust to pose variations, and AGMM can robustly model a variety of faces. Experiments on standard datasets and real data demonstrate that the proposed method consistently outperforms state-of-the-art face recognition methods.

In the fourth part, we propose an automatic human body segmentation system. We first initialize graph cut using the detected face or body and optimize the graph by max-flow/min-cut. A coarse-to-fine segmentation strategy is then employed to deal with imperfectly detected objects. Background contrast removal (BCR) and self-adaptive initialization level set (SAILS) are proposed to solve difficult problems of the general graph cut model, such as errors that occur at high-contrast object boundaries or where the object and the background share similar colors. Experimental results demonstrate that our body segmentation system works very well on live videos and on standard sequences with complex backgrounds.

In the last part, we concentrate on how to compress the video content intelligently. In recent decades, video coding research has made great progress, for example in H.264/AVC and the next-generation HEVC, whose compression performance exceeds that of previous standards by more than 50%. Compared with MPEG-4, however, the later standards lack the capability of coding arbitrarily shaped objects. Despite the provision of slice group structures and flexible macroblock ordering (FMO) in the current H.264/AVC, it cannot deal with arbitrarily shaped regions accurately and efficiently. To overcome this limitation of H.264/AVC, we propose arbitrarily shaped object coding (ASOC) based on the H.264/AVC framework, which includes binary alpha coding, motion compensation, and texture coding. In our ASOC, we adopt (1) an improved binary alpha coding with a novel motion estimation that facilitates the prediction of binary alpha blocks, (2) an arbitrarily shaped integer transform, derived from the 4×4 ICT in H.264/AVC, to code texture, and (3) associated coding techniques that make ASOC more compatible with the new framework. We extend ASOC to HD video and evaluate it objectively and subjectively. Experimental results demonstrate that our ASOC significantly outperforms previous object-coding methods and performs close to H.264/AVC.
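As a point of reference for the 4×4 ICT named in the last part of the abstract, the following is a minimal sketch of the standard forward 4×4 integer core transform of H.264/AVC, computed as Y = C · X · Cᵀ. It is not the arbitrarily shaped transform proposed in the thesis, whose construction the abstract does not give. The function and variable names are hypothetical, and the scaling that the standard folds into quantization is omitted.

```python
# Forward 4x4 integer core transform matrix of H.264/AVC.
# The post-scaling that the standard merges into quantization is left out.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_ict_4x4(block):
    """Apply Y = CF * X * CF^T to a 4x4 block of integer residual samples.

    Hypothetical helper for illustration only; it handles full square
    blocks, not the arbitrarily shaped regions discussed in the abstract.
    """
    if len(block) != 4 or any(len(row) != 4 for row in block):
        raise ValueError("expected a 4x4 block")
    return matmul(matmul(CF, block), transpose(CF))


if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]
    # A flat block puts all of its energy into the DC coefficient.
    print(forward_ict_4x4(flat))
```

Because the matrix entries are small integers, the transform needs only additions and shifts and has an exact integer inverse path in the decoder, which is presumably why an object-based extension would start from it.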
Detailed summary in vernacular field only.
Liu, Qiang.
"November 2012."
Thesis (Ph.D.)--Chinese University of Hong Kong, 2013.
Includes bibliographical references (leaves 123-135).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstracts in English and Chinese.

Dedication
Acknowledgments
Abstract
Publications
Nomenclature
Contents
List of Figures
List of Tables
Chapter 1 --- Introduction
Chapter 1.1 --- Motivation and objectives
Chapter 1.2 --- A brief review of camera calibration
Chapter 1.3 --- Object detection
Chapter 1.3.1 --- Face detection
Chapter 1.3.2 --- Pedestrian detection
Chapter 1.4 --- Recognition
Chapter 1.5 --- Segmentation
Chapter 1.5.1 --- Thresholding-based methods
Chapter 1.5.2 --- Clustering-based methods
Chapter 1.5.3 --- Histogram-based methods
Chapter 1.5.4 --- Region-growing methods
Chapter 1.5.5 --- Level set methods
Chapter 1.5.6 --- Graph cut methods
Chapter 1.5.7 --- Neural network-based methods
Chapter 1.6 --- Object-based video coding
Chapter 1.7 --- Organization of thesis
Chapter 2 --- Cameras Calibration
Chapter 2.1 --- Introduction
Chapter 2.2 --- Basic Equations
Chapter 2.2.1 --- Parameters of Camera Model
Chapter 2.2.2 --- Two-view homography induced by a Plane
Chapter 2.3 --- Pair-wise pose estimation
Chapter 2.3.1 --- Homography estimation
Chapter 2.3.2 --- Calculation of n and λ
Chapter 2.3.3 --- (R,t) Estimation
Chapter 2.4 --- Distortion analysis and correction
Chapter 2.5 --- Feature detection and matching
Chapter 2.6 --- 3D point estimation and evaluation
Chapter 2.7 --- Conclusion
Chapter 3 --- Cascade Head-Shoulder Detector
Chapter 3.1 --- Introduction
Chapter 3.2 --- Cascade head-shoulder detection
Chapter 3.2.1 --- Initial feature rejecter
Chapter 3.2.2 --- Haar-like rejecter
Chapter 3.2.3 --- HOG feature classifier
Chapter 3.2.4 --- Cascade of classifiers
Chapter 3.3 --- Experimental results and analysis
Chapter 3.3.1 --- CHSD training
Chapter 3.4 --- Conclusion
Chapter 4 --- A Robust Face Recognition in Surveillance
Chapter 4.1 --- Introduction
Chapter 4.2 --- Cascade head-shoulder detection
Chapter 4.2.1 --- Body model training
Chapter 4.2.2 --- Face region refinement
Chapter 4.3 --- Face recognition
Chapter 4.3.1 --- Overlapping local phase feature (OLPF)
Chapter 4.3.2 --- Fixed Gaussian Mixture Model (FGMM)
Chapter 4.3.3 --- Adaptive Gaussian mixture model
Chapter 4.4 --- Experimental verification
Chapter 4.4.1 --- Preprocessing
Chapter 4.4.2 --- Face recognition
Chapter 4.5 --- Conclusion
Chapter 5 --- Human Body Segmentation
Chapter 5.1 --- Introduction
Chapter 5.2 --- Proposed automatic human body segmentation system
Chapter 5.2.1 --- Automatic human body detection
Chapter 5.2.2 --- Object Segmentation
Chapter 5.2.3 --- Self-adaptive initialization level set
Chapter 5.2.4 --- Object Updating
Chapter 5.3 --- Experimental results
Chapter 5.3.1 --- Evaluation using real-time videos and standard sequences
Chapter 5.3.2 --- Comparison with Other Methods
Chapter 5.3.3 --- Computational complexity analysis
Chapter 5.3.4 --- Extensions
Chapter 5.4 --- Conclusion
Chapter 6 --- Arbitrarily Shaped Object Coding
Chapter 6.1 --- Introduction
Chapter 6.2 --- Arbitrarily shaped object coding
Chapter 6.2.1 --- Shape coding
Chapter 6.2.2 --- Lossy alpha coding
Chapter 6.2.3 --- Motion compensation
Chapter 6.2.4 --- Texture coding
Chapter 6.3 --- Performance evaluation
Chapter 6.3.1 --- Objective evaluations
Chapter 6.3.2 --- Extension on HD sequences
Chapter 6.3.3 --- Subjective evaluations
Chapter 6.4 --- Conclusions
Chapter 7 --- Conclusions and future work
Chapter 7.1 --- Contributions
Chapter 7.1.1 --- 3D object positioning
Chapter 7.1.2 --- Automatic human body detection
Chapter 7.1.3 --- Human face recognition
Chapter 7.1.4 --- Automatic human body segmentation
Chapter 7.1.5 --- Arbitrarily shaped object coding
Chapter 7.2 --- Future work
Bibliography

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_328099
Date: January 2013
Contributors: Liu, Qiang, Chinese University of Hong Kong Graduate School. Division of Electronic Engineering.
Source Sets: The Chinese University of Hong Kong
Language: Chinese, English
Detected Language: English
Type: Text, bibliography
Format: electronic resource, remote, 1 online resource (xxiii, 135 leaves) : ill. (chiefly col.)
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
