  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Multi-person tracking system for complex outdoor environments

Tanase, Cristina-Madalina January 2015
This thesis surveys research on modern video tracking systems and presents the details of the implementation of such a system. Video surveillance is an area of high interest, and it relies on robust systems that interconnect several critical modules: data acquisition, data processing, background modeling, foreground detection and multiple object tracking. The present work analyzes different state-of-the-art methods that are suitable for each module. The emphasis of the thesis is on the background subtraction stage, as the final accuracy and performance of the person tracking depend heavily on it. The experimental results show the performance of four different foreground detection algorithms, including two variations of self-organizing feature maps, a machine learning technique, for background modeling. The work provides a comprehensive view of the current state of research in foreground detection and multiple object tracking, and offers solutions for common problems that occur when tracking in complex scenes. The data set chosen for the experiments covers very different and complex scenes (outdoor environments), which allows a detailed study of the appropriate approaches and highlights the weaknesses and strengths of each algorithm. The proposed system handles problems such as dynamic backgrounds, illumination changes, camouflage, cast shadows, frequent occlusions and crowded scenes. The tracking obtains a maximum Multiple Object Tracking Accuracy of 92.5% for the standard video sequence MWT and a minimum of 32.3% for an extremely difficult sequence that challenges every method.
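For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how Multiple Object Tracking Accuracy (MOTA) is commonly computed: one minus the ratio of all tracking errors (misses, false positives, identity switches) to the total number of ground-truth objects. The function name and the per-frame counts are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
def mota(misses, false_positives, id_switches, ground_truth_objects):
    """Multiple Object Tracking Accuracy over a whole sequence.

    Each argument is a per-frame list of counts.
    MOTA = 1 - (total errors) / (total ground-truth objects).
    """
    total_gt = sum(ground_truth_objects)
    if total_gt == 0:
        raise ValueError("sequence contains no ground-truth objects")
    errors = sum(misses) + sum(false_positives) + sum(id_switches)
    return 1.0 - errors / total_gt


# Hypothetical three-frame sequence (illustrative numbers only).
score = mota(
    misses=[1, 0, 1],
    false_positives=[0, 1, 0],
    id_switches=[0, 0, 0],
    ground_truth_objects=[10, 10, 10],
)
print(f"MOTA = {score:.1%}")
```

Because errors are summed before dividing, MOTA can become negative when a tracker makes more errors than there are ground-truth objects.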
12

Einstellung zur Videoüberwachung als Habituation [Attitude toward video surveillance as habituation]

Mühler, Kurt 27 May 2014
Citizens hold a positive attitude toward video surveillance even though they think very little about it, know little about the number and distribution of video cameras in their city, do not relate video surveillance to their civil rights, and trust the state "blindly". Klocke concludes that ignorance of the reality of the cameras should be seen as a sign of a lack of civil-rights motivation and of insufficient sensitivity to freedom. This leads to the research question of this essay, which aims to explain not the attitude toward video surveillance but the (low) attention paid to it: why are people indifferent to video surveillance even though it impairs one of their fundamental rights?
13

The presence and perceived impact of video surveillance technology in Indiana public schools as reported by building principals

Willey, James R. January 2008
Thesis (D. Ed.)--Ball State University, 2008. / Title from PDF t.p. (viewed on Nov. 09, 2009). Includes bibliographical references (p. 182-191).
14

Design and Evaluation of Contextualized Video Interfaces

Wang, Yi 29 September 2010
If “a picture is worth a thousand words,” then a video may be worth a thousand pictures. Videos are increasingly used in many applications, including surveillance, teleconferencing, learning and experience sharing. Since a video captures a scene from a particular viewpoint, it can often be understood better if presented within a larger spatial context. We call such interactive visualizations that combine videos with their spatial context “Contextualized Videos”. In recent years, multiple innovative Contextualized Video interfaces have been proposed to take advantage of the latest computer graphics and video processing technologies. These interfaces have opened a huge design space with numerous design possibilities, each with its own benefits and limitations. To avoid a piecemeal understanding of the design space, this dissertation systematically designs and evaluates Contextualized Video interfaces based on a taxonomy of tasks that can potentially benefit from Contextualized Videos. This dissertation first formalizes a design space. New designs are created incrementally along the four major dimensions of the design space. These designs are then empirically compared through a series of controlled experiments using multiple tasks. The tasks are carefully selected from a task taxonomy, which helps to avoid a piecemeal understanding of the effect of the designs. Our design practices and empirical evaluations result in a set of design guidelines on how to choose proper designs according to the characteristics of the tasks and the users. Finally, we demonstrate how to apply the design guidelines to prototype a complex interface for a specific video surveillance application. / Ph. D.
15

Local deformation modelling for non-rigid structure from motion

Kavamoto Fayad, João Renato January 2013
Reconstructing the 3D geometry of scenes based on monocular image sequences is a long-standing problem in computer vision. Structure from motion (SfM) aims at a data-driven approach without requiring a priori models of the scene. When the scene is rigid, SfM is a well-understood problem with solutions widely used in industry. However, if the scene is non-rigid, monocular reconstruction without additional information is an ill-posed problem and no satisfactory solution has yet been found. Current non-rigid SfM (NRSfM) methods typically aim at modelling deformable motion globally. Additionally, most of these methods focus on cases where deformable motion is seen as small variations from a mean shape. As a result, these methods fail at reconstructing highly deformable objects such as a flag waving in the wind. Moreover, reconstructions typically consist of low-detail, sparse point-cloud representations of objects. In this thesis we aim at reconstructing highly deformable surfaces by modelling them locally. In line with a recent trend in NRSfM, we propose a piecewise approach which reconstructs local overlapping regions independently. These reconstructions are merged into a global object by imposing 3D consistency of the overlapping regions. We propose our own local model (the Quadratic Deformation model) and show how patch division and reconstruction can be formulated in a principled approach by alternately minimizing a single geometric cost (the image re-projection error of the reconstruction). Moreover, we extend our approach to dense NRSfM, where reconstructions are performed at the pixel level, improving the detail of state-of-the-art reconstructions. Finally, we show how our principled approach can be used to perform simultaneous segmentation and reconstruction of articulated motion, recovering meaningful segments which provide a coarse 3D skeleton of the object.
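As a reading aid, the image re-projection error named in this abstract is commonly written in a form like the one below. The notation is an assumption made for illustration and is not taken from the thesis: x_{fp} is the observed 2D position of point p in frame f, X_{fp} its estimated (possibly deforming) 3D position, (R_f, t_f) the camera pose in frame f, \pi the camera projection, and v_{fp} a visibility flag equal to 1 when the point is observed.

```latex
E \;=\; \sum_{f=1}^{F} \sum_{p=1}^{P} v_{fp}\,
\left\| \mathbf{x}_{fp} - \pi\!\left( \mathbf{R}_f \mathbf{X}_{fp} + \mathbf{t}_f \right) \right\|^{2}
```

Minimizing a single cost of this kind over both the shape and the camera parameters is what makes an alternating scheme possible: one group of unknowns is held fixed while the other is updated.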
16

Intelligent surveillance system employing object detection, recognition, segmentation, and object-based coding. / CUHK electronic theses & dissertations collection

January 2013
Surveillance is the process of monitoring behaviour, activities, or changing information, usually of people, for the purpose of managing, directing or protecting, by means 
of electronic equipment, such as closed-circuit television (CCTV) cameras, or by intercepting electronically transmitted information from a distance, such as Internet traffic or phone calls. Potential surveillance applications include homeland security, crime prevention, traffic control, and remote monitoring of children, the elderly and patients. Surveillance technology provides a shield against terrorism and abnormal events, and cheap modern electronics make it possible to implement with CCTV cameras. But unless the feeds from those cameras are constantly monitored, they provide only an illusion of security. Finding enough observers to watch thousands of screens is simply impractical, yet modern automated systems can solve the problem with a surprising degree of intelligence. / Intelligent surveillance is necessary and important for accurately managing the information from millions of sensors around the clock. Generally, intelligent surveillance includes: 1. information acquisition, for example from a single camera, from multiple collaborating cameras, or from thermal or depth cameras; 2. video analysis, such as object detection, recognition, tracking, re-identification and segmentation; 3. storage and transmission, such as coding, classification, and footage. In this thesis, we build an intelligent surveillance system in which three cameras work collaboratively to estimate the position of the object of interest (OOI) in 3D space and to investigate and track it. In order to identify the OOI, a cascade head-shoulder detector is proposed to find the face region for recognition. The object can be segmented out and compressed by arbitrarily shaped object coding (ASOC). / In the first part, we discuss how to make the multiple cameras work together. In our system, two stationary cameras, like human eyes, watch the whole scene of the surveillance region to observe abnormal events. 
If an alarm is triggered by an abnormal event, a PTZ camera is assigned to deal with it, for example by tracking or investigating the object. With calibrated cameras, the 3D information of the object can be estimated and communicated among the three cameras. / In the second part, a cascade head-shoulder detector (CHSD) is proposed to detect the frontal head-shoulder region in the surveillance videos. High-level object analysis, e.g., recognition and abnormal behaviour analysis, is then performed on the detected region. In the detector, we propose a cascading structure that fuses two powerful features, the Haar-like feature and the HOG feature, which have been used to detect faces and pedestrians efficiently. With the Haar-like feature, CHSD can reject most non-head-shoulder regions in the earlier stages with limited computation. The detected region can be used for recognition and segmentation. / In the third part, the face region is extracted from the detected head-shoulder region using a trained body model. Continuously adaptive mean shift (CAMshift) is applied to refine the face region. Face recognition is a very challenging problem in a surveillance environment because the face image suffers from several factors at once, such as varying pose and out-of-focus blur under non-uniform lighting conditions. Based on these observations, we propose a face recognition method using the overlapping local phase feature (OLPF) and an adaptive Gaussian mixture model (AGMM). The OLPF feature is not only invariant to blurring but also robust to pose variations, and AGMM can robustly model the various faces. Experiments conducted on a standard dataset and on real data demonstrate that the proposed method consistently outperforms state-of-the-art face recognition methods. / In the fourth part, we propose an automatic human body segmentation system. We first initialize graph cut using the detected face/body and optimize the graph by max-flow/min-cut. 
A coarse-to-fine segmentation strategy is then employed to deal with imperfectly detected objects. Background contrast removal (BCR) and self-adaptive initialization level set (SAILS) are proposed to solve the tough problems that exist in the general graph cut model, such as errors that occur at high-contrast object boundaries or where the object and the background share similar colors. Experimental results demonstrate that our body segmentation system works very well on live videos and on standard sequences with complex backgrounds. / In the last part, we concentrate on how to intelligently compress the video content. In recent decades, video coding research has achieved great progress, for example in H.264/AVC and the next-generation HEVC, whose compression performance exceeds that of previous standards by more than 50%. But compared with MPEG-4, the capability of coding arbitrarily shaped objects is absent from these later standards. Despite the provision of slice group structures and flexible macroblock ordering (FMO) in the current H.264/AVC, it cannot deal with arbitrarily shaped regions accurately and efficiently. To overcome this limitation of H.264/AVC, we propose arbitrarily shaped object coding (ASOC) based on the H.264/AVC framework, which includes binary alpha coding, motion compensation and texture coding. In our ASOC, we adopt (1) an improved binary alpha coding with a novel motion estimation to facilitate the prediction of binary alpha blocks, (2) an arbitrarily shaped integer transform derived from the 4×4 ICT in H.264/AVC to code texture, and (3) associated coding techniques to make ASOC more compatible with the new framework. We extend ASOC to HD video and evaluate it objectively and subjectively. Experimental results show that our ASOC significantly outperforms previous object-coding methods and performs close to H.264/AVC. / Detailed summary in vernacular field only. 
/ Liu, Qiang. / "November 2012." / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 123-135). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / Dedication / Acknowledgments / Abstract / Publications / Nomenclature / Contents / List of Figures / List of Tables / Chapter 1 --- Introduction / Chapter 1.1 --- Motivation and objectives / Chapter 1.2 --- A brief review of camera calibration / Chapter 1.3 --- Object detection / Chapter 1.3.1 --- Face detection / Chapter 1.3.2 --- Pedestrian detection / Chapter 1.4 --- Recognition / Chapter 1.5 --- Segmentation / Chapter 1.5.1 --- Thresholding-based methods / Chapter 1.5.2 --- Clustering-based methods / Chapter 1.5.3 --- Histogram-based methods / Chapter 1.5.4 --- Region-growing methods / Chapter 1.5.5 --- Level set methods / Chapter 1.5.6 --- Graph cut methods / Chapter 1.5.7 --- Neural network-based methods / Chapter 1.6 --- Object-based video coding / Chapter 1.7 --- Organization of thesis / Chapter 2 --- Cameras Calibration / Chapter 2.1 --- Introduction / Chapter 2.2 --- Basic Equations / Chapter 2.2.1 --- Parameters of Camera Model / Chapter 2.2.2 --- Two-view homography induced by a Plane / Chapter 2.3 --- Pair-wise pose estimation / Chapter 2.3.1 --- Homography estimation / Chapter 2.3.2 --- Calculation of n and λ / Chapter 2.3.3 --- (R,t) Estimation / Chapter 2.4 --- Distortion analysis and correction / Chapter 2.5 --- Feature detection and matching / Chapter 2.6 --- 3D point estimation and evaluation / Chapter 2.7 --- Conclusion / Chapter 3 --- Cascade Head-Shoulder Detector / Chapter 3.1 --- Introduction / Chapter 3.2 --- Cascade head-shoulder detection / Chapter 3.2.1 --- Initial feature rejecter / Chapter 3.2.2 --- Haar-like rejecter / Chapter 3.2.3 --- HOG feature classifier / Chapter 3.2.4 --- Cascade of classifiers / Chapter 3.3 --- Experimental results and analysis / Chapter 3.3.1 --- CHSD training / Chapter 3.4 --- Conclusion / Chapter 4 --- A Robust Face Recognition in Surveillance / Chapter 4.1 --- Introduction / Chapter 4.2 --- Cascade head-shoulder detection / Chapter 4.2.1 --- Body model training / Chapter 4.2.2 --- Face region refinement / Chapter 4.3 --- Face recognition / Chapter 4.3.1 --- Overlapping local phase feature (OLPF) / Chapter 4.3.2 --- Fixed Gaussian Mixture Model (FGMM) / Chapter 4.3.3 --- Adaptive Gaussian mixture model / Chapter 4.4 --- Experimental verification / Chapter 4.4.1 --- Preprocessing / Chapter 4.4.2 --- Face recognition / Chapter 4.5 --- Conclusion / Chapter 5 --- Human Body Segmentation / Chapter 5.1 --- Introduction / Chapter 5.2 --- Proposed automatic human body segmentation system / Chapter 5.2.1 --- Automatic human body detection / Chapter 5.2.2 --- Object Segmentation / Chapter 5.2.3 --- Self-adaptive initialization level set / Chapter 5.2.4 --- Object Updating / Chapter 5.3 --- Experimental results / Chapter 5.3.1 --- Evaluation using real-time videos and standard sequences / Chapter 5.3.2 --- Comparison with Other Methods / Chapter 5.3.3 --- Computational complexity analysis / Chapter 5.3.4 --- Extensions / Chapter 5.4 --- Conclusion / Chapter 6 --- Arbitrarily Shaped Object Coding / Chapter 6.1 --- Introduction / Chapter 6.2 --- Arbitrarily shaped object coding / Chapter 6.2.1 --- Shape coding / Chapter 6.2.2 --- Lossy alpha coding / Chapter 6.2.3 --- Motion compensation / Chapter 6.2.4 --- Texture coding / Chapter 6.3 --- Performance evaluation / Chapter 6.3.1 --- Objective evaluations / Chapter 6.3.2 --- Extension on HD sequences / Chapter 6.3.3 --- Subjective evaluations / Chapter 6.4 --- Conclusions / Chapter 7 --- Conclusions and future work / Chapter 7.1 --- Contributions / Chapter 7.1.1 --- 3D object positioning / Chapter 7.1.2 --- Automatic human body detection / Chapter 7.1.3 --- Human face recognition / Chapter 7.1.4 --- Automatic human body segmentation / Chapter 7.1.5 --- Arbitrarily shaped object coding / Chapter 7.2 --- Future work / Bibliography
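The early-rejection idea behind the cascade detector described in this record can be sketched roughly as follows. This is an illustration of the general cascade principle only, under the assumption that each stage is a scoring function with a threshold; the two stage functions are hypothetical stand-ins and are not the Haar-like or HOG features used in the thesis.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is (scoring function, threshold); both are assumptions for this sketch.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def cascade_accepts(window: Sequence[float], stages: List[Stage]) -> Tuple[bool, int]:
    """Run a window through the stages in order and stop at the first rejection.

    Returns (accepted, number_of_stages_evaluated).
    """
    for evaluated, (score, threshold) in enumerate(stages, start=1):
        if score(window) < threshold:
            return False, evaluated  # rejected early, later stages are skipped
    return True, len(stages)


def cheap_mean(window: Sequence[float]) -> float:
    """Hypothetical cheap first stage: mean intensity."""
    return sum(window) / len(window)


def costly_contrast(window: Sequence[float]) -> float:
    """Hypothetical costlier second stage: intensity range."""
    return max(window) - min(window)


stages = [(cheap_mean, 0.2), (costly_contrast, 0.5)]
flat_dark = [0.05, 0.1, 0.05, 0.1]   # fails the cheap stage
textured = [0.1, 0.9, 0.2, 0.8]      # passes both stages

rejected = cascade_accepts(flat_dark, stages)
accepted = cascade_accepts(textured, stages)
print(rejected, accepted)
```

Ordering the stages from cheapest to most expensive is the design choice that matters: most windows in a surveillance frame contain no target, so they are discarded before the costly stage ever runs.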
17

Learning based person re-identification across camera views.

January 2013
Person re-identification is the task of matching persons observed in non-overlapping camera views using visual features. It is an important task in video surveillance in its own right and serves as a subtask of other problems such as inter-camera tracking. The challenges lie in the dramatic intra-person variation introduced by changes in viewpoint, illumination and pose. In this thesis, we tackle this problem from the following aspects: / Firstly, we observe that the ambiguity increases with the number of candidates to be distinguished. In real-world scenarios, temporal reasoning is available and can simplify the problem by pruning the candidate set to be matched. Existing approaches adopt a fixed metric for matching all subjects. Our approach is motivated by the insight that different visual metrics should be optimally learned for different candidate sets. The problem is further formulated under a transfer learning framework. 
Given a large training set, the training samples are selected and re-weighted according to their visual similarities with the query sample and its candidate set. A weighted maximum-margin metric is learned and transferred from a generic metric to a candidate-set-specific metric. / Secondly, we observe that the transformations between two camera views may be too complex to be uni-modal. To tackle this, we propose to partition the image space and formulate the problem in a mixture-of-experts framework. Our algorithm jointly partitions the image spaces of two camera views into different configurations according to the similarity of cross-view transforms. The visual features of an image pair from different views are locally aligned by being projected to a common feature space and then matched with softly assigned metrics which are locally optimized. The features optimal for recognizing identities are different from those for clustering cross-view transforms. They are jointly learned by utilizing a sparsity-inducing norm and information-theoretic regularization. / In all the above analysis, feature extraction and learning models are designed separately. A better idea is to learn features directly from training samples, so that those features can be used directly to train a discriminative model. We propose a new model in which feature extraction is learned jointly with a discriminative convolutional neural network. Local filters at the bottom layer extract information useful for matching persons across camera views, such as color and texture. Higher layers capture the spatial shift of those local patches. Finally, we test whether the shift patterns of those local patches conform to the inter-camera variation of the same person. / In all three parts, comparisons with state-of-the-art metric learning algorithms and person re-identification methods are carried out, and our approach shows superior performance on public benchmark datasets. 
Furthermore, we are building a much larger dataset that addresses the real-world scenario and contains many more camera views, identities, and images per view. / Detailed summary in vernacular field only. / Li, Wei. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 63-68). / Abstracts also in Chinese. / Acknowledgments / Abstract / Contents / List of Figures / List of Tables / Chapter 1 --- Introduction / Chapter 1.1 --- Person Re-Identification / Chapter 1.2 --- Challenge in Person Re-Identification / Chapter 1.3 --- Literature Review / Chapter 1.3.1 --- Feature Based Person Re-Identification / Chapter 1.3.2 --- Learning Based Person Re-Identification / Chapter 1.4 --- Thesis Organization / Chapter 2 --- Transferred Metric Learning for Person Re-Identification / Chapter 2.1 --- Introduction / Chapter 2.2 --- Related Work / Chapter 2.2.1 --- Transfer Learning / Chapter 2.3 --- Our Method / Chapter 2.3.1 --- Visual Features / Chapter 2.3.2 --- Searching and Weighting Training Samples / Chapter 2.3.3 --- Learning Adaptive Metrics by Maximizing Weighted Margins / Chapter 2.4 --- Experimental Results / Chapter 2.4.1 --- Dataset Description / Chapter 2.4.2 --- Generic Metric Learning / Chapter 2.4.3 --- Transferred Metric Learning / Chapter 2.5 --- Conclusions and Discussions / Chapter 3 --- Locally Aligned Feature Transforms for Person Re-Identification / Chapter 3.1 --- Introduction / Chapter 3.2 --- Related Work / Chapter 3.2.1 --- Localized Methods / Chapter 3.3 --- Model / Chapter 3.4 --- Learning / Chapter 3.4.1 --- Priors / Chapter 3.4.2 --- Objective Function / Chapter 3.4.3 --- Training Model / Chapter 3.4.4 --- Multi-Shot Extension / Chapter 3.4.5 --- Discriminative Metric Learning / Chapter 3.5 --- Experiment / Chapter 3.5.1 --- Identification with Two Fixed Camera Views / Chapter 3.5.2 --- More General Camera Settings / Chapter 3.6 --- Conclusions / Chapter 4 --- Deep Neural Network for Person Re-identification / Chapter 4.1 --- Introduction / Chapter 4.2 --- Related Work / Chapter 4.3 --- Introduction of the New Dataset / Chapter 4.4 --- Model / Chapter 4.4.1 --- Architecture Overview / Chapter 4.4.2 --- Convolutional and Max-Pooling Layer / Chapter 4.4.3 --- Patch Matching Layer / Chapter 4.4.4 --- Maxout Grouping Layer / Chapter 4.4.5 --- Part Displacement / Chapter 4.4.6 --- Softmax Layer / Chapter 4.5 --- Training Strategies / Chapter 4.5.1 --- Data Augmentation and Balancing / Chapter 4.5.2 --- Bootstrapping / Chapter 4.6 --- Experiment / Chapter 4.6.1 --- Model Specification / Chapter 4.6.2 --- Validation on Single Pair of Cameras / Chapter 4.7 --- Conclusion / Chapter 5 --- Conclusion / Chapter 5.1 --- Conclusion / Chapter 5.2 --- Future Work / Bibliography
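To make the role of a learned metric in this record concrete, the sketch below ranks gallery candidates for a query by a Mahalanobis-style distance (x - y)^T M (x - y). It only illustrates how a metric matrix M, once learned, would be used for matching; the matrix, features and identifiers are hypothetical, and the weighted maximum-margin learning described in the abstract is not implemented here.

```python
def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y) for a positive semi-definite matrix M."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sum(d[i] * Md[i] for i in range(len(d)))


def rank_candidates(query, candidates, M):
    """Return candidate ids sorted from best to worst match under metric M."""
    return sorted(candidates, key=lambda cid: mahalanobis_sq(query, candidates[cid], M))


# Hypothetical 2-D features. M down-weights the second dimension,
# as a learned metric might do for a feature that is unreliable across views.
M = [[1.0, 0.0], [0.0, 0.1]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # plain Euclidean distance, for comparison
query = [0.0, 0.0]
candidates = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [0.5, 0.5]}

ranking = rank_candidates(query, candidates, M)
euclidean_ranking = rank_candidates(query, candidates, identity)
print(ranking, euclidean_ranking)
```

The two rankings differ for candidates "a" and "b", which is the point of a candidate-set-specific metric: changing M changes which gallery image is considered the closest match.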
18

Motion Detection for Video Surveillance

Rahman, Junaedur January 2008
This thesis addresses the broad subject of automatic motion detection and analysis in video surveillance image sequences. Besides proposing a new solution, it evaluates several earlier algorithms, some of whose approaches turn out to be noticeably complementary. In real-time surveillance, detecting and tracking multiple objects and monitoring their activities in both outdoor and indoor environments are challenging tasks for a video surveillance system. A good number of real-time problems have limited the scope of this work from the beginning, namely illumination changes, moving backgrounds and shadow detection. An improved background subtraction method is followed by foreground segmentation, data evaluation, shadow detection in the scene and, finally, the motion detection method. The algorithm is applied to a number of practical problems to observe whether it leads to the expected solution. Several experiments are carried out under different challenging conditions. The test results show that the proposed algorithm gives better-quality results in most of the problematic environments.
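The pipeline above starts from background subtraction. A minimal sketch of the textbook version of that step follows, assuming a grayscale frame stored as nested lists, an exponential running-average background and a fixed threshold. This is the generic baseline technique, not the improved method proposed in the thesis; the function names, `alpha` and `threshold` values are illustrative assumptions.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background model (exponential running average)."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels whose absolute difference from the background exceeds the threshold."""
    return [[abs(f - b) > threshold for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


# Hypothetical 2x2 grayscale example: one pixel changes sharply.
background = [[100.0, 100.0], [100.0, 100.0]]
frame = [[102, 180], [99, 100]]

mask = foreground_mask(background, frame)
background = update_background(background, frame)
print(mask)
```

A small `alpha` makes the model adapt slowly, which tolerates gradual illumination changes but also means that sudden lighting changes and moving backgrounds are flagged as foreground, the kind of problem the abstract lists.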
19

Embedded early vision techniques for efficient background modeling and midground detection

Valentine, Brian Evans 26 March 2010
An automated vision system performs critical tasks in video surveillance, while decreasing costs and increasing efficiency. It can provide high quality scene monitoring without the limitations of human distraction and fatigue. Advances in embedded processors, wireless networks, and imager technology have enabled computer vision systems to be deployed pervasively in stationary surveillance monitors, hand-held devices, and vehicular sensors. However, the size, weight, power, and cost requirements of these platforms present a great challenge in developing real-time systems. This dissertation explores the development of background modeling algorithms for surveillance on embedded platforms. Our contributions are as follows:
- An efficient pixel-based adaptive background model, called multimodal mean, which produces results comparable to the widely used mixture of Gaussians multimodal approach, at a much reduced computational cost and greater control of occluded object persistence.
- A novel and efficient chromatic clustering-based background model for embedded vision platforms that leverages the color uniformity of large, permanent background objects to yield significant speedups in execution time.
- A multi-scale temporal model for midground analysis which provides a means to "tune-in" to changes in the scene beyond the standard background/foreground framework, based on user-defined temporal constraints.
Multimodal mean reduces instruction complexity with the use of fixed integer arithmetic and periodic long-term adaptation that occurs once every d frames. When combined with fixed thresholding, it performs 6.2 times faster than the mixture of Gaussians method while using 18% less storage. Furthermore, fixed thresholding compares favorably to standard deviation thresholding with a percentage difference in error less than five percent when used on scenes with stable lighting conditions and modest multimodal activity. 
The chromatic clustering-based approach to optimized background modeling takes advantage of the color distributions in large permanent background objects, such as a road, building, or sidewalk, to speed up execution. It abstracts their colors to a small color palette and suppresses their adaptation during processing. When run on a representative embedded platform, it reduces storage usage by 58% and increases execution speed by 45%. Multiscale temporal modeling for midground analysis presents a unified approach for scene analysis that can be applied to several application domains. It extends scene analysis from the standard background/foreground framework to one that includes a temporal midground object saliency window that is defined by the user. When applied to stationary object detection, the midground model provides accurate results at low sampling frame rates (~1 fps) while using only 18 Mbytes of storage and 15 Mops/sec processing throughput.
20

Adaptive video defogging base on background modeling

Yuk, Shun-cho, Jacky, 郁順祖 January 2013
The performance of intelligent video surveillance systems often degrades under complicated scenarios, such as dynamically changing backgrounds and extremely bad weather. Dynamically changing backgrounds make foreground/background segmentation, which is often the first step in vision-based algorithms, unreliable. Bad weather, such as fog, not only degrades the visual quality of the monitoring videos but also seriously affects the accuracy of the vision-based algorithms. In this thesis, a fast and robust texture-based background modeling technique is first presented for tackling the problem of foreground/background segmentation under dynamic backgrounds. An adaptive multi-modal framework is proposed which uses a novel texture feature known as scale invariant local states (SILS) to model an image pixel. A pattern-less probabilistic measurement (PLPM) is also derived to estimate the probability of a pixel being background from its SILS. Experimental results show that texture-based background modeling is more robust than illumination-based approaches under dynamic backgrounds and lighting changes. Furthermore, the proposed background modeling technique can run much faster than the existing state-of-the-art texture-based method, without sacrificing the output quality. Two fast adaptive defogging techniques, namely 1) foreground decremental preconditioned conjugate gradient (FDPCG), and 2) adaptive guided image filtering, are next introduced for removing the fog effects from video scenes. These two methods allow the estimation of the background transmissions to converge over consecutive video frames, and then background-defog the video sequences using the background transmission map. Results show that foreground/background segmentation can be improved dramatically with such background-defogged video frames. 
With reliable foreground/background segmentation results, the foreground transmissions can then be recovered by the proposed 1) foreground incremental preconditioned conjugate gradient (FIPCG), or 2) on-demand guided image filtering. Experimental results show that the proposed methods can effectively improve the visual quality of surveillance videos under heavy fog and bad weather. Compared with state-of-the-art image defogging methods, the proposed methods are shown to be much more efficient. / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
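The transmission maps mentioned in this abstract are usually understood through the standard atmospheric scattering model, I = J * t + A * (1 - t), where I is the observed foggy intensity, J the fog-free scene, t the transmission and A the airlight. The sketch below inverts that model once t and A are known. It assumes this standard model applies and takes the transmission as given; the estimation methods proposed in the thesis (FDPCG, FIPCG, guided filtering) are not implemented, and all names and values are illustrative.

```python
def defog_pixel(observed, airlight, transmission, t_min=0.1):
    """Invert I = J * t + A * (1 - t) for J; t is floored to avoid amplifying noise."""
    t = max(transmission, t_min)
    return (observed - airlight) / t + airlight


def defog(image, transmission_map, airlight, t_min=0.1):
    """Apply the per-pixel inversion to a grayscale image stored as nested lists."""
    return [[defog_pixel(i, airlight, t, t_min) for i, t in zip(irow, trow)]
            for irow, trow in zip(image, transmission_map)]


# Synthetic check: fog a known scene with the forward model, then recover it.
scene = [[0.2, 0.8]]
t_map = [[0.5, 0.25]]
airlight = 1.0
foggy = [[j * t + airlight * (1 - t) for j, t in zip(jrow, trow)]
         for jrow, trow in zip(scene, t_map)]

recovered = defog(foggy, t_map, airlight)
print(foggy, recovered)
```

The floor `t_min` is a common safeguard: where the transmission is close to zero (very dense fog), dividing by it would blow up sensor noise, so the recovered value is deliberately left partly fogged.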
