Global ETD Search

351	Continuous memories for representing sets of vectors and image collections / Mémoires continues représentant des ensembles de vecteurs et des collections d’images Iscen, Ahmet 25 September 2017 (has links) Cette thèse étudie l'indexation et le mécanisme d'expansion de requête en recherche d'image. L'indexation sacrifie la qualité de la recherche pour une plus grande efficacité; l'expansion de requête prend ce compromis dans l'autre sens : il améliore la qualité de la recherche avec un coût en complexité additionnel. Nous proposons des solutions pour les deux approches qui utilisent une représentation continue d'un ensemble de vecteurs. Pour l'indexation, notre solution est basée sur le test par groupe. Chaque vecteur image est assigné à un groupe, et chaque groupe est représenté par un seul vecteur. C'est la représentation continue de l'ensemble des vecteur du groupe. L'optimisation de cette représentation pour produire un bon test d'appartenance donne une solution basée sur la pseudo-inverse de Moore-Penrose. Elle montre des performances supérieures à celles d'une somme basique des vecteurs du groupe. Nous proposons aussi une alternative suivant au plus près les vecteurs-images de la base. Elle optimise conjointement l'assignation des vecteurs images à des groupes ainsi que la représentation vectorielle de ces groupes. La deuxième partie de la thèse étudie le mécanisme d'expansion de requête au moyen d'un graphe pondéré représentant les vecteurs images. Cela permet de retrouver des images similaires le long d'une même variété géométrique, mais éloignées en distance Euclidienne. Nous donnons une implémentation ultra-rapide de ce mécanisme en créant des représentations vectorielles incorporant la diffusion. Ainsi, le mécanisme d'expansion se réduit à un simple produit scalaire entre les représentations vectorielles lors de la requête. 
Les deux parties de la thèse fournissent une analyse théorique et un travail expérimental approfondi utilisant les protocoles et les jeux de données standards en recherche d'images. Les méthodes proposées ont des performances supérieures à l'état de l'art. / In this thesis, we study the indexing and query expansion problems in image retrieval. The former sacrifices accuracy for efficiency, whereas the latter takes the opposite perspective and improves accuracy at additional cost. Our proposed solutions to both problems use continuous representations of a set of vectors. We turn our attention to indexing first, and follow the group testing scheme. We assign each dataset vector to a group, and represent each group with a single vector. We propose memory vectors, whose solution is optimized under the membership test hypothesis. The optimal solution for this problem is based on the Moore-Penrose pseudo-inverse, and shows superior performance compared to basic sum pooling. We also provide a data-driven approach that optimizes the assignment and representation jointly. The second half of the manuscript focuses on the query expansion problem, representing a set of vectors with weighted graphs. This allows us to retrieve objects that lie on the same manifold but are farther away in Euclidean space. We improve the efficiency of our technique even further by creating high-dimensional diffusion embeddings offline, so that they can be compared with a simple dot product at query time. For both problems, we provide thorough experiments and analysis on well-known image retrieval benchmarks and show the improvements achieved by the proposed methods. Vision par ordinateur Indexation Computer vision Indexing
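Reading aid for record 351: the abstract contrasts a sum-pooled group representation with one based on the Moore-Penrose pseudo-inverse. The following is a minimal sketch of that contrast, not the thesis's implementation. It assumes unit-norm vectors stored as columns of a matrix X and fewer vectors per group than dimensions; the function names, group size, and dimensionality are illustrative choices.

```python
import numpy as np

def sum_memory_vector(X):
    """Sum pooling: the group is represented by the sum of its column vectors."""
    return X.sum(axis=1)

def pinv_memory_vector(X):
    """Pseudo-inverse variant: the minimum-norm m satisfying X.T @ m = 1."""
    return np.linalg.pinv(X).T @ np.ones(X.shape[1])

rng = np.random.default_rng(0)
d, n = 64, 8                      # illustrative dimension and group size
X = rng.standard_normal((d, n))   # hypothetical image descriptors, one per column
X /= np.linalg.norm(X, axis=0)    # normalize each column to unit length

m_sum = sum_memory_vector(X)
m_pinv = pinv_memory_vector(X)

# Membership test: dot product between a query and the group representation.
scores_sum = X.T @ m_sum          # 1 plus interference from the other members
scores_pinv = X.T @ m_pinv        # interference removed when X has full column rank

print("sum pooling scores:   ", np.round(scores_sum, 3))
print("pseudo-inverse scores:", np.round(scores_pinv, 3))
```

With sum pooling, each member's score is perturbed by its dot products with the other members of the group; the pseudo-inverse construction cancels that interference for the stored vectors, which is one way to read the abstract's claim of superior performance.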
352	Describable Visual Attributes for Face Images Kumar, Neeraj January 2011 (has links) We introduce the use of describable visual attributes for face images. Describable visual attributes are labels that can be given to an image to describe its appearance. This thesis focuses mostly on images of faces and the attributes used to describe them, although the concepts also apply to other domains. Examples of face attributes include gender, age, jaw shape, nose size, etc. The advantages of an attribute-based representation for vision tasks are manifold: they can be composed to create descriptions at various levels of specificity; they are generalizable, as they can be learned once and then applied to recognize new objects or categories without any further training; and they are efficient, possibly requiring exponentially fewer attributes (and training data) than explicitly naming each category. We show how one can create and label large datasets of real-world images to train classifiers which measure the presence, absence, or degree to which an attribute is expressed in images. These classifiers can then automatically label new images. We demonstrate the current effectiveness and explore the future potential of using attributes for image search, automatic face replacement in images, and face verification, via both human and computational experiments. To aid other researchers in studying these problems, we introduce two new large face datasets, named FaceTracer and PubFig, with labeled attributes and identities, respectively. Finally, we also show the effectiveness of visual attributes in a completely different domain: plant species identification. To this end, we have developed and publicly released the Leafsnap system, which has been downloaded by almost half a million users. The mobile phone application is a flexible electronic field guide with high-quality images of the tree species in the Northeast US. 
It also gives users instant access to our automatic recognition system, greatly simplifying the identification process. Computer science Computer vision Image processing
353	Adaptive visual servoing in uncalibrated environments. January 2004 (has links) Wang Hesheng. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. / Includes bibliographical references (leaves 70-73). / Abstracts in English and Chinese. / Abstract --- p.i / Acknowledgement --- p.iv / Contents --- p.v / List of Figures --- p.vii / List of Tables --- p.viii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Visual Servoing --- p.1 / Chapter 1.1.1 --- Position-based Visual Servoing --- p.4 / Chapter 1.1.2 --- Image-based Visual Servoing --- p.5 / Chapter 1.1.3 --- Camera Configurations --- p.7 / Chapter 1.2 --- Problem Definitions --- p.10 / Chapter 1.3 --- Related Work --- p.11 / Chapter 1.4 --- Contribution of This Work --- p.15 / Chapter 1.5 --- Organization of This Thesis --- p.16 / Chapter 2 --- System Modeling --- p.18 / Chapter 2.1 --- The Coordinates Frames --- p.18 / Chapter 2.2 --- The System Kinematics --- p.20 / Chapter 2.3 --- The System Dynamics --- p.21 / Chapter 2.4 --- The Camera Model --- p.23 / Chapter 2.4.1 --- Eye-in-hand System --- p.28 / Chapter 2.4.2 --- Eye-and-hand System --- p.32 / Chapter 3 --- Adaptive Image-based Visual Servoing --- p.35 / Chapter 3.1 --- Controller Design --- p.35 / Chapter 3.2 --- Estimation of The Parameters --- p.38 / Chapter 3.3 --- Stability Analysis --- p.42 / Chapter 4 --- Simulation --- p.48 / Chapter 4.1 --- Simulation I --- p.49 / Chapter 4.2 --- Simulation II --- p.51 / Chapter 5 --- Experiments --- p.55 / Chapter 6 --- Conclusions --- p.63 / Chapter 6.1 --- Conclusions --- p.63 / Chapter 6.2 --- Future Work --- p.64 / Appendix --- p.66 / Bibliography --- p.70 Robots--Control systems Computer vision Servomechanisms
354	From pose estimation to structure and motion. January 2004 (has links) Yu Ying-Kin. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. / Includes bibliographical references (leaves 108-116). / Abstracts in English and Chinese. / Abstract --- p.i / Acknowledgements --- p.iv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Motivation and Objectives --- p.1 / Chapter 1.2 --- Problem Definition --- p.3 / Chapter 1.3 --- Contributions --- p.6 / Chapter 1.4 --- Related Publications --- p.8 / Chapter 1.5 --- Organization of the Paper --- p.9 / Chapter 2 --- Background --- p.11 / Chapter 2.1 --- Introduction --- p.11 / Chapter 2.2 --- Pose Estimation --- p.12 / Chapter 2.2.1 --- Overview --- p.12 / Chapter 2.2.2 --- Lowe's Method --- p.14 / Chapter 2.2.3 --- The Genetic Algorithm by Hati and Sengupta --- p.15 / Chapter 2.3 --- Structure and Motion --- p.17 / Chapter 2.3.1 --- Overview --- p.17 / Chapter 2.3.2 --- The Extended Lowe's Method --- p.20 / Chapter 2.3.3 --- The Extended Kalman Filter by Azarbayejani and Pentland --- p.23 / Chapter 3 --- Model-based Pose Tracking Using Genetic Algorithms --- p.27 / Chapter 3.1 --- Introduction --- p.27 / Chapter 3.2 --- Overview of the Algorithm --- p.28 / Chapter 3.3 --- Chromosome Encoding --- p.29 / Chapter 3.4 --- The Genetic Operators --- p.30 / Chapter 3.4.1 --- Mutation --- p.30 / Chapter 3.4.2 --- Crossover --- p.31 / Chapter 3.5 --- Fitness Evaluation --- p.31 / Chapter 3.6 --- The Roulette Wheel Proportionate Selection Scheme --- p.32 / Chapter 3.7 --- The Genetic Algorithm Parameters --- p.33 / Chapter 3.8 --- Experiments and Results --- p.34 / Chapter 3.8.1 --- Synthetic Data Experiments --- p.34 / Chapter 3.8.2 --- Real Scene Experiments --- p.38 / Chapter 4 --- Recursive 3D Structure Acquisition Based on Kalman Filtering --- p.42 / Chapter 4.1 --- Introduction --- p.42 / Chapter 4.2 --- Overview of the Algorithm --- p.43 / Chapter 4.2.1 --- Feature Extraction and Tracking --- p.44 / Chapter 4.2.2 --- Model Initialization --- p.44 / Chapter 4.2.3 --- Structure and Pose Updating --- p.45 / Chapter 4.3 --- Structure Updating --- p.46 / Chapter 4.4 --- Pose Estimation --- p.49 / Chapter 4.5 --- Handling of the Changeable Set of Feature Points --- p.52 / Chapter 4.6 --- Analytical Comparisons with Other Algorithms --- p.54 / Chapter 4.6.1 --- Comparisons with the Interleaved Bundle Adjustment Method --- p.54 / Chapter 4.6.2 --- Comparisons with the EKF by Azarbayejani and Pentland --- p.56 / Chapter 4.7 --- Experiments and Results --- p.57 / Chapter 4.7.1 --- Synthetic Data Experiments --- p.57 / Chapter 4.7.2 --- Real Scene Experiments --- p.58 / Chapter 5 --- Simultaneous Pose Tracking and Structure Acquisition Using the Interacting Multiple Model --- p.63 / Chapter 5.1 --- Introduction --- p.63 / Chapter 5.2 --- Overview of the Algorithm --- p.65 / Chapter 5.2.1 --- Feature Extraction and Tracking --- p.65 / Chapter 5.2.2 --- Model Initialization --- p.66 / Chapter 5.2.3 --- Structure and Pose Updating --- p.66 / Chapter 5.3 --- Pose Estimation --- p.67 / Chapter 5.3.1 --- The Interacting Multiple Model Algorithm --- p.67 / Chapter 5.3.2 --- Design of the Individual EKFs --- p.71 / Chapter 5.4 --- Structure Updating --- p.74 / Chapter 5.5 --- Handling of the Changeable Set of Feature Points --- p.76 / Chapter 5.6 --- Analytical Comparisons with Other EKF-Based Algorithms --- p.77 / Chapter 5.6.1 --- Computation Speed --- p.77 / Chapter 5.6.2 --- Accuracy of the Recovered Pose Sequences --- p.79 / Chapter 5.7 --- Experiments and Results --- p.80 / Chapter 5.7.1 --- Synthetic Data Experiments --- p.80 / Chapter 5.7.2 --- Real Scene Experiments --- p.80 / Chapter 6 --- Empirical Comparisons of the Structure and Motion Algorithms --- p.87 / Chapter 6.1 --- Introduction --- p.87 / Chapter 6.2 --- Comparisons Using Synthetic Data --- p.88 / Chapter 6.2.1 --- Image Residual Errors --- p.88 / Chapter 6.2.2 --- Computation Efficiency --- p.89 / Chapter 6.2.3 --- Accuracy of Recovered Pose Sequences --- p.91 / Chapter 6.3 --- Comparisons Using Real Images --- p.92 / Chapter 6.4 --- Summary --- p.97 / Chapter 7 --- Future Work --- p.99 / Chapter 8 --- Conclusion --- p.101 / Chapter A --- Kalman Filtering --- p.103 / Bibliography --- p.107 Computer vision Image processing--Digital techniques
355	Modeling collective crowd behaviors in video. January 2012 (has links) 群體行為分析是一個跨學科的研究課題.理解群體協作行為的形成機制，是社會科學和自然科學的根本問題之一.群體行為分析的研究可以為很多關鍵的工程應用提供支持和解決方案，比如智能視頻監控系統，人群異常檢測和公共設施優化.在這篇論文中，說們通過研究和分析真實場景中採集的視頻數據，對群體行為提出了有效的計算框架和算法，來分析這視頻中出現的動態群體模式和行為. / 在第一個章節中，我們提出了一個基於馬爾科夫隨機場的圖模型框架，來分析場景中與群體行為相閥的語羲區域. 這個模型利用馬爾科夫隨機場來聯繫行人軌跡的時空關係，可以從高度分散的行人軌跡中進行數據挖掘，以形成完整的群體行為語義區域.其得到的這些語義區域完整地反映出了不同群體行為的進行模式，具有良好的準確性. 這項研究工作已經在IEEE 計算機視覺和模式識別會議(CVPR)2011 發表. / 為了探索語義區域形成的行為學機制，在第二個章節中，我們提出了一個新穎的動態行人代理人混合模型，來分析擁擠場景中出現的人群動態協作行為.每一種行人協作行為模式被建模成一個線性動態系統，行人在場景中的起始和結束位置放建模成這個動態系統的起始和結束狀態. 這個模型可以從高端分散的行人軌跡中分析出共有的協作行為模式。通過模擬行人的行動決策過程，該模型不僅可以分類不同的群體行為，還可以模擬和預測行人的未來可能路徑和目的地.這項研究工作已經在IEEE 計算機視覺和模式織別會議(CVPR) 2012 作為口頭報告發表. / 在第三個章節中，我們首先在協作動態運動中發現了一個先驗定律: 協作領域關係不變性.根據這個先驗定律，我們提出了一個簡單有效的動態聚類技術，稱為協作濾波器.這個動態聚類技術可以運用在多種動態系統中，並且在高密度噪聲下具有很強的魯棒性.在不同視頻中的實驗證明了協作領域關係不變性的存在以及協作濾波器的有效性.這項研究工作已經投稿歐洲計算機視覺會議(ECCV) 2012. / Crowd behavior analysis is an interdisciplinary topic. Understanding the collective crowd behaviors is one of the fundamental problems both in social science and natural science. Research of crowd behavior analysis can lead to a lot of critical applications, such as intelligent video surveillance, crowd abnormal detection, and public facility optimization. In this thesis, we study the crowd behaviors in the real scene videos, propose computational frameworks and techniques to analyze these dynamic patterns of the crowd, and apply them for a lot of visual surveillance applications. / Firstly we proposed Random Field Topic model for learning semantic regions of crowded scenes from highly fragmented trajectories. This model uses the Markov Random Field prior to capture the spatial and temporal dependency between tracklets and uses the source-sink prior to guide the learning of semantic regions. The learned semantic regions well capture the global structures of the scenes in long range with clear semantic interpretation. 
They are also able to separate different paths at fine scales with good accuracy. This work has been published in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2011 [70]. / To further explore the behavioral origin of semantic regions in crowded scenes, we proposed Mixture model of Dynamic Pedestrian-Agents to learn the collective dynamics from video sequences in crowded scenes. The collective dynamics of pedestrians are modeled as linear dynamic systems to capture long range moving patterns. Through modeling the beliefs of pedestrians and the missing states of observations, it can be well learned from highly fragmented trajectories caused by frequent tracking failures. By modeling the process of pedestrians making decisions on actions, it can not only classify collective behaviors, but also simulate and predict collective crowd behaviors. This work has been published in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2012 as Oral [71]. The journal version of this work has been submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). / Moreover, based on a prior defined as Coherent Neighbor Invariance for coherent motions, we proposed a simple and effective dynamic clustering technique called Coherent Filtering for coherent motion detection. This generic technique could be used in various dynamic systems and work robustly under high-density noise. Experiments on different videos show the existence of Coherent Neighbor Invariance and the effectiveness of our coherent motion detection technique. This work has been published in European Conference on Computer Vision (ECCV) 2012. / Detailed summary in vernacular field only. / Zhou, Bolei. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2012. / Includes bibliographical references (leaves 67-73). 
/ Abstracts also in Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Background of Crowd Behavior Analysis --- p.1 / Chapter 1.2 --- Previous Approaches and Related Works --- p.2 / Chapter 1.2.1 --- Modeling Collective Motion --- p.2 / Chapter 1.2.2 --- Semantic Region Analysis --- p.3 / Chapter 1.2.3 --- Coherent Motion Detection --- p.5 / Chapter 1.3 --- Our Works for Crowd Behavior Analysis --- p.6 / Chapter 2 --- Semantic Region Analysis in Crowded Scenes --- p.9 / Chapter 2.1 --- Introduction of Semantic Regions --- p.9 / Chapter 2.1.1 --- Our approach --- p.11 / Chapter 2.2 --- Random Field Topic Model --- p.12 / Chapter 2.2.1 --- Pairwise MRF --- p.14 / Chapter 2.2.2 --- Forest of randomly spanning trees --- p.15 / Chapter 2.2.3 --- Inference --- p.16 / Chapter 2.2.4 --- Online tracklet prediction --- p.18 / Chapter 2.3 --- Experimental Results --- p.18 / Chapter 2.3.1 --- Learning semantic regions --- p.21 / Chapter 2.3.2 --- Tracklet clustering based on semantic regions --- p.22 / Chapter 2.4 --- Discussion and Summary --- p.24 / Chapter 3 --- Learning Collective Crowd Behaviors in Video --- p.26 / Chapter 3.1 --- Understand Collective Crowd Behaviors --- p.26 / Chapter 3.2 --- Mixture Model of Dynamic Pedestrian-Agents --- p.30 / Chapter 3.2.1 --- Modeling Pedestrian Dynamics --- p.30 / Chapter 3.2.2 --- Modeling Pedestrian Beliefs --- p.31 / Chapter 3.2.3 --- Mixture Model --- p.32 / Chapter 3.2.4 --- Model Learning and Inference --- p.32 / Chapter 3.2.5 --- Algorithms for Model Fitting and Sampling --- p.35 / Chapter 3.3 --- Modeling Pedestrian Timing of Emerging --- p.36 / Chapter 3.4 --- Experiments and Applications --- p.37 / Chapter 3.4.1 --- Model Learning --- p.37 / Chapter 3.4.2 --- Collective Crowd Behavior Simulation --- p.39 / Chapter 3.4.3 --- Collective Behavior Classification --- p.42 / Chapter 3.4.4 --- Behavior Prediction --- p.43 / Chapter 3.5 --- Discussion and Summary --- p.43 / Chapter 4 --- Detecting Coherent Motions 
from Clutters --- p.45 / Chapter 4.1 --- Coherent Motions in Nature --- p.45 / Chapter 4.2 --- A Prior of Coherent Motion --- p.46 / Chapter 4.2.1 --- Random Dot Kinematogram --- p.47 / Chapter 4.2.2 --- Invariance of Spatiotemporal Relationships --- p.49 / Chapter 4.2.3 --- Invariance of Velocity Correlations --- p.51 / Chapter 4.3 --- A Technique for Coherent Motion Detection --- p.52 / Chapter 4.3.1 --- Algorithm for detecting coherent motions --- p.53 / Chapter 4.3.2 --- Algorithm for associating continuous coherent motion --- p.53 / Chapter 4.4 --- Experimental Results --- p.54 / Chapter 4.4.1 --- Coherent Motion in Synthetic Data --- p.55 / Chapter 4.4.2 --- 3D Motion Segmentation --- p.57 / Chapter 4.4.3 --- Coherent Motions in Crowded Scenes --- p.60 / Chapter 4.4.4 --- Further Analysis of the Algorithm --- p.61 / Chapter 4.5 --- Discussion and Summary --- p.62 / Chapter 5 --- Conclusions --- p.65 / Chapter 5.1 --- Future Works --- p.66 Collective behavior Computer vision Visual perception
356	Adaptive visual servoing in uncalibrated environments. / CUHK electronic theses & dissertations collection / Digital dissertation consortium January 2002 (has links) Shen Yantao. / "April 2002." / Thesis (Ph.D.)--Chinese University of Hong Kong, 2002. / Includes bibliographical references (p. 117-122). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Mode of access: World Wide Web. / Abstracts in English and Chinese. Robots--Control systems Computer vision Servomechanisms
357	Deformable surface recovery and its applications. / 可變形曲面恢復及應用 / CUHK electronic theses & dissertations collection / Ke bian xing qu mian hui fu ji ying yong January 2009 (has links) As for the 3D deformable surface recovery, the key challenge arises from the difficulty in estimating a large number of 3D shape parameters from noisy observations. In this thesis, 3D deformable surface tracking is formulated into an unconstrained quadratic problem that can be solved very efficiently by resolving a set of sparse linear equations. Furthermore, the robust progressive finite Newton method developed for nonrigid surface detection is employed to handle the large outliers. / For the appearance-based method, a deformable Lucas-Kanade algorithm is proposed which triangulates the template image into small patches and constrains the deformation through the second order derivatives of the mesh vertices. It is formulated into a sparse regularized least squares problem which is able to reduce the computational cost and the memory requirement. The inverse compositional algorithm is applied to efficiently solve the optimization problem. Furthermore, we present a fusion approach to take advantage of both the appearance information and the local features. / In addition to the methodologies studied and evaluated in computer vision, this thesis also investigates the nonrigid surface recovery in some real-world multimedia applications, such as Near-duplicate image retrieval and detection. In contrast to conventional approaches, the presented technique can recover an explicit mapping between two near-duplicate images with a few deformation parameters and find out the correct correspondences from noisy data effectively. To make the proposed technique applicable to large-scale applications, an effective multilevel ranking scheme is presented that filters out the irrelevant results in a coarse-to-fine manner. 
To overcome the extremely small training size challenge, a semi-supervised learning method is employed to improve the performance using unlabeled data. Extensive evaluations show that the presented method is clearly more effective than conventional approaches. / Recovering deformable surfaces is an interesting and beneficial research problem for computer vision and image analysis. An effective deformable surface recovery technique can be applied in a variety of applications for surface reconstruction, digital entertainment, medical imaging and Augmented Reality. While considerable research efforts have been devoted to deformable surface modeling and fitting, there are only a few schemes available to tackle the deformable surface recovery problem efficiently. This thesis proposes a set of methods to effectively solve the 2D nonrigid shape recovery and 3D deformable surface tracking based on a robust progressive optimization scheme. The presented techniques are also applied to a variety of real-world applications. / To tackle the 2D nonrigid shape recovery problem, this thesis first presents a novel progressive finite Newton optimization scheme, which is based on the local feature correspondences. The key to this approach is to formulate the nonrigid shape recovery as an unconstrained quadratic optimization problem which has a closed-form solution for a given set of observations. / Without resorting to an explicit deformable mesh model, the nonrigid surface detection can be treated as a generic regression problem. A novel velocity coherence constraint is imposed on the deformable shape model to regularize the ill-posed optimization problem. To handle the large outliers, a progressive optimization scheme is employed. / Zhu, Jianke. / Adviser: Michael R. Lyu. / Source: Dissertation Abstracts International, Volume: 70-09, Section: B, page: . / Thesis submitted in: December 2008. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2009. 
/ Includes bibliographical references (leaves 161-175). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307. Computer vision Image analysis Surfaces, Deformation of
358	Adaptive visual servoing of robots in uncalibrated environments. / CUHK electronic theses & dissertations collection January 2007 (has links) One of the major problems that obstruct the development of adaptive visual servoing is the fact that the image Jacobian or the interaction matrix cannot be linearly parameterized by the unknown parameters. To solve this problem, we propose a depth-independent interaction matrix, which is obtained by eliminating the depth in the traditional interaction matrix. Using this depth-independent interaction matrix in controller design, it is possible to make the unknown parameters appear linearly in the closed-loop dynamics. As a result, we can use an adaptive algorithm, similar to that proposed by Slotine and Li [1], to estimate the unknown parameters on-line. To guarantee the convergence of the image errors, in the parameter adaptation we combine the Slotine-Li algorithm with an on-line gradient descent algorithm that minimizes the errors between the real and estimated image coordinates of the feature points. On the basis of the depth-independent interaction matrix and the new adaptive algorithm, we first propose an adaptive controller for image-based visual servoing of point features using both uncalibrated eye-in-hand and fixed cameras. Then, we extend the controller to visual servoing using line features with an eye-in-hand camera. Next, we present a dynamic controller for trajectory tracking of feature points on a robot manipulator in general 3D motion using a fixed uncalibrated camera. To avoid performance degradation caused by measurement errors of the visual velocity, we also propose a new controller for dynamic visual tracking without using visual velocities. Finally, we design a new controller for locking a moving object in 3-D space at a particular position on the image plane of a camera mounted on a robot by actively moving the camera. 
The asymptotic stability of the system under the control of the proposed methods is rigorously proved by Lyapunov theory with the nonlinear robot dynamics fully taken into account. The performance of the controllers has been verified by experiments on a 3 DOF robot manipulator. / The contribution of this thesis can be summarized as follows: First, a depth-independent interaction matrix is proposed for mapping the image errors onto the joint space. Second, a new adaptive algorithm has been developed to estimate the unknown parameters. Finally, new methods for position and tracking control of robots with uncalibrated visual feedback in both eye-in-hand and fixed camera configurations are proposed. / Visual servoing is an approach to control motion of a robot manipulator using visual feedback signals from a vision system and has received extensive attention in recent years. Many existing methods work based on an assumption that the parameters of the vision system are accurately calibrated, while the calibration process is tedious. Furthermore, most of the controllers are designed using the kinematic relationship only, without considering the nonlinear dynamics of robots, so that they are not suitable for high performance and fast visual servoing tasks. Aiming at solving those two problems, this thesis addresses dynamic position and tracking control of robots with uncalibrated visual feedback. Both the fixed camera and eye-in-hand camera configurations are considered. / Wang, Hesheng. / "August 2007." / Adviser: Yun-Hui Liu. / Source: Dissertation Abstracts International, Volume: 69-02, Section: B, page: 1294. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2007. / Includes bibliographical references (p. 160-169). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. 
[Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract in English and Chinese. / School code: 1307. Computer vision Robots--Control systems Servomechanisms
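Reading aid for record 358: the abstract states that a depth-independent interaction matrix is obtained by eliminating the depth, but it does not give the construction. Under a standard pinhole camera model (an assumption of this note), the idea can be sketched as below. The symbols M, P, m_3, x, y, and z are notation chosen for this note and are not taken from the thesis.

```latex
% Assumed pinhole model: M is the 3x4 projection matrix, P its first two rows,
% m_3^T its third row, x the homogeneous 3D position of a feature point,
% y its image coordinates, and z its depth.
\begin{align}
  z &= m_3^{\top} x, \qquad y = \frac{1}{z}\, P x, \qquad \dot{z} = m_3^{\top} \dot{x}, \\
  \dot{y} &= \frac{1}{z}\, P \dot{x} - \frac{\dot{z}}{z^{2}}\, P x
           = \frac{1}{z}\left(P \dot{x} - y\, \dot{z}\right)
           = \frac{1}{z}\left(P - y\, m_3^{\top}\right)\dot{x}.
\end{align}
```

In this sketch the bracketed matrix no longer contains the factor 1/z, and for a measured y its entries are linear in the entries of M. That is consistent with the abstract's statement that the unknown parameters can be made to appear linearly in the closed-loop dynamics.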
359	Embedding and hallucination for image and video. / 圖像視頻之嵌入與幻想研究 / CUHK electronic theses & dissertations collection / Tu xiang shi pin zhi kan ru yu huan xiang yan jiu January 2007 (has links) For face identification, especially by humans, it is desirable to render a high-resolution (HR) face image from the low-resolution (LR) one, which is called face hallucination or face super-resolution. A number of super-resolution techniques have been proposed in recent years. However, for face hallucination, exploiting the special properties of faces is conducive to generating the HR face images. / In this thesis, we propose a new face hallucination framework based on image patches, which integrates two novel statistical super-resolution models. Considering that image patches reflect the combined effect of personal characteristics and patch-location, we first formulate a TensorPatch model based on multilinear analysis to explicitly model the interaction between multiple constituent factors. Motivated by Locally Linear Embedding, we develop an enhanced multilinear patch hallucination algorithm, which efficiently exploits the local distribution structure in the sample space. To better preserve subtle facial details, we derive the Coupled PCA algorithm to learn the relation between HR residue and LR residue, which is used to compensate for the error residue in hallucinated images. Experiments demonstrate that our framework not only well maintains the global facial structures, but also recovers the detailed facial traits in high quality. (Abstract shortened by UMI.) / In this thesis, we propose a novel dimensionality reduction algorithm called graph-regularized projection (GRP) to tackle the problem of semi-supervised dimensionality reduction that is rarely investigated in the literature. 
Given partially labeled data points, GRP aims at learning a projection, from high-dimensional data vectors to their latent low-dimensional representations, that is not only smooth but also discriminative. Motivated by a recent semi-supervised learning technique, graph regularization, we develop a graph-based regularization framework to enforce smoothness along the graph of the desired projection initiated by margin maximization. As a result, GRP has a natural out-of-sample extension to novel examples and thus can be generalized to the entire high-dimensional space. Extensive experiments on a synthetic dataset and several real databases demonstrate the effectiveness of our algorithm. / Next, this thesis addresses the problem of how to learn an appropriate feature representation from video to benefit video-based face recognition. By simultaneously exploiting the spatial and temporal information, the problem is posed as learning Spatio-Temporal Embedding (STE) from raw videos. STE of a video sequence is defined as its condensed version capturing the essence of space-time characteristics of the video. Relying on the co-occurrence statistics and supervised signatures provided by training videos, STE preserves the intrinsic temporal structures hidden in the video volume, while encoding the discriminative cues into the spatial domain. To conduct STE, we propose two novel techniques, Bayesian keyframe learning and nonparametric discriminant embedding (NDE), for temporal and spatial learning, respectively. In terms of learned STEs, we derive a statistical formulation of the recognition problem with a probabilistic fusion model. On a large face video database containing more than 200 training and testing sequences, our approach consistently outperforms the state-of-the-art methods, achieving perfect recognition accuracy. / Liu, Wei. / "August 2007." / Advisers: Xiaoou Tang; Jianzhuang Liu. / Source: Dissertation Abstracts International, Volume: 69-02, Section: B, page: 1110. 
/ Thesis (Ph.D.)--Chinese University of Hong Kong, 2007. / Includes bibliographical references (p. 140-151). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract in English and Chinese. / School code: 1307. Computer vision Image processing Optical pattern recognition
360	GPU-friendly visual computing. / CUHK electronic theses & dissertations collection January 2007 Real-time performance is necessary for video-rate processing. By utilizing the GPU for acceleration, we propose an efficient technique for the warped display of surveillance video signals. Usually, there are regions of interest (ROIs) in video surveillance, such as entrances or exits, and moving objects or persons. The surveillant wants to see more of the ROIs, but also wants to have an overview of the whole surveillance scope. The warped display resolves this conflict by locally zooming in on the ROIs. / The above warped-display technique does not itself capture more information; it only provides an efficient way to display the captured frame. To solve this problem, we propose a novel technique that automatically adjusts the exposure to capture more information. Traditional automatic exposure control (AEC) is usually based on the intensity level. In contrast, our technique is based on information theory, and the amount of information is measured by Shannon entropy. The computation of entropy is accelerated by the GPU to achieve video-rate performance. / Volume rendering is another active research area. In this area, isosurfaces have been widely adopted to reveal the complex structures in volumetric data, due to their fine visual quality. We describe a GPU-based marching cubes (MC) algorithm to visualize multiple translucent isosurfaces. With the proposed parallel algorithm, we can naturally generate triangles in order, which facilitates the visibility-correct visualization of multiple translucent isosurfaces without computationally expensive sorting. On a commodity GPU, our implementation can extract isosurfaces from a high-resolution volume in real time and render the result. / We first present a GPU-friendly image rendering framework, which can achieve a wide range of non-photorealistic rendering (NPR) effects. 
Most of these effects usually require tailor-made algorithms. When fed with constant kernels, our framework is as simple to use as discrete linear filtering. However, our framework is non-linear and hence can mimic complex NPR effects, such as watercolor, painting, and sketching. The core of our framework is the cellular neural network (CNN). By relaxing the constraints of the traditional CNN, we demonstrate that various interesting and convincing results can be obtained. As the CNN is locally connected and designed for massively parallel hardware, it fits nicely onto GPU hardware, and performance improves substantially. / With the development of the graphics processing unit (GPU), it has become increasingly efficient to run complex algorithms on the GPU because of its highly parallel structure and fast floating-point operations. These complex algorithms were previously implemented on the CPU. In this thesis, we propose several GPU-friendly concepts and algorithms to address some problems of visual computing, including image rendering, video rendering, and volume rendering. / Wang Guangyu. / "September 2007." / Advisers: Pheng Am Heng; Tien-Tsin Wong. / Source: Dissertation Abstracts International, Volume: 69-08, Section: B, page: 4865. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2007. / Includes bibliographical references (p. 115-131). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307. Computer vision Image processing Rendering (Computer graphics)
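The entropy-based exposure measure described in the abstract above can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the thesis's GPU implementation: the function names `shannon_entropy` and `pick_exposure`, the flat pixel lists, and the rule of simply choosing the highest-entropy frame are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of intensity values."""
    n = len(pixels)
    if n == 0:
        return 0.0
    counts = Counter(pixels)  # histogram of intensity levels
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def pick_exposure(frames_by_exposure):
    """Return the key of the frame with the highest entropy.

    frames_by_exposure maps a hypothetical exposure label to the flat
    list of pixel intensities captured at that setting.
    """
    return max(
        frames_by_exposure,
        key=lambda label: shannon_entropy(frames_by_exposure[label]),
    )


# Toy 4-pixel "frames": under- and over-exposed frames concentrate
# their intensities in few levels and therefore carry less entropy.
frames = {
    "under": [0, 0, 0, 0],
    "mid": [0, 64, 128, 255],
    "over": [255, 255, 255, 0],
}
best = pick_exposure(frames)
```

Unlike a mean-intensity criterion, the entropy score depends only on how evenly the intensity histogram is spread, which is why saturated or near-black frames score low.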
