191

Embedded early vision techniques for efficient background modeling and midground detection

Valentine, Brian Evans 26 March 2010 (has links)
An automated vision system performs critical tasks in video surveillance while decreasing costs and increasing efficiency. It can provide high-quality scene monitoring without the limitations of human distraction and fatigue. Advances in embedded processors, wireless networks, and imager technology have enabled computer vision systems to be deployed pervasively in stationary surveillance monitors, hand-held devices, and vehicular sensors. However, the size, weight, power, and cost requirements of these platforms present a great challenge in developing real-time systems. This dissertation explores the development of background modeling algorithms for surveillance on embedded platforms. Our contributions are as follows:
- An efficient pixel-based adaptive background model, called multimodal mean, which produces results comparable to the widely used mixture-of-Gaussians multimodal approach at a much reduced computational cost and with greater control of occluded-object persistence.
- A novel and efficient chromatic clustering-based background model for embedded vision platforms that leverages the color uniformity of large, permanent background objects to yield significant speedups in execution time.
- A multi-scale temporal model for midground analysis, which provides a means to "tune in" to changes in the scene beyond the standard background/foreground framework, based on user-defined temporal constraints.
Multimodal mean reduces instruction complexity by using fixed integer arithmetic and periodic long-term adaptation that occurs once every d frames. When combined with fixed thresholding, it runs 6.2 times faster than the mixture-of-Gaussians method while using 18% less storage. Furthermore, fixed thresholding compares favorably to standard-deviation thresholding, with a percentage difference in error of less than five percent on scenes with stable lighting conditions and modest multimodal activity. The chromatic clustering-based approach to optimized background modeling takes advantage of the color distributions in large permanent background objects, such as a road, building, or sidewalk, to speed up execution. It abstracts their colors to a small color palette and suppresses their adaptation during processing. When run on a representative embedded platform, it reduces storage usage by 58% and speeds up execution by 45%. Multi-scale temporal modeling for midground analysis presents a unified approach to scene analysis that can be applied to several application domains. It extends scene analysis from the standard background/foreground framework to one that includes a temporal midground object-saliency window defined by the user. When applied to stationary object detection, the midground model provides accurate results at low sampling frame rates (~1 fps) while using only 18 Mbytes of storage and 15 Mops/sec of processing throughput.
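The abstract gives only a high-level description of multimodal mean; the following is a minimal NumPy sketch of a pixel-wise multi-mode running-mean background model with fixed thresholding and periodic long-term decay every `decay_period` frames. The matching rule, parameter names, and mode-replacement policy are illustrative assumptions, not the dissertation's exact algorithm.

```python
import numpy as np

class MultimodalMeanBG:
    """Per-pixel adaptive background model with K running-mean modes.

    A simplified sketch (not the dissertation's exact method): each pixel keeps
    up to K candidate background means; a new pixel matches a mode when every
    channel lies within a fixed threshold T of that mode's mean.
    """

    def __init__(self, shape, K=3, T=30, decay_period=100):
        h, w, c = shape
        self.K, self.T, self.decay_period = K, T, decay_period
        self.means = np.zeros((K, h, w, c), dtype=np.int32)   # per-mode integer means
        self.counts = np.zeros((K, h, w), dtype=np.int32)     # per-mode support counts
        self.frame_idx = 0

    def apply(self, frame):
        frame = frame.astype(np.int32)
        # Distance of each pixel to each mode (max over channels), fixed threshold.
        dist = np.abs(self.means - frame[None]).max(axis=-1)
        matched = (dist <= self.T) & (self.counts > 0)
        best = matched.argmax(axis=0)                  # first matching mode per pixel
        any_match = matched.any(axis=0)

        # Update the matched mode's running mean using only integer arithmetic.
        idx = (best, *np.indices(best.shape))
        self.counts[idx] += any_match
        n = np.maximum(self.counts[idx], 1)[..., None]
        self.means[idx] += (frame - self.means[idx]) * any_match[..., None] // n

        # Unmatched pixels replace the weakest mode with the new observation.
        weakest = self.counts.argmin(axis=0)
        ridx = (weakest, *np.indices(weakest.shape))
        self.means[ridx] = np.where(any_match[..., None], self.means[ridx], frame)
        self.counts[ridx] = np.where(any_match, self.counts[ridx], 1)

        # Periodic long-term adaptation: halve support counts every decay_period frames.
        self.frame_idx += 1
        if self.frame_idx % self.decay_period == 0:
            self.counts //= 2

        # Foreground mask: pixels that matched no existing background mode.
        return ~any_match
```

Calling `model.apply(frame)` once per frame returns a binary foreground mask; keeping the means and counts as integers keeps the per-pixel arithmetic in fixed point, in the spirit of the reduced instruction complexity described above.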
192

CFNet: A Synthesis for Video Colorization

Ziyang Tang (6593525) 15 May 2019 (has links)
Image-to-image translation has triggered huge interest across different topics in deep learning in recent years. It provides a mapping function that encodes noisy input images into a high-dimensional signal and translates it to the desired output images. The mapping can be one-to-one, many-to-one, or one-to-many. Due to the uncertainty in the mapping functions, flickering problems emerge when these methods are extended to video. Even a slight change between frames may produce an obvious change in the output images. In this thesis, we provide a two-stream solution, CFNet, for the flickering problem in video colorization. Compared with the frame-by-frame methods of previous work, CFNet greatly alleviates flickering in video colorization, especially for video clips with large objects and a still background. Compared with the frame-by-frame baseline, CFNet improves the PSNR from 27 to 30, which is significant progress.
193

Deep Parameter Selection For Classic Computer Vision Applications

Whitney, Michael 13 December 2021 (has links)
A trend in computer vision today is to retire older, so-called "classic" methods in favor of ones based on deep neural networks. This has led to tremendous improvements in many areas, but for some problems deep neural solutions may not yet exist or may not be practical. For this and other reasons, classic methods are still widely used in a variety of applications. This paper explores the possibility of using deep neural networks to improve these older methods instead of replacing them. In particular, it addresses the issue of parameter selection in these algorithms by using a neural network to predict effective settings on a per-input basis. Specifically, we look at a straightforward and well-understood algorithm with one primary parameter: interactive graph-cut segmentation. This parameter balances region and boundary influences and heavily affects the resulting segmentation. Many practitioners tune this parameter with an ad hoc or empirically selected static setting, while others pre-analyze images to determine effective settings on a per-image basis. Tuning this parameter for each image, or even for each target selection within an image, is highly sensitive to properties of the image and object, suggesting that a network might be able to recognize these properties and predict settings that would improve performance. We employ a lightweight network with minimal layers to avoid adding significant computational overhead in this pre-analysis step. The network predicts the segmentation performance for each of a set of discretely sampled values of this parameter and selects the one with the highest predicted performance. Results demonstrate that this per-image prediction and tuning performs better than a single empirically selected setting.
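As a concrete illustration of the per-input parameter prediction described above, here is a hedged PyTorch sketch: a small CNN scores each of a discretely sampled set of graph-cut balance values, and the value with the highest predicted score is selected. The value grid, channel layout (RGB image plus an encoded seed channel), and network shape are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Hypothetical discrete grid of graph-cut balance parameters scored by the network.
LAMBDAS = [0.01, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0, 50.0]

class ParamScorer(nn.Module):
    """Lightweight CNN predicting a segmentation-quality score per candidate value."""

    def __init__(self, in_ch=4, n_params=len(LAMBDAS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_params)   # one predicted score per candidate value

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def select_lambda(model, image_with_seeds):
    # Predict a quality score for every sampled value and keep the best one.
    with torch.no_grad():
        scores = model(image_with_seeds.unsqueeze(0))[0]
    return LAMBDAS[int(scores.argmax())]
```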
194

Lightweight and Sufficient Two Viewpoint Connections for Augmented Reality

Chengyuan Lin (8793044) 05 May 2020 (has links)
Augmented Reality (AR) is a powerful computer-to-human visual interface that displays data overlaid onto the user's view of the real world. Compared to conventional visualization on a computer display, AR has the advantage of saving the user the cognitive effort of mapping the visualization to the real world. For example, a user wearing AR glasses can find a destination in an urban setting by following a virtual green line drawn by the AR system on the sidewalk, which is easier than relying on navigational directions displayed on a phone. Similarly, a surgeon looking at an operating field through an AR display can see graphical annotations authored by a remote mentor as if the mentor actually drew on the patient's body.

However, several challenges remain to be addressed before AR can reach its full potential. This research contributes solutions to four such challenges. A first challenge is achieving visualization continuity for AR displays. Since truly transparent displays are not feasible, AR relies on simulating transparency by showing a live video on a conventional display. For correct transparency, the display should show exactly what the user would see if the display were not there. Since the video is not captured from the user viewpoint, simply displaying each frame as acquired results in visualization discontinuity and redundancy. A second challenge is providing the remote mentor with an effective visualization of the mentee's workspace in AR telementoring. Acquiring the workspace with a camera built into the mentee's AR headset is appealing since it captures the workspace from the mentee's viewpoint and does not require external hardware. However, the workspace visualization is unstable, as it changes frequently, abruptly, and substantially with each mentee head motion. A third challenge is occluder removal in diminished reality. Whereas in conventional AR the user's visualization of a real-world scene is augmented with graphical annotations, diminished reality aims to aid the user's understanding of complex real-world scenes by removing objects from the visualization. The challenge is to paint over occluder pixels using auxiliary videos acquired from different viewpoints, in real time, and with good visual quality. A fourth challenge is to acquire scene geometry from the user viewpoint, as needed in AR, for example, to integrate virtual annotations seamlessly into the real-world scene through accurate depth compositing, and shadow and reflection casting and receiving.

Our solutions are based on the thesis that images acquired from different viewpoints should not always be connected by computing a dense, per-pixel set of correspondences, but rather by devising custom, lightweight, yet sufficient connections between them for each unique context. We have developed a self-contained phone-based AR display that aligns the phone camera view with the user's view, reducing visualization discontinuity to less than 5% for scene distances beyond 5 m. We have developed, and validated in user studies, an effective workspace visualization method that stabilizes the mentee's first-person video feed through reprojection onto a planar proxy of the workspace. We have developed a real-time occluder in-painting method for diminished reality based on a two-stage coarse-then-fine mapping between the user view and the auxiliary view. The mapping is established in time linear in the occluder contour length, and it achieves good continuity across the occluder boundary. We have developed a method for 3D scene acquisition from the user viewpoint based on single-image triangulation of correspondences between left and right eye corneal reflections. The method relies on a subpixel-accurate calibration of the catadioptric imaging system defined by the two corneas and a camera, which enables the extension of conventional epipolar geometry for a fast connection between corneal reflections.
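The planar-proxy stabilization mentioned above is not specified in detail here; the following OpenCV sketch shows one standard way such a reprojection could be realized, by estimating a homography from the current first-person frame to a reference view of the (approximately planar) workspace and warping into it. The feature detector, match count, and RANSAC threshold are assumptions, not the dissertation's method.

```python
import cv2
import numpy as np

def stabilize_on_planar_proxy(frame, reference, detector=None, matcher=None):
    """Reproject a first-person frame onto a stable reference (planar-proxy) view."""
    detector = detector or cv2.ORB_create(2000)
    matcher = matcher or cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

    # Match features between the moving frame and the fixed reference view.
    k1, d1 = detector.detectAndCompute(frame, None)
    k2, d2 = detector.detectAndCompute(reference, None)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]

    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

    # Warp the frame into the reference view, removing head-motion instability.
    h, w = reference.shape[:2]
    return cv2.warpPerspective(frame, H, (w, h))
```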
195

FOCALSR: REVISITING IMAGE SUPER-RESOLUTION TRANSFORMERS WITH FFT-ENABLED CROSS ATTENTION LAYERS

Botong Ou (17536914) 06 December 2023 (has links)
<p dir="ltr">Motion blur arises from camera instability or swift movement of subjects within a scene. The objective of image deblurring is to eliminate these blur effects, thereby enhancing the image's quality. This task holds significant relevance, particularly in the era of smartphones and portable cameras. Yet, it remains a challenging issue, notwithstanding extensive research undertaken over many years. The fundamental concept in deblurring an image involves restoring a blurred pixel back to its initial state.</p><p dir="ltr">Deep learning (DL) algorithms, recognized for their capability to identify unique and significant features from datasets, have gained significant attention in the field of machine learning. These algorithms have been increasingly adopted in geoscience and remote sensing (RS) for analyzing large volumes of data. In these applications, low-level attributes like spectral and texture features form the foundational layer. The high-level feature representations derived from the upper layers of the network can be directly utilized in classifiers for pixel-based analysis. Thus, for enhancing the accuracy of classification using RS data, ensuring the clarity and quality of each collected data in the dataset is crucial for the effective construction of deep learning models.</p><p dir="ltr">In this thesis, we present the FFT-Cross Attention Transformer, an innovative approach amalgamating channel-focused and window-centric self-attention within a state-of-the-art(SOTA) Vision Transformer model. Augmented with a Fast Fourier Convolution Layer, this approach extends the Transformer's capability to capture intricate details in low-resolution images. Employing unified task pre-training during model development, we confirm the robustness of these enhancements through comprehensive testing, resulting in substantial performance gains. Notably, we achieve a remarkable 1dB improvement in the PSNR metric for remote sensing imagery, underscoring the transformative potential of the FFT-Cross Attention Transformer in advancing image processing and domain-specific vision tasks.</p>
196

Detection and Localization of Root Damages in Underground Sewer Systems using Deep Neural Networks and Computer Vision Techniques

Muzi Zheng (14226701) 03 February 2023 (has links)
The maintenance of a healthy sewer infrastructure is a major challenge due to root damage from nearby plants that grow through pipe cracks or loose joints, which may lead to serious pipe blockages and collapse. Traditional inspections based on video surveillance to identify and localize root damages within such complex sewer networks are inefficient, laborious, and error-prone. Therefore, this study aims to develop a robust and efficient approach to automatically detect root damages and localize their circumferential and longitudinal positions in CCTV inspection videos by applying deep neural networks and computer vision techniques. With twenty inspection videos collected from various sources, keyframes were extracted from each video according to the difference in the LUV color space with certain selections of local maxima. To recognize distance information from video subtitles, OCR models such as Tesseract and CRNN-CTC were implemented, achieving 90% recognition accuracy. In addition, a pre-trained segmentation model was applied to detect root damages, but it produced many false positive predictions. By applying a well-tuned YoloV3 model to the detection of pipe joints, leveraging the Convex Hull Overlap (CHO) feature, we achieved a 20% improvement in the reliability and accuracy of damage identification. Moreover, an end-to-end deep learning pipeline involving the Triangle Similarity Theorem (TST) was successfully designed to predict the longitudinal position of each identified root damage. The prediction error was less than 1.0 foot.
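A hedged OpenCV sketch of the keyframe-extraction step only, assuming a mean-absolute-difference score in LUV space and a minimum spacing between selected local maxima (the score and `min_gap` spacing are assumptions, not the study's exact selection rule):

```python
import cv2
import numpy as np

def keyframe_indices(video_path, min_gap=15):
    """Score frames by mean absolute LUV difference to the previous frame and
    keep local maxima separated by at least `min_gap` frames."""
    cap = cv2.VideoCapture(video_path)
    scores, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        luv = cv2.cvtColor(frame, cv2.COLOR_BGR2LUV).astype(np.float32)
        scores.append(0.0 if prev is None else float(np.abs(luv - prev).mean()))
        prev = luv
    cap.release()

    keyframes, last = [], -min_gap
    for i in range(1, len(scores) - 1):
        # Local maximum of the difference signal, far enough from the previous pick.
        if scores[i] > scores[i - 1] and scores[i] >= scores[i + 1] and i - last >= min_gap:
            keyframes.append(i)
            last = i
    return keyframes
```

The longitudinal-position step relies on triangle similarity, i.e. distance is proportional to focal length times real object width divided by its apparent pixel width, applied here to detected pipe joints.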
197

Multispectral analysis on a computer vision system

Yan, Bolin, 1954- January 1989 (has links)
A procedure for multispectral analysis was developed to classify a two-category image. The procedure utilized pattern recognition and feature extraction techniques. Images were acquired using a computer vision system with a series of interference filters to limit the wavelength band of the images. The procedure developed for multispectral analysis is: (1) filter selection and image acquisition; (2) pattern recognition; (3) Bayes minimum error rate classification; (4) feature extraction by Fisher transformation or by Hotelling transformation. The analytical procedure was programmed in the Microsoft C language and implemented on an IBM AT computer. The system was tested by identifying an apple against a Formica background. The classified images and histograms indicated that the separation was possible.
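A minimal NumPy sketch of step (3), the Bayes minimum-error-rate rule with Gaussian class-conditional densities; the maximum-likelihood parameter fit and the apple/background labeling are illustrative assumptions, not the thesis's exact implementation.

```python
import numpy as np

def fit_gaussian(X):
    """Maximum-likelihood mean and covariance for one class's feature vectors."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def bayes_classify(x, params, priors):
    """Bayes minimum-error-rate rule for two classes with Gaussian densities:
    pick the class maximizing the log posterior ln p(x|c) + ln P(c)."""
    scores = []
    for (mu, cov), prior in zip(params, priors):
        diff = x - mu
        inv = np.linalg.inv(cov)
        log_lik = -0.5 * diff @ inv @ diff - 0.5 * np.log(np.linalg.det(cov))
        scores.append(log_lik + np.log(prior))
    return int(np.argmax(scores))   # e.g. 0 = background (Formica), 1 = object (apple)
```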
198

Some topics on similarity metric learning

Cao, Qiong January 2015 (has links)
The success of many computer vision problems and machine learning algorithms critically depends on the quality of the chosen distance metrics or similarity functions. Because real data are inherently task- and data-dependent, learning an appropriate distance metric or similarity function from data for each specific task is usually superior to the default Euclidean distance or cosine similarity. This thesis mainly focuses on developing new metric and similarity learning models for three tasks: unconstrained face verification, person re-identification, and kNN classification. Unconstrained face verification is a binary matching problem, the target of which is to predict whether two images/videos are from the same person or not. Concurrently, person re-identification handles pedestrian matching and ranking across non-overlapping camera views. Both vision problems are very challenging because of the large transformation differences in images or videos caused by pose, expression, occlusion, problematic lighting, and viewpoint. To address the above concerns, two novel methods are proposed. Firstly, we introduce a new dimensionality reduction method called Intra-PCA, designed for robustness to large transformation differences. We show that Intra-PCA significantly outperforms the classic dimensionality reduction methods (e.g., PCA and LDA). Secondly, we propose a novel regularization framework called Sub-SML to learn distance metrics and similarity functions for unconstrained face verification and person re-identification. The main novelty of our formulation is that it incorporates both the robustness of Intra-PCA to large transformation variations and the discriminative power of metric and similarity learning, a property that most existing methods do not hold. Turning to kNN classification, which relies on a distance metric to identify the nearest neighbors, we revisit some popular existing methods for metric learning and develop a general formulation called DMLp for learning a distance metric from data. To obtain the optimal solution, a gradient-based optimization algorithm is proposed which only needs the computation of the largest eigenvector of a matrix per iteration. Although a large number of studies have been devoted to metric/similarity learning based on different objective functions, few address the generalization analysis of such methods. We describe a novel approach for the generalization analysis of metric/similarity learning which can deal with general matrix regularization terms, including the Frobenius norm, sparse L1-norm, mixed (2,1)-norm, and trace norm. The novel models developed in this thesis are evaluated on four challenging databases: the Labeled Faces in the Wild dataset for unconstrained face verification in still images; the YouTube Faces database for video-based face verification in the wild; the Viewpoint Invariant Pedestrian Recognition database for person re-identification; and the UCI datasets for kNN classification. Experimental results show that the proposed methods yield competitive or state-of-the-art performance.
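The abstract states that DMLp's solver needs only the largest eigenvector of a matrix per iteration; the sketch below shows one assumed way such an update could look, as a Frank-Wolfe-style step over trace-bounded PSD Mahalanobis matrices. The objective, the gradient function `grad_fn`, and the step-size rule are all hypothetical, not DMLp's actual formulation.

```python
import numpy as np

def top_eigenvector(A):
    """Eigenvector of the largest eigenvalue of a symmetric matrix (in practice a
    power or Lanczos method would compute only this eigenvector)."""
    _, V = np.linalg.eigh(A)
    return V[:, -1]

def learn_metric(grad_fn, d, steps=50, tau=1.0):
    """Frank-Wolfe-style sketch for a PSD Mahalanobis matrix M with tr(M) <= tau:
    each step moves M toward the rank-one matrix built from the top eigenvector
    of the negative gradient (assumed update rule, for illustration only)."""
    M = np.eye(d) * (tau / d)
    for t in range(steps):
        G = grad_fn(M)                      # gradient of the loss at the current M
        v = top_eigenvector(-G)             # the single eigenvector needed per iteration
        S = tau * np.outer(v, v)            # extreme point of the trace-bounded PSD set
        gamma = 2.0 / (t + 2.0)             # standard Frank-Wolfe step size
        M = (1 - gamma) * M + gamma * S
    return M
```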
199

A two-level model-based object recognition technique

黃業新, Wong, Yip-san. January 1995 (has links)
published_or_final_version / Computer Science / Master / Master of Philosophy
200

Robust feature-point based image matching

Sze, Wui-fung., 施會豐. January 2006 (has links)
published_or_final_version / abstract / Electrical and Electronic Engineering / Master / Master of Philosophy
