901

Efficient techniques for video shot segmentation and retrieval. / CUHK electronic theses & dissertations collection

January 2007
Video segmentation is the first step in most content-based video analysis. In this thesis, several methods have been proposed to detect shot transitions, including cuts and wipes. In particular, a new cut detection method is proposed that applies multi-adaptive thresholds during a three-step processing of frame-by-frame discontinuity values. A "likelihood value", which measures the possibility of the presence of a cut at each step of processing, is used to reduce the influence of threshold selection on the detection performance. A wipe detection algorithm is also proposed in this thesis to detect various wipe effects with accurate frame ranges. In the algorithm, we carefully model a wipe based on its properties and then use the model to remove possible confusion caused by motion or other transition effects. / With the segmented video shots, video indexing and retrieval systems retrieve video shots using shot-based similarity matching based on the features of shot key-frames. Most shot-based similarity matching methods focus on low-level features such as color and texture. Those methods are often not effective enough in video retrieval due to the large gap between the semantic interpretation of videos and such low-level features. In this thesis, we propose an attention-driven video retrieval method using an efficient spatiotemporal attention detection framework. Within the framework, we propose an efficient method for focus of attention (FOA) detection, which adaptively combines spatial and motion attention to form an overall attention map. Without computing motion explicitly, it detects motion attention using the rank deficiency of gray-scale gradient tensors. We also propose an attention-driven shot matching method using primarily FOA. The matching method boosts the attended regions in the respective shots by converting attention values to importance factors in the process of shot similarity matching. Experimental results demonstrate the advantages of the proposed method in shot similarity matching. / Li, Shan. / "September 2007." / Adviser: Moon-Chuen Lee. / Source: Dissertation Abstracts International, Volume: 69-02, Section: B, page: 1108. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2007. / Includes bibliographical references (p. 150-168). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract in English and Chinese. / School code: 1307.
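
As an illustrative aside, the motion-attention step in this record relies on the rank deficiency of gray-scale gradient tensors. The Python sketch below is only a guess at one way such a test could look: the spatio-temporal structure tensor, the window size and the eigenvalue-ratio score are assumptions, not the thesis' actual algorithm.

```python
# Illustrative sketch only: one way a rank-deficiency test on spatio-temporal
# gradient tensors could be implemented (assumed construction and score).
import numpy as np
from scipy.ndimage import uniform_filter

def motion_attention(prev_frame, curr_frame, next_frame, win=7, eps=1e-8):
    """Per-pixel motion-attention map in [0, 1] for the middle frame (unoptimized)."""
    p, c, n = (np.asarray(f, dtype=np.float64) for f in (prev_frame, curr_frame, next_frame))
    Iy, Ix = np.gradient(c)          # spatial gradients of the current frame
    It = 0.5 * (n - p)               # temporal gradient (central difference)

    # Locally averaged entries of the 3x3 gradient tensor J = g g^T with g = (Ix, Iy, It).
    pairs = {"xx": (Ix, Ix), "xy": (Ix, Iy), "xt": (Ix, It),
             "yy": (Iy, Iy), "yt": (Iy, It), "tt": (It, It)}
    J = {k: uniform_filter(a * b, size=win) for k, (a, b) in pairs.items()}

    att = np.zeros_like(c)
    for i in range(c.shape[0]):
        for j in range(c.shape[1]):
            T = np.array([[J["xx"][i, j], J["xy"][i, j], J["xt"][i, j]],
                          [J["xy"][i, j], J["yy"][i, j], J["yt"][i, j]],
                          [J["xt"][i, j], J["yt"][i, j], J["tt"][i, j]]])
            lam = np.linalg.eigvalsh(T)      # ascending eigenvalues
            # A (near) rank-deficient tensor has lam[0] close to zero; here a
            # larger smallest-to-largest ratio is read as stronger motion saliency.
            att[i, j] = lam[0] / (lam[2] + eps)
    return att / (att.max() + eps)
```

An overall attention map would then fuse this motion cue with a spatial saliency map, and the attention values would act as importance factors when matching shot key-frames.
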
902

Embedding and hallucination for image and video. / 圖像視頻之嵌入與幻想研究 / CUHK electronic theses & dissertations collection / Tu xiang shi pin zhi kan ru yu huan xiang yan jiu

January 2007
For face identification, especially by humans, it is desirable to render a high-resolution (HR) face image from the low-resolution (LR) one, which is called face hallucination or face super-resolution. A number of super-resolution techniques have been proposed in recent years. However, for face hallucination, utilizing the special properties of faces is conducive to generating the HR face images. / In this thesis, we propose a new face hallucination framework based on image patches, which integrates two novel statistical super-resolution models. Considering that image patches reflect the combined effect of personal characteristics and patch location, we first formulate a TensorPatch model based on multilinear analysis to explicitly model the interaction between multiple constituent factors. Motivated by Locally Linear Embedding, we develop an enhanced multilinear patch hallucination algorithm, which efficiently exploits the local distribution structure in the sample space. To better preserve subtle facial details, we derive the Coupled PCA algorithm to learn the relation between the HR residue and the LR residue, which is utilized to compensate for the error residue in hallucinated images. Experiments demonstrate that our framework not only maintains the global facial structures well, but also recovers detailed facial traits with high quality. (Abstract shortened by UMI.) / In this thesis, we propose a novel dimensionality reduction algorithm called graph-regularized projection (GRP) to tackle the problem of semi-supervised dimensionality reduction, which is rarely investigated in the literature. Given partially labeled data points, GRP aims at learning a projection, both smooth and discriminative, from high-dimensional data vectors to their latent low-dimensional representations. Motivated by a recent semi-supervised learning technique, graph regularization, we develop a graph-based regularization framework that enforces smoothness along the graph of the desired projection, which is initiated by margin maximization. As a result, GRP has a natural out-of-sample extension to novel examples and thus can be generalized to the entire high-dimensional space. Extensive experiments on a synthetic dataset and several real databases demonstrate the effectiveness of our algorithm. / Next, this thesis addresses the problem of how to learn an appropriate feature representation from video to benefit video-based face recognition. By simultaneously exploiting the spatial and temporal information, the problem is posed as learning Spatio-Temporal Embedding (STE) from raw videos. The STE of a video sequence is defined as its condensed version capturing the essence of the space-time characteristics of the video. Relying on the co-occurrence statistics and supervised signatures provided by training videos, STE preserves the intrinsic temporal structures hidden in the video volume, while encoding the discriminative cues into the spatial domain. To conduct STE, we propose two novel techniques, Bayesian keyframe learning and nonparametric discriminant embedding (NDE), for temporal and spatial learning, respectively. Based on the learned STEs, we derive a statistical formulation of the recognition problem with a probabilistic fusion model. On a large face video database containing more than 200 training and testing sequences, our approach consistently outperforms the state-of-the-art methods, achieving perfect recognition accuracy. / Liu, Wei. / "August 2007." / Advisers: Xiaoou Tang; Jianzhuang Liu.
/ Source: Dissertation Abstracts International, Volume: 69-02, Section: B, page: 1110. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2007. / Includes bibliographical references (p. 140-151). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract in English and Chinese. / School code: 1307.
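
As an illustrative aside, the patch-based hallucination above is motivated by Locally Linear Embedding. The sketch below shows a generic neighbour-embedding step for a single patch, under assumed training matrices and parameters; the thesis' TensorPatch and Coupled PCA models are not reproduced here.

```python
# Illustrative sketch only: LLE-style neighbour embedding for one LR patch.
import numpy as np

def hallucinate_patch(lr_patch, lr_train, hr_train, k=5, reg=1e-3):
    """lr_train: (N, d_lr) LR training patches; hr_train: (N, d_hr) aligned HR patches."""
    x = np.asarray(lr_patch, dtype=np.float64).ravel()
    d2 = np.sum((lr_train - x) ** 2, axis=1)           # distances to LR training patches
    nn = np.argsort(d2)[:k]                            # k nearest neighbours

    # LLE weights: minimise ||x - sum_i w_i z_i||^2 subject to sum_i w_i = 1.
    Z = lr_train[nn] - x                               # neighbours centred on the query
    G = Z @ Z.T
    G += reg * np.trace(G) * np.eye(k)                 # regularised local Gram matrix
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()

    # Apply the same weights to the corresponding HR patches.
    return w @ hr_train[nn]
```

A full HR face would then be assembled by hallucinating overlapping patches and averaging them in the overlapped regions.
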
903

GPU-friendly visual computing. / CUHK electronic theses & dissertations collection

January 2007
Real-time performance is necessary for video-rate processing. By utilizing the GPU for acceleration, we propose an efficient technique for the warped display of surveillance video signals. Usually, there are regions of interest (ROIs) in video surveillance, such as entrances, exits, and moving objects or persons. The surveillant wants to see more of the ROIs, but also wants to have an overview of the whole surveillance scope. The warped display solves this conflict by locally zooming in on the ROIs. / The above warped-display technique may not be able to capture more information. It only provides an efficient way to display the captured frame. To solve this problem, we propose a novel technique to automatically adjust the exposure and capture more information. Traditional automatic exposure control (AEC) is usually based on the intensity level. Our technique, on the other hand, is based on information theory, and the amount of information is measured by Shannon entropy. The computation of entropy is accelerated by the GPU to achieve video-rate performance. / Volume rendering is another active research area. In this area, isosurfaces have been widely adopted to reveal the complex structures in volumetric data, due to their fine visual quality. We describe a GPU-based marching cubes (MC) algorithm to visualize multiple translucent isosurfaces. With the proposed parallel algorithm, we can naturally generate triangles in order, which facilitates the visibility-correct visualization of multiple translucent isosurfaces without computationally expensive sorting. On a commodity GPU, our implementation can extract isosurfaces from a high-resolution volume in real time and render the result. / We first present a GPU-friendly image rendering framework, which can achieve a wide range of non-photorealistic rendering (NPR) effects. Most of these effects usually require tailor-made algorithms. When fed with constant kernels, our framework is as simple to use as discrete linear filtering. However, our framework is non-linear and hence can mimic complex NPR effects, such as watercolor, painting, sketching, and so on. The core of our framework is the cellular neural network (CNN). By relaxing the constraints in the traditional CNN, we demonstrate that various interesting and convincing results can be obtained. As the CNN is locally connected and designed for massively parallel hardware, it fits nicely onto the GPU and the performance improves substantially. / With the development of the graphics processing unit (GPU), it has become increasingly efficient to run complex algorithms on the GPU because of its highly parallel structure and fast floating-point operations. Such algorithms were previously implemented on the CPU. In this thesis, we propose several GPU-friendly concepts and algorithms to address problems of visual computing, including image rendering, video rendering, and volume rendering. / Wang Guangyu. / "September 2007." / Advisers: Pheng Am Heng; Tien-Tsin Wong. / Source: Dissertation Abstracts International, Volume: 69-08, Section: B, page: 4865. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2007. / Includes bibliographical references (p. 115-131). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
/ Abstracts in English and Chinese. / School code: 1307.
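
As an illustrative aside, the entropy-driven exposure control described above can be sketched in a few lines. The camera interface (a settable `exposure` attribute and a `capture()` returning an 8-bit grayscale frame) and the exhaustive search over candidate exposures are assumptions for illustration; the thesis' GPU-accelerated entropy computation is not shown.

```python
# Illustrative sketch only: pick the exposure that maximises Shannon entropy.
import numpy as np

def shannon_entropy(gray_u8):
    """Shannon entropy (bits) of an 8-bit grayscale image."""
    hist = np.bincount(np.asarray(gray_u8).ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def auto_expose(camera, candidate_exposures):
    """Pick the exposure whose captured frame carries the most information."""
    best_exposure, best_entropy = None, -1.0
    for e in candidate_exposures:
        camera.exposure = e
        h = shannon_entropy(camera.capture())
        if h > best_entropy:
            best_exposure, best_entropy = e, h
    return best_exposure
```
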
904

Shape registration: toward the automatic construction of deformable shape and appearance models. / CUHK electronic theses & dissertations collection

January 2007
A preliminary investigation of the selection of texture representations for appearance modeling is also included in this thesis, as a useful step toward the automatic construction of deformable appearance models. / For both methods, the model generalization errors, the criteria that directly evaluate deformable models, are adopted to quantitatively evaluate the registration results. The proposed methods are compared with state-of-the-art ones on both synthetic and real biomedical data. Their abilities to construct 2D and 3D shape models of better quality are demonstrated. Based on the STS method, an Active Boundary Model is also proposed for 3D image segmentation. / In recent years, deformable shape models have played important roles in medical image analysis. A key problem involved in their construction is shape registration: establishing dense correspondences across a group of different shapes. / So the second method, named STS (Segments tied to splines), is further proposed. It can directly take point sets as input shapes and is able to handle shapes of complicated topologies in high dimensions. STS employs the same number of segments to gradually and concurrently model different point sets, achieving their registration by maintaining a correspondence that is naturally established at the coarsest stage of modeling. It formulates the registration problem in a Bayesian framework, where a constrained Gaussian Mixture Model (GMM) is taken to measure the likelihood, and a term derived from the bending energy of the Thin Plate Spline (TPS) is taken as the prior. This problem is efficiently solved by an Expectation-Maximization (EM) algorithm, which is embedded in a coarse-to-fine scheme. / The first method, called CAP (Coding all the points), employs a set of landmarks along the shape contours to establish the correspondence between shapes. Shape registration is formulated as an optimal coding problem, where not only the positions of landmarks, but also the shape contours themselves are coded. The resultant description length is minimized by a new optimization approach, which utilizes multiple optimization techniques and a propagation scheme. However, CAP has difficulty handling shapes in high dimensions, especially those with complicated topologies, because it needs to parameterize the shapes under registration in order to manipulate the trajectories of landmarks. / Two basic elements are normally embedded in a shape registration algorithm: a shape representation model and a transformation model. To the best of our knowledge, most existing methods treat them separately: the representations of each shape are obtained first, and the correspondence is then established by optimizing only the transformations. From the view of building deformable shape models, this leads to sub-optimal results, because a shape model couples both representation and transformation. In this thesis, two new methods have been developed, both achieving the registration by simultaneously optimizing the shape representation and the transformation, and thus having the potential to build optimal deformable shape models. Neither of them depends on any specific feature detection. / Jiang, Yifeng. / "September 2007." / Advisers: Hung-Tat Tsui; Qing-Hu Max Meng. / Source: Dissertation Abstracts International, Volume: 69-08, Section: B, page: 4844. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2007. / Includes bibliographical references (p. 161-172). / Electronic reproduction.
Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307.
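
As an illustrative aside, the Bayesian GMM-plus-EM formulation above is related to generic probabilistic point-set registration. The sketch below is such a generic version (isotropic Gaussians, an affine transform, no TPS bending-energy prior and no coarse-to-fine segment modelling), so it is an assumption-laden stand-in rather than the STS method itself.

```python
# Illustrative sketch only: generic GMM/EM point-set registration (affine).
import numpy as np

def register_affine_gmm(X, Y, n_iter=50, sigma2=1.0):
    """Register moving points Y (M, d) onto fixed points X (N, d); returns (A, t)."""
    N, d = X.shape
    M = Y.shape[0]
    A, t = np.eye(d), np.zeros(d)
    for _ in range(n_iter):
        TY = Y @ A.T + t
        # E-step: responsibility P[m, n] of transformed centroid m for data point x_n.
        D = ((X[None, :, :] - TY[:, None, :]) ** 2).sum(-1)       # (M, N) squared distances
        P = np.exp(-D / (2.0 * sigma2))
        P /= P.sum(axis=0, keepdims=True) + 1e-12
        # M-step: weighted least squares for the affine parameters in homogeneous form.
        w = P.sum(axis=1) + 1e-12                                  # weight per moving point
        targets = (P @ X) / w[:, None]                             # weighted mean target per y_m
        Yh = np.hstack([Y, np.ones((M, 1))])
        sw = np.sqrt(w)[:, None]
        sol, *_ = np.linalg.lstsq(sw * Yh, sw * targets, rcond=None)
        A, t = sol[:d].T, sol[d]
        # Update the noise variance with the new transform.
        TY = Y @ A.T + t
        D = ((X[None, :, :] - TY[:, None, :]) ** 2).sum(-1)
        sigma2 = float((P * D).sum() / (d * P.sum())) + 1e-12
    return A, t
```
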
905

Binary plankton recognition using random sampling. / CUHK electronic theses & dissertations collection

January 2006
Among these proposed methods (i.e., random subspace, bagging, and pairwise classification), the pairwise classification method produces the highest accuracy at the expense of more computation time for training classifiers. The random subspace method and the bagging approach have similar performance. To recognize a testing plankton pattern, the computational costs of these methods are alike. / Due to the complexity of the plankton recognition problem, it is difficult to pursue a single optimal classifier that meets all the requirements. In this work, instead of developing a single sophisticated classifier, we propose an ensemble learning framework based on random sampling techniques, including random subspace and bagging. In the random subspace method, a set of low-dimensional subspaces is generated by randomly sampling the feature space, and multiple classifiers constructed from these random subspaces are combined to yield a powerful classifier. In the bagging approach, a number of independent bootstrap replicates are generated by randomly sampling the training set with replacement. A classifier is trained on each replicate, and the final result is produced by integrating all the classifiers using majority voting. Using random sampling, the constructed classifiers are stable, and the multiple classifiers cover the entire feature space or the whole training set without losing discriminative information. Thus, good performance can be achieved. Experimental results demonstrate the effectiveness of the random sampling techniques for improving the system performance. / On the other hand, in previous approaches the samples of all the plankton classes are normally used to train a single classifier. It may be difficult to select one feature space to optimally represent and classify all the patterns. Therefore, the overall accuracy rate may be low. In this work, we propose a pairwise classification framework, in which the complex multi-class plankton recognition problem is transformed into a set of two-class problems. Such a problem decomposition leads to a number of simpler classification problems to be solved, and it provides an approach for independent feature selection for each pair of classes. This is the first time such a framework has been introduced in plankton recognition. We achieve nearly perfect classification accuracy on every pairwise classifier with a smaller number of selected features, since it is easier to select an optimal feature vector to discriminate between two classes of patterns. The ensemble of these pairwise classifiers increases the overall performance. A high accuracy rate of 94.49% is obtained from a collection of more than 3000 plankton images, making it comparable with what a trained biologist can achieve by using conventional manual techniques. / Plankton, including phytoplankton and zooplankton, form the base of the food chain in the ocean and are a fundamental component of marine ecosystem dynamics. The rapid mapping of plankton abundance, together with taxonomic and size composition, can help oceanographic researchers understand how climate change and human activities affect marine ecosystems. / Recently, the University of South Florida developed the Shadowed Image Particle Profiling and Evaluation Recorder (SIPPER), an underwater video system that can continuously capture magnified plankton images in the ocean.
The SIPPER images differ from those used in most previous research in four aspects: (i) the images are much noisier, (ii) the objects are deformable and often partially occluded, (iii) the images are projection variant, i.e., the images are video records of three-dimensional objects in arbitrary positions and orientations, and (iv) the images are binary and thus lack texture information. To deal with these difficulties, we implement three of the most valuable general features (i.e., moment invariants, Fourier descriptors, and granulometries) and propose a set of specific features such as circular projections, boundary smoothness, and object density to form a more complete description of the binary plankton patterns. These features are translation, scale, and rotation invariant. Moreover, they are less sensitive to noise. High-quality features benefit the overall performance of the plankton recognition system. / Since all the features are extracted from the same plankton pattern, they may contain much redundant information, as well as noise. Different types of features are incompatible in length and scale, and the combined feature vector has a high dimensionality. To make the best of these features for binary SIPPER plankton image classification, we propose a two-stage PCA-based scheme for feature selection, combination, and normalization. The first-stage PCA is used to compact every long feature vector by removing redundant information and reducing noise, and the second-stage PCA is employed to compact the combined feature vector by eliminating the correlated information among different types of features. In addition, we normalize every component in the combined feature vector to the same scale according to its mean value and variance. In doing so, we reduce the computation time for the later recognition stage and improve the classification accuracy. / Zhao Feng. / "May 2006." / Adviser: Xiaoou Tang. / Source: Dissertation Abstracts International, Volume: 67-11, Section: B, page: 6666. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2006. / Includes bibliographical references (p. 121-136). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307.
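
As an illustrative aside, the random subspace ensemble with majority voting described above can be sketched as follows. The base classifier, subspace dimension and ensemble size are placeholder assumptions, not the thesis' actual configuration; labels are assumed to be non-negative integers.

```python
# Illustrative sketch only: random subspace ensemble with majority voting.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_random_subspace(X, y, n_classifiers=20, subspace_dim=30, seed=0):
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_classifiers):
        dims = rng.choice(X.shape[1], size=min(subspace_dim, X.shape[1]), replace=False)
        clf = DecisionTreeClassifier().fit(X[:, dims], y)   # one classifier per random subspace
        ensemble.append((dims, clf))
    return ensemble

def predict_majority(ensemble, X):
    votes = np.stack([clf.predict(X[:, dims]) for dims, clf in ensemble]).astype(int)
    # Majority vote across the ensemble for each test pattern.
    return np.array([np.bincount(col).argmax() for col in votes.T])
```
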
906

Solving combinatorial optimization problems using neural networks with applications in speech recognition

Balakrishnan, Sreeram Viswanath January 1992
No description available.
907

GL4D: a GPU-based architecture for interactive 4D visualization.

January 2011
Chu, Alan. / "October 2010." / Thesis (M.Phil.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 74-80). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.2 / Chapter 1.1 --- Motivation --- p.3 / Chapter 2 --- Background --- p.4 / Chapter 2.1 --- OpenGL and OpenGL Shading Language --- p.4 / Chapter 2.2 --- 4D Visualization --- p.6 / Chapter 2.2.1 --- 3-manifold as Surface for 4D Objects --- p.7 / Chapter 2.2.2 --- Visualizing 4D Objects in Euclidean 3-space --- p.8 / Chapter 2.2.3 --- The 4D Rendering Pipeline --- p.9 / Chapter 3 --- Related Work --- p.11 / Chapter 3.1 --- General Purpose Processing on Graphics Processing Units --- p.11 / Chapter 3.2 --- Volume Rendering --- p.12 / Chapter 3.2.1 --- Indirect Volume Rendering --- p.13 / Chapter 3.2.2 --- Direct Volume Rendering on Structured Grid --- p.13 / Chapter 3.2.3 --- Direct Volume Rendering on Unstructured Grid --- p.18 / Chapter 3.2.4 --- Acceleration of DVR --- p.19 / Chapter 3.3 --- 4D Visualization --- p.22 / Chapter 4 --- GL4D: Hardware Accelerated Interactive 4D Visualization --- p.26 / Chapter 4.1 --- Preprocessing: From Equations to Tetrahedral Mesh --- p.28 / Chapter 4.2 --- Core Rendering Pipeline: OpenGL for 4D Rendering --- p.29 / Chapter 4.2.1 --- Vertex Data Upload --- p.30 / Chapter 4.2.2 --- Slice-based Multi-pass Tetrahedral Mesh Rendering --- p.30 / Chapter 4.2.3 --- Back-to-front Composition --- p.38 / Chapter 4.3 --- Advanced Visualization Features in GL4D --- p.38 / Chapter 4.3.1 --- Stereoscopic Rendering --- p.39 / Chapter 4.3.2 --- False Intersection Detection --- p.40 / Chapter 4.3.3 --- Transparent 4D Objects Rendering --- p.42 / Chapter 4.3.4 --- Optimization --- p.44 / Chapter 5 --- Results --- p.48 / Chapter 5.1 --- Data Sets --- p.48 / Chapter 5.1.1 --- 3-manifolds in E4 - M3 4 --- p.49 / Chapter 5.1.2 --- 2-manifolds in E4 - M2 4 --- p.50 / Chapter 5.2 --- Performance --- p.69 / Chapter 6 --- Conclusion --- p.71 / Chapter 7 --- Future Work --- p.72 / Bibliography --- p.74
908

Perceptual quality assessment and processing for visual signals.

January 2013
視覺信號,包括圖像,視頻等,在采集,壓縮,存儲,傳輸,重新生成的過程中都會被各種各樣的噪聲所影響,因此他們的主觀質量也就會降低。所以,主觀視覺質量在現今的視覺信號處理跟通訊系統中起到了很大的作用。這篇畢業論文主要討論質量評價的算法設計,以及這些衡量標準在視覺信號處理上的應用。這篇論文的工作主要包括以下五個方面。 / 第一部分主要集中在具有完全套考原始圖像的圖像質量評價。首先我們研究人類視覺系統的特征。具體說來,視覺在結構化失真上面的水平特性和顯著特征會被建模然后應用到結構相似度(SSIM)這個衡量標準上。實驗顯示我們的方法明顯的提高了衡量標準典主觀評價的相似度。由這個質量衡量標準的啟發,我們設計了一個主觀圖像壓縮的方法。其中我們提出了一個自適應的塊大小的超分辨率算法指導的下采樣的算法。實驗結果證明提出的圖像壓縮算法無論在主觀還是在客觀層面都構建了高質量的圖像。 / 第二個部分的工作主要討論具有完全參考原始視頻的視頻質量評價。考慮到人類視覺系統的特征,比如時空域的對此敏感函數,眼球的移動,紋理的遮掩特性,空間域的一致性,時間域的協調性,不同塊變換的特性,我們設計了一個自適應塊大小的失真閾值的模型。實驗證明,我們提出的失真閾值模型能夠更精確的描迷人類視覺系統的特性。基于這個自適應塊大小的失真閾值模型,我們設計了一個簡單的主觀質量評價標準。在公共的圓像以及視頻的主觀數據庫上的測試結果證明了這個簡單的評價標準的有效性。因此,我們把這個簡單的質量標準應用于視頻編碼系統中。它可以在同樣的碼率下提供更高主觀質量的視頻。 / 第三部分我們討論具有部分參考信息的圖像質量評價。我們通過描迷重組后的離散余弦變換域的系數的統計分布來衡量圖像的主觀質量。提出的評價標準發掘了相鄰的離散余弦系數的相同統計特性,相鄰的重組離散余弦系數的互信息,以及圖像的能量在不同頻率下的分布。實驗結果證明我們提出的質量標準河以超越其他的具有部分參考信息的質量評價標準,甚至還超過了具有完全參考信息的質量評價標準。而且,提取的特征很容易被編碼以及隱藏到圖像中以便于在圖像通訊中進行質量監控。 / 第四部分我們討論具有部分參考信息的視頻質量評價。我們提取的特征可以很好的描迷空間域的信息失,和時間域的相鄰兩幀間的直方圖的統計特性。在視頻主觀質量的數據庫上的實驗結果,也證明了提出的方法河以超越其他代表性的視頻質量評價標準,甚至是具有完全參考信息的質量評價標準, 譬如PSNR以及SSIM 。我們的方法只需要很少的特征來描迷每一幀視頻圖像。對于每一幀圖像,一個特征用于描迷空間域的特點,另外三個特征用于描述時間域的特點。考慮到計算的復雜度以及壓縮特征所需要的碼率,提出的方法河以很簡單的在視頻的傳輸過程中監控視頻的質量。 / 之前的四部分提到的主觀質量評價標準主要集中在傳統的失真上面, 譬如JPEG 圖像壓縮, H.264視頻壓縮。在最后一部分,我們討論在圖像跟視頻的retargeting過程中的失真。現如今,隨著消費者電子的發展,視覺信號需要在不同分辨率的顯示設備上進行通訊交互。因此, retargeting的算法把同一個原始圖像適應于不同的分辨率的顯示設備。這樣的過程就會引入圖像的失真。我們研究了對于retargeting圖像主觀質量的測試者的分數,從三個方面進行討論測試者對于retargeting圖像失真的反應.圖像retargeting的尺度,圖像retargeting的算法,原始圖像的內容特性。通過大量的主觀實驗測試,我們構建了一個關于圖像retargeting的主觀數據庫。基于這個主觀數據庫,我們評價以及分析了幾個具有代表性的質量評價標準。 / Visual signals, including images, videos, etc., are affected by a wide variety of distortions during acquisition, compression, storage, processing, transmission, and reproduction processes, which result in perceptual quality degradation. As a result, perceptual quality assessment plays a very important role in today's visual signal processing and communication systems. In this thesis, quality assessment algorithms for evaluating the visual signal perceptual quality, as well as the applications on visual signal processing and communications, are investigated. The work consists of five parts as briefly summarized below. / The first part focuses on the full-reference (FR) image quality assessment. The properties of the human visual system (HVS) are firstly investigated. Specifically, the visual horizontal effect (HE) and saliency properties over the structural distortions are modelled and incorporated into the structure similarity index (SSIM). Experimental results show significantly improved performance in matching the subjective ratings. Inspired by the developed FR image metric, a perceptual image compression scheme is developed, where the adaptive block-based super-resolution directed down-sampling is proposed. Experimental results demonstrated that the proposed image compression scheme can produce higher quality images in terms of both objective and subjective qualities, compared with the existing methods. / The second part concerns the FR video quality assessment. The adaptive block-size transform (ABT) based just-noticeable difference (JND) for visual signals is investigated by considering the HVS characteristics, e.g., spatio-temporal contrast sensitivity function (CSF), eye movement, texture masking, spatial coherence, temporal consistency, properties of different block-size transforms, etc. It is verified that the developed ABT based JND can more accurately depict the HVS property, compared with the state-of-the-art JND models. 
The ABT based JND is thereby utilized to develop a simple perceptual quality metric for visual signals. Validations on the image and video subjective quality databases proved its effectiveness. As a result, the developed perceptual quality metric is employed for perceptual video coding, which can deliver video sequences of higher perceptual quality at the same bit-rates. / The third part discusses the reduced-reference (RR) image quality assessment, which is developed by statistically modelling the coefficient distribution in the reorganized discrete cosine transform (RDCT) domain. The proposed RR metric exploits the identical statistical nature of the adjacent DCT coefficients, the mutual information (MI) relationship between adjacent RDCT coefficients, and the image energy distribution among different frequency components. Experimental results demonstrate that the proposed metric outperforms the representative RR image quality metrics, and even the FR quality metric, i.e., peak signal to noise ratio (PSNR). Furthermore, the extracted RR features can be easily encoded and embedded into the distorted images for quality monitoring during image communications. / The fourth part investigates the RR video quality assessment. The RR features are extracted to exploit the spatial information loss and the temporal statistical characteristics of the inter-frame histogram. Evaluations on the video subjective quality databases demonstrate that the proposed method outperforms the representative RR video quality metrics, and even the FR metrics, such as PSNR and SSIM, in matching the subjective ratings. Furthermore, only a small number of RR features is required to represent the original video sequence (each frame requires only 1 and 3 parameters to depict the spatial and temporal characteristics, respectively). By considering the computational complexity and the bit-rates for extracting and representing the RR features, the proposed RR quality metric can be utilized for quality monitoring during video transmissions, where the RR features for perceptual quality analysis can be easily embedded into the videos or transmitted through an ancillary data channel. / The aforementioned perceptual quality metrics focus on the traditional distortions, such as JPEG image compression noise, H.264 video compression noise, and so on. In the last part, we investigate the distortions introduced during the image and video retargeting process. Nowadays, with the development of consumer electronics, more and more visual signals have to be communicated between display devices of different resolutions. The retargeting algorithm is employed to adapt a source image of one resolution to be displayed on a device of a different resolution, which may introduce distortions during the retargeting process. We investigate the subjective responses to the perceptual qualities of the retargeted images, and discuss the subjective results from three perspectives, i.e., retargeting scales, retargeting methods, and source image content attributes. An image retargeting subjective quality database is built by performing a large-scale subjective study of image retargeting quality on a collection of retargeted images. Based on the built database, several representative quality metrics for retargeted images are evaluated and discussed. / Detailed summary in vernacular field only. / Detailed summary in vernacular field only. / Detailed summary in vernacular field only. / Detailed summary in vernacular field only.
/ Detailed summary in vernacular field only. / Detailed summary in vernacular field only. / Ma, Lin. / "December 2012." / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 185-197). / Abstract also in Chinese. / Dedication --- p.ii / Acknowledgments --- p.iii / Abstract --- p.viii / Publications --- p.xi / Nomenclature --- p.xvii / Contents --- p.xxiv / List of Figures --- p.xxviii / List of Tables --- p.xxx / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Motivation and Objectives --- p.1 / Chapter 1.2 --- Subjective Perceptual Quality Assessment --- p.5 / Chapter 1.3 --- Objective Perceptual Quality Assessment --- p.10 / Chapter 1.3.1 --- Visual Modelling Approach --- p.10 / Chapter 1.3.2 --- Engineering Modelling Approach --- p.15 / Chapter 1.3.3 --- Perceptual Subjective Quality Databases --- p.19 / Chapter 1.3.4 --- Performance Evaluation --- p.21 / Chapter 1.4 --- Thesis Contributions --- p.22 / Chapter 1.5 --- Organization of the Thesis --- p.24 / Chapter I --- Full Reference Quality Assessment --- p.26 / Chapter 2 --- Full Reference Image Quality Assessment --- p.27 / Chapter 2.1 --- Visual Horizontal Effect for Image Quality Assessment --- p.27 / Chapter 2.1.1 --- Introduction --- p.27 / Chapter 2.1.2 --- Proposed Image Quality Assessment Framework --- p.28 / Chapter 2.1.3 --- Experimental Results --- p.34 / Chapter 2.1.4 --- Conclusion --- p.36 / Chapter 2.2 --- Image Compression via Adaptive Block-Based Super-Resolution Directed Down-Sampling --- p.37 / Chapter 2.2.1 --- Introduction --- p.37 / Chapter 2.2.2 --- The Proposed Image Compression Framework --- p.38 / Chapter 2.2.3 --- Experimental Results --- p.42 / Chapter 2.2.4 --- Conclusion --- p.45 / Chapter 3 --- Full Reference Video Quality Assessment --- p.46 / Chapter 3.1 --- Adaptive Block-size Transform based Just-Noticeable Dfference Model for Visual Signals --- p.46 / Chapter 3.1.1 --- Introduction --- p.46 / Chapter 3.1.2 --- JND Model based on Transforms of Different Block Sizes --- p.48 / Chapter 3.1.3 --- Selection Strategy Between Transforms of Different Block Sizes --- p.53 / Chapter 3.1.4 --- JND Model Evaluation --- p.56 / Chapter 3.1.5 --- Conclusion --- p.60 / Chapter 3.2 --- Perceptual Quality Assessment --- p.60 / Chapter 3.2.1 --- Experimental Results --- p.62 / Chapter 3.2.2 --- Conclusion --- p.64 / Chapter 3.3 --- Motion Trajectory Based Visual Saliency for Video Quality Assessment --- p.65 / Chapter 3.3.1 --- Motion Trajectory based Visual Saliency for VQA --- p.66 / Chapter 3.3.2 --- New Quaternion Representation (QR) for Each frame --- p.66 / Chapter 3.3.3 --- Saliency Map Construction by QR --- p.67 / Chapter 3.3.4 --- Incorporating Visual Saliency with VQAs --- p.68 / Chapter 3.3.5 --- Experimental Results --- p.69 / Chapter 3.3.6 --- Conclusion --- p.72 / Chapter 3.4 --- Perceptual Video Coding --- p.72 / Chapter 3.4.1 --- Experimental Results --- p.75 / Chapter 3.4.2 --- Conclusion --- p.76 / Chapter II --- Reduced Reference Quality Assessment --- p.77 / Chapter 4 --- Reduced Reference Image Quality Assessment --- p.78 / Chapter 4.1 --- Introduction --- p.78 / Chapter 4.2 --- Reorganization Strategy of DCT Coefficients --- p.81 / Chapter 4.3 --- Relationship Analysis of Intra and Inter RDCT subbands --- p.83 / Chapter 4.4 --- Reduced Reference Feature Extraction in Sender Side --- p.88 / Chapter 4.4.1 --- Intra RDCT Subband Modeling --- p.89 / Chapter 4.4.2 --- Inter RDCT Subband Modeling --- p.91 / Chapter 4.4.3 --- Image Frequency Feature 
--- p.92 / Chapter 4.5 --- Perceptual Quality Analysis in the Receiver Side --- p.95 / Chapter 4.5.1 --- Intra RDCT Feature Difference Analysis --- p.95 / Chapter 4.5.2 --- Inter RDCT Feature Difference Analysis --- p.96 / Chapter 4.5.3 --- Image Frequency Feature Difference Analysis --- p.96 / Chapter 4.6 --- Experimental Results --- p.98 / Chapter 4.6.1 --- Efficiency of the DCT Reorganization Strategy --- p.98 / Chapter 4.6.2 --- Performance of the Proposed RR IQA --- p.99 / Chapter 4.6.3 --- Performance of the Proposed RR IQA over Each Individual Distortion Type --- p.105 / Chapter 4.6.4 --- Statistical Significance --- p.107 / Chapter 4.6.5 --- Performance Analysis of Each Component --- p.109 / Chapter 4.7 --- Conclusion --- p.111 / Chapter 5 --- Reduced Reference Video Quality Assessment --- p.113 / Chapter 5.1 --- Introduction --- p.113 / Chapter 5.2 --- Proposed Reduced Reference Video Quality Metric --- p.114 / Chapter 5.2.1 --- Reduced Reference Feature Extraction from Spatial Perspective --- p.116 / Chapter 5.2.2 --- Reduced Reference Feature Extraction from Temporal Perspective --- p.118 / Chapter 5.2.3 --- Visual Quality Analysis in Receiver Side --- p.121 / Chapter 5.3 --- Experimental Results --- p.123 / Chapter 5.3.1 --- Consistency Test of the Proposed RR VQA over Compressed Video Sequences --- p.124 / Chapter 5.3.2 --- Consistency Test of the Proposed RR VQA over Video Sequences with Simulated Distortions --- p.126 / Chapter 5.3.3 --- Performance Evaluation of the Proposed RR VQA on Compressed Video Sequences --- p.129 / Chapter 5.3.4 --- Performance Evaluation of the Proposed RR VQA on Video Sequences Containing Transmission Distortions --- p.133 / Chapter 5.3.5 --- Performance Analysis of Each Component --- p.135 / Chapter 5.4 --- Conclusion --- p.137 / Chapter III --- Retargeted Visual Signal Quality Assessment --- p.138 / Chapter 6 --- Image Retargeting Perceptual Quality Assessment --- p.139 / Chapter 6.1 --- Introduction --- p.139 / Chapter 6.2 --- Preparation of Database Building --- p.142 / Chapter 6.2.1 --- Source Image --- p.142 / Chapter 6.2.2 --- Retargeting Methods --- p.143 / Chapter 6.2.3 --- Subjective Testing --- p.146 / Chapter 6.3 --- Data Processing and Analysis for the Database --- p.150 / Chapter 6.3.1 --- Processing of Subjective Ratings --- p.150 / Chapter 6.3.2 --- Analysis and Discussion of the Subjective Ratings --- p.153 / Chapter 6.4 --- Objective Quality Metric for Retargeted Images --- p.162 / Chapter 6.4.1 --- Quality Metric Performances on the Constructed Image Retargeting Database --- p.162 / Chapter 6.4.2 --- Subjective Analysis of the Shape Distortion and Content Information Loss --- p.165 / Chapter 6.4.3 --- Discussion --- p.167 / Chapter 6.5 --- Conclusion --- p.169 / Chapter 7 --- Conclusions --- p.170 / Chapter 7.1 --- Conclusion --- p.170 / Chapter 7.2 --- Future Work --- p.173 / Chapter A --- Attributes of the Source Image --- p.176 / Chapter B --- Retargeted Image Name and the Corresponding Number --- p.179 / Chapter C --- Source Image Name and the Corresponding Number --- p.183 / Bibliography --- p.185
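
As an illustrative aside, the full-reference work above builds on the structural similarity index (SSIM). The sketch below is a plain single-scale SSIM with uniform local windows (a simplification of the usual Gaussian windows), using the common constants for 8-bit images; the horizontal-effect and saliency weighting developed in the thesis are not shown.

```python
# Context sketch only: plain single-scale SSIM with uniform local windows.
import numpy as np
from scipy.ndimage import uniform_filter

def ssim(x, y, win=11, L=255.0, K1=0.01, K2=0.03):
    """Mean SSIM between two grayscale images of identical size."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_y = uniform_filter(x, win), uniform_filter(y, win)
    var_x = uniform_filter(x * x, win) - mu_x ** 2
    var_y = uniform_filter(y * y, win) - mu_y ** 2
    cov_xy = uniform_filter(x * y, win) - mu_x * mu_y
    ssim_map = ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / \
               ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
    return float(ssim_map.mean())
```
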
909

Two approaches to sparsity for image restoration.

January 2013
稀疏性在最近的圖像恢復技術發展中起到了重要作用。在這個碩士研究中,我們專注於兩種通過信號稀疏性假設相聯繫的圖像恢復問題。具體來講,在第一個圖像恢復問題中,信號本身在某些變換域是稀疏的,例如小波變換。在本研究的第二部分,信號並非傳統意義上的稀疏,但它可以用很少的幾個參數來表示--亦即信號具有稀疏的表示。我們希望通過講述一個「雙城記」,聯繫起這兩個稀疏圖像重建問題。 / 在第二章中,我們提出了一種創新的算法框架,用於解決信號稀疏假設下的圖像恢復問題。重建圖像的目標函數,由一個數據保真項和`1正則項組成。然而,我們不是直接估計重建的圖像,而是專注於如何獲得重建的這個過程。我們的策略是將這個重建過程表示成基本閾值函數的線性組合(LET):這些線性係數可以通過最小化目標函數解得。然後,可以更新閾值函數并迭代這個過程(i-LET)。這種線性參數化的主要優點是可以大幅降低問題的規模-每次我們只需解決一個線性係數維度大小的優化問題(通常小於十),而不是整個圖像大小的問題。如果閾值函滿足一定的條件,迭代LET算法可以保證全局的收斂性。多個測試圖像在不同噪音水平和不同卷積核類型的測試清楚地表明,我們提出的框架在所需運算時間和迭代循環次數方面,通常超越當今最好水平。 / 在第三章中,我們擴展了有限創新率採樣框架至某一種特定二維曲線。我們用掩模函數的解來間接定義這個二維曲線。這裡,掩模函數可以表示為有限數目的正弦信號加權求和。因此,從這個角度講,我們定義的二維曲線具有「有限創新率」(FRI)。由於與定義曲線相關聯的指示器圖像沒有帶寬限制,因而根據經典香農採樣定理,不能在有限數量的採樣基礎上獲得完全重建。然而,我們證明,仍然可以設計一個針對指示器圖像採樣的框架,實現完美重構。此外,對於這一方法的空間域解釋,使我們能夠拓展嚴格的FRI曲線模型用於描述自然圖像的邊緣,可以在各種圖像處理的問題中保持圖像的邊緣。我們用一個潛在的在圖像上採樣中的應用作為示例。 / Sparsity has played an important role in recent developments of various image restoration techniques. In this MPhil study, we focus on two different types of image restoration problems, which are related by the sparsity assumptions. Specifically, in the first image restoration problem, the signal (i.e. the restored image) itself is sparse in some transformation domain, e.g. wavelet. While in the second part of this study, the signal is not sparse in the traditional sense but that it can be parametrized with a few parameters hence having a sparse representation. Our goal is to tell a "tale of two cities" and to show the connections between the two sparse image restoration problems in this thesis. / In Chapter 2, we proposed a novel algorithmic framework to solve image restoration problems under sparsity assumptions. As usual, the reconstructed image is the minimum of an objective functional that consists of a data fidelity term and an ℓ₁ regularization. However, instead of estimating the reconstructed image that minimizes the objective functional directly, we focus on the restoration process that maps the degraded measurements to the reconstruction. Our idea amounts to parameterizing the process as a linear combination of few elementary thresholding functions (LET) and solve for the linear weighting coefficients by minimizing the objective functional. It is then possible to update the thresholding functions and to iterate this process (i-LET). The key advantage of such a linear parametrization is that the problem size reduces dramatically--each time we only need to solve an optimization problem over the dimension of the linear coefficients (typically less than 10) instead of the whole image dimensio . With the elementary thresholding functions satisfying certain constraints, global convergence of the iterated LET algorithm is guaranteed. Experiments on several test images over a wide range of noise levels and different types of convolution kernels clearly indicate that the proposed framework usually outperform state-of-theart algorithms in terms of both CPU time and number of iterations. / In Chapter 3, we extended the sampling framework for signals with finite rate of innovation to a specific class of two-dimensional curves, which are defined implicitly as the roots of a mask function. Here the mask function has a parametric representation as weighted summation of a finite number of sinusoids, and therefore, has finite rate of innovation [1]. The associated indicator image of the defined curve is not bandlimited and cannot be perfectly reconstructed based on the classical Shannon's sampling theorem. 
Yet, we show that it is possible to devise a sampling scheme and have a perfect reconstruction from finite number of (noiseless) samples of the indicator image with the annihilating filter method (also known as Prony's method). Robust reconstruction algorithms with noisy samples are also developed. Furthermore, the new spatial domain interpretation of the annihilating filter enables us to generalize the exact FRI curve model to characterize edges of a natural image. We can impose the annihilation constraint to preserve edges in various image processing problems. We exemplified the effectiveness of the annihilation constraint with a potential application in image up-sampling. / Detailed summary in vernacular field only. / Detailed summary in vernacular field only. / Detailed summary in vernacular field only. / Pan, Hanjie. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 69-74). / Abstracts also in Chinese. / Acknowledgments --- p.iii / Abstract --- p.vii / Contents --- p.xii / List of Figures --- p.xv / List of Tables --- p.xvii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Sampling Sparse Signals --- p.1 / Chapter 1.2 --- Thesis Organizations and Contributions --- p.3 / Chapter 2 --- An Iterated Linear Expansion of Thresholds for ℓ₁-based Image Restoration --- p.5 / Chapter 2.1 --- Introduction --- p.5 / Chapter 2.1.1 --- Problem Description --- p.5 / Chapter 2.1.2 --- Approaches to Solve the Problem --- p.6 / Chapter 2.1.3 --- Proposed Approach --- p.8 / Chapter 2.1.4 --- Organization of the Chapter --- p.9 / Chapter 2.2 --- Basic Ingredients --- p.9 / Chapter 2.2.1 --- Iterative Reweighted Least Square Methods --- p.9 / Chapter 2.2.2 --- Linear Expansion of Thresholds (LET) --- p.11 / Chapter 2.3 --- Iterative LET Restoration --- p.15 / Chapter 2.3.1 --- Selection of i-LET Bases --- p.15 / Chapter 2.3.2 --- Convergence of the i-LET Scheme --- p.16 / Chapter 2.3.3 --- Examples of i-LET Bases --- p.18 / Chapter 2.4 --- Experimental Results --- p.23 / Chapter 2.4.1 --- Deconvolution with Decimated Wavelet Transform --- p.24 / Chapter 2.4.2 --- Deconvolution with Redundant Wavelet Transform --- p.28 / Chapter 2.4.3 --- Algorithm Complexity Analysis --- p.29 / Chapter 2.4.4 --- Choice of Regularization Weight λ --- p.30 / Chapter 2.4.5 --- Deconvolution with Cycle Spinnings --- p.30 / Chapter 2.5 --- Summary --- p.31 / Chapter 3 --- Sampling Curves with Finite Rate of Innovation --- p.33 / Chapter 3.1 --- Introduction --- p.33 / Chapter 3.2 --- Two-dimensional Curves with Finite Rate of Innovation --- p.34 / Chapter 3.2.1 --- FRI Curves --- p.34 / Chapter 3.2.2 --- Interior Indicator Image --- p.35 / Chapter 3.2.3 --- Acquisition of Indicator Image Samples --- p.36 / Chapter 3.3 --- Reconstruction of the Annihilable Curves --- p.37 / Chapter 3.3.1 --- Annihilating Filter Method --- p.37 / Chapter 3.3.2 --- Relate Fourier Transform with Spatial Domain Samples --- p.39 / Chapter 3.3.3 --- Reconstruction of Annihilation Coe cients --- p.39 / Chapter 3.3.4 --- Reconstruction with Model Mismatch --- p.42 / Chapter 3.3.5 --- Retrieval of the Annihilable Curve Amplitudes --- p.46 / Chapter 3.4 --- Dealing with Non-ideal Low-pass Filtered Samples --- p.48 / Chapter 3.5 --- Generalization of the FRI Framework for Natural Images --- p.49 / Chapter 3.5.1 --- Spatial Domain Interpretation of the Annihilation Equation --- p.50 / Chapter 3.5.2 --- Annihilable Curve Approximation of Image Edges --- p.51 / Chapter 3.5.3 --- Up-sampling with 
Annihilation Constraint --- p.53 / Chapter 3.6 --- Conclusion --- p.57 / Chapter 4 --- Conclusions --- p.59 / Chapter 4.1 --- Thesis Summary --- p.59 / Chapter 4.2 --- Perspectives --- p.60 / Chapter A --- Proofs and Derivations --- p.61 / Chapter A.1 --- Proof of Lemma 3 --- p.61 / Chapter A.2 --- Proof of Theorem 2 --- p.62 / Chapter A.3 --- Efficient Implementation of IRLS Inner Loop with Matlab --- p.63 / Chapter A.4 --- Derivations of the Sampling Formula (3.7) --- p.64 / Chapter A.5 --- Correspondence between the Spatial and Fourier Domain Samples --- p.65 / Chapter A.6 --- Optimal Post-filter Applied to Non-ideal Samples --- p.66 / Bibliography --- p.69
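
As an illustrative aside, the curve reconstruction above relies on the annihilating filter method (Prony's method). The sketch below shows the basic 1-D version for a sum of exponentials; the 2-D mask-function machinery, sampling kernel and noise handling of the thesis are not reproduced.

```python
# Illustrative sketch only: 1-D annihilating-filter (Prony) recovery.
import numpy as np

def prony(x, K):
    """Recover modes u_k and amplitudes c_k of x[n] = sum_k c_k * u_k**n (needs len(x) >= 2K)."""
    x = np.asarray(x, dtype=np.complex128)
    N = x.size
    # Annihilating filter h: x[n] + h_1 x[n-1] + ... + h_K x[n-K] = 0 for n = K..N-1.
    A = np.array([[x[n - i] for i in range(1, K + 1)] for n in range(K, N)])
    h = np.linalg.lstsq(A, -x[K:], rcond=None)[0]
    u = np.roots(np.concatenate(([1.0], h)))          # modes are the roots of the filter
    V = np.vander(u, N, increasing=True).T            # V[n, k] = u_k ** n
    c = np.linalg.lstsq(V, x, rcond=None)[0]          # amplitudes by least squares
    return u, c

# Tiny check: two complex exponentials at normalised frequencies 0.1 and 0.3.
n = np.arange(16)
x = 2.0 * np.exp(2j * np.pi * 0.1 * n) + 0.5 * np.exp(2j * np.pi * 0.3 * n)
u, c = prony(x, K=2)
freqs = np.sort(np.angle(u) / (2 * np.pi))            # approximately [0.1, 0.3]
```
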
910

Reconstruction of high-resolution image from movie frames.

January 2003
by Ling Kai Tung. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 44-45). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.7 / Chapter 2 --- Fundamentals --- p.9 / Chapter 2.1 --- Digital image representation --- p.9 / Chapter 2.2 --- Motion Blur --- p.13 / Chapter 3 --- Methods for Solving Nonlinear Least-Squares Problem --- p.15 / Chapter 3.1 --- Introduction --- p.15 / Chapter 3.2 --- Nonlinear Least-Squares Problem --- p.15 / Chapter 3.3 --- Gauss-Newton-Type Methods --- p.16 / Chapter 3.3.1 --- Gauss-Newton Method --- p.16 / Chapter 3.3.2 --- Damped Gauss-Newton Method --- p.17 / Chapter 3.4 --- Full Newton-Type Methods --- p.17 / Chapter 3.4.1 --- Quasi-Newton methods --- p.18 / Chapter 3.5 --- Constrained problems --- p.19 / Chapter 4 --- Reconstruction of High-Resolution Images from Movie Frames --- p.20 / Chapter 4.1 --- Introduction --- p.20 / Chapter 4.2 --- The Mathematical Model --- p.22 / Chapter 4.2.1 --- The Discrete Model --- p.23 / Chapter 4.2.2 --- Regularization --- p.24 / Chapter 4.3 --- Acquisition of Low-Resolution Movie Frames --- p.25 / Chapter 4.4 --- Experimental Results --- p.25 / Chapter 4.5 --- Concluding Remarks --- p.26 / Chapter 5 --- Constrained Total Least-Squares Computations for High-Resolution Image Reconstruction --- p.31 / Chapter 5.1 --- Introduction --- p.31 / Chapter 5.2 --- The Mathematical Model --- p.32 / Chapter 5.3 --- Numerical Algorithm --- p.37 / Chapter 5.4 --- Numerical Results --- p.39 / Chapter 5.5 --- Concluding Remarks --- p.39 / Bibliography --- p.44
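
As an illustrative aside, Chapter 3 above surveys Gauss-Newton-type solvers for nonlinear least squares. The following is a minimal damped (line-search) Gauss-Newton sketch; the residual/Jacobian callables, the step-halving rule and the tolerances are assumptions for illustration.

```python
# Illustrative sketch only: damped Gauss-Newton for min_x 0.5 * ||r(x)||^2.
import numpy as np

def damped_gauss_newton(residual, jacobian, x0, n_iter=50, tol=1e-8):
    x = np.asarray(x0, dtype=np.float64).copy()
    for _ in range(n_iter):
        r, J = residual(x), jacobian(x)
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)    # Gauss-Newton step: J step ~ -r
        alpha, f0 = 1.0, 0.5 * float(r @ r)
        # Backtracking ("damping") until the cost actually decreases.
        while alpha > 1e-8:
            r_new = residual(x + alpha * step)
            if 0.5 * float(r_new @ r_new) < f0:
                break
            alpha *= 0.5
        x = x + alpha * step
        if np.linalg.norm(alpha * step) < tol:
            break
    return x
```
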
