211

Content-aware Video Compression

Subramanian, Vivek January 2019 (has links)
In a video there are certain regions in the image that viewers focus on more than others; these are called the salient regions or Regions of Interest (ROI). This thesis aims to improve the perceived quality of videos by improving the quality of these ROIs while degrading the quality of the non-ROI regions of a frame, keeping the same bitrate as would otherwise have been the case. This improvement is achieved by using saliency maps generated with an eye tracker or a deep neural network and providing this information to a modified video encoder; in this thesis the open-source x264 encoder was chosen to make use of this information. The effects of ROI encoding are studied for high-quality 720p videos by encoding them at low bitrates. The results indicate that ROI encoding can improve subjective video quality when carefully applied. / In a video there are certain parts of the image that viewers focus on more than others; these are called Regions of Interest. The goal of this thesis is to raise the viewer-perceived video quality by lowering the compression ratio (and thereby raising the quality) in the salient parts of the image, while raising the compression ratio in the remaining parts so that the bitrate remains the same as before the change. This improvement is made by using saliency maps that show the salient parts of each frame. These saliency maps have either been detected with the help of an eye tracker or computed by a neural network. The information is then used in a modified version of the open x264 codec according to a custom-designed algorithm. The effect of the change has been studied by encoding high-quality source files at low bitrates. The results indicate that this method can improve the perceived quality of a video if applied with the right strength.
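As an illustration of how a saliency map might drive an ROI-aware encoder, the sketch below converts a per-pixel saliency map into per-macroblock quantizer (QP) offsets: salient blocks get negative offsets (finer quantization), non-salient blocks positive ones, with the offsets zero-meaned to hold the average bitrate roughly constant. The function name, the qp_range parameter, and the block-mean mapping are illustrative assumptions, not the thesis's algorithm; x264's library API does accept per-macroblock quantizer offsets (the quant_offsets picture property), which is one way such a map could reach the encoder.

```python
import numpy as np

def saliency_to_qp_offsets(saliency, mb_size=16, qp_range=6.0):
    """Map a per-pixel saliency map (values in [0, 1]) to per-macroblock
    QP offsets: negative offsets (finer quantization) for salient blocks,
    positive offsets (coarser quantization) elsewhere."""
    h, w = saliency.shape
    mbs_y, mbs_x = h // mb_size, w // mb_size
    offsets = np.empty((mbs_y, mbs_x), dtype=np.float32)
    for y in range(mbs_y):
        for x in range(mbs_x):
            block = saliency[y*mb_size:(y+1)*mb_size, x*mb_size:(x+1)*mb_size]
            s = float(block.mean())                   # block saliency in [0, 1]
            offsets[y, x] = qp_range * (0.5 - s) * 2  # in [-qp_range, +qp_range]
    # Zero-mean the offsets so the average QP (and thus the bitrate) stays
    # roughly unchanged relative to non-ROI encoding.
    offsets -= offsets.mean()
    return offsets

# Example: a synthetic 720p saliency map with a salient centre region.
sal = np.zeros((720, 1280), dtype=np.float32)
sal[280:440, 540:740] = 1.0
qp = saliency_to_qp_offsets(sal)
print(qp.shape, qp.min(), qp.max())
```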
212

Deep Learning Optimization and Acceleration

Jiang, Beilei 08 1900 (has links)
The novelty of this dissertation is the optimization and acceleration of deep neural networks aimed at real-time predictions with minimal energy consumption. It consists of cross-layer optimization, output-directed dynamic quantization, and opportunistic near-data computation for deep neural network acceleration. On two datasets (CIFAR-10 and CIFAR-100), the proposed deep neural network optimization and acceleration frameworks are tested using a variety of convolutional neural networks (e.g., LeNet-5, VGG-16, GoogLeNet, DenseNet, ResNet). Experimental results are promising when compared to other state-of-the-art deep neural network acceleration efforts in the literature.
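To make the quantization idea concrete, here is a minimal sketch of generic post-training uniform quantization of a layer's weights; it illustrates the reduced-precision principle behind such acceleration work, not the dissertation's output-directed dynamic quantization scheme.

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    """Uniform affine quantization of a weight tensor to num_bits integers,
    plus the scale/zero-point needed to dequantize. A generic illustration
    of reduced-precision inference."""
    qmin, qmax = 0, 2**num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = qmin - w.min() / scale
    q = np.clip(np.round(w / scale + zero_point), qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

w = np.random.randn(64, 64).astype(np.float32)   # e.g. one layer's weights
q, s, z = quantize_uniform(w, num_bits=4)        # fewer bits -> less memory/energy
print("max abs error:", np.abs(w - dequantize(q, s, z)).max())
```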
213

Speaker Identification Based On Discriminative Vector Quantization And Data Fusion

Zhou, Guangyu 01 January 2005 (has links)
Speaker Identification (SI) approaches based on discriminative Vector Quantization (VQ) and data fusion techniques are presented in this dissertation. The SI approaches based on Discriminative VQ (DVQ) proposed here are the DVQ for SI (DVQSI), the DVQSI with unique speech feature vector space segmentation for each speaker pair (DVQSI-U), and the Adaptive DVQSI (ADVQSI) methods. The difference between the probability distributions of the speech feature vector sets from various speakers (or speaker groups) is called the interspeaker variation between speakers (or speaker groups); it measures the template differences between speakers (or speaker groups). All DVQ-based techniques presented in this contribution take advantage of the interspeaker variation, which is not exploited in previously proposed techniques that employ traditional VQ for SI (VQSI). All DVQ-based techniques have two modes, the training mode and the testing mode. In the training mode, the speech feature vector space is first divided into a number of subspaces based on the interspeaker variations. Then, a discriminative weight is calculated for each subspace of each speaker or speaker pair in the SI group based on the interspeaker variation. Subspaces with higher interspeaker variation are assigned larger discriminative weights and thus play a more important role in SI than subspaces with lower interspeaker variation. In the testing mode, discriminatively weighted average VQ distortions, instead of equally weighted average VQ distortions, are used to make the SI decision. The DVQ-based techniques lead to higher SI accuracies than VQSI. The DVQSI and DVQSI-U techniques consider the interspeaker variation for each speaker pair in the SI group. In DVQSI, the speech feature vector space segmentation is exactly the same for all speaker pairs, whereas DVQSI-U treats each speaker pair individually in the segmentation. In both DVQSI and DVQSI-U, the discriminative weights for each speaker pair are calculated by trial and error. The SI accuracies of DVQSI-U are higher than those of DVQSI, at the price of a much higher computational burden. ADVQSI explores the interspeaker variation between each speaker and all speakers in the SI group. In contrast with DVQSI and DVQSI-U, ADVQSI segments the feature vector space per speaker rather than per speaker pair, based on the interspeaker variation between each speaker and all the speakers in the SI group, and adaptive techniques are used to compute the discriminative weights for each speaker. The SI accuracies of ADVQSI and DVQSI-U are comparable, but the computational complexity of ADVQSI is much lower than that of DVQSI-U. In addition, a novel algorithm to convert the raw distortion outputs of template-based SI classifiers into compatible probability measures is proposed in this dissertation. After this conversion, data fusion techniques at the measurement level can be applied to SI. In the proposed technique, stochastic models of the distortion outputs are estimated; then, the posterior probabilities of the unknown utterance belonging to each speaker are calculated, and compatible probability measures are assigned based on these posterior probabilities. The proposed technique leads to better SI performance at the measurement level than existing approaches.
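The testing-mode decision rule lends itself to a short sketch. The following hypothetical code computes a discriminatively weighted average VQ distortion for one speaker, weighting each feature vector's nearest-codeword distortion by the discriminative weight of the subspace it falls in; all names and shapes are illustrative assumptions, not the dissertation's code.

```python
import numpy as np

def weighted_vq_distortion(features, codebook, subspace_ids, weights):
    """features:     (N, d) test feature vectors
    codebook:     (K, d) the speaker's VQ codebook
    subspace_ids: (N,)   index of the subspace each feature vector falls in
    weights:      (S,)   discriminative weight of each subspace"""
    # Distortion of each feature vector = distance to its nearest codeword.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.min(axis=1)                       # (N,)
    w = weights[subspace_ids]                      # per-vector weight
    return float((w * nearest).sum() / w.sum())

# Identification would pick the speaker whose codebook yields the lowest
# weighted distortion; weights emphasize high-interspeaker-variation subspaces.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 12))             # e.g. cepstral features
cb = rng.normal(size=(32, 12))
ids = rng.integers(0, 4, size=100)             # subspace of each vector
w = np.array([2.0, 1.0, 1.0, 0.5])             # larger = more discriminative
print(weighted_vq_distortion(feats, cb, ids, w))
```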
214

Design of Algorithms to Associate Sensor Nodes to Fusion Centers Using Quantized Measurements

Vudumu, Sarojini January 2023 (has links)
Wireless sensor networks (WSNs) typically consist of a large number of inexpensive sensor nodes, each powered by a battery or another finite energy source that is difficult to replace because of the environment it is in or the cost of doing so. The applications of WSNs include military surveillance, disaster management, target tracking and monitoring environmental conditions. In order to increase the lifespan of WSNs, energy-efficient sensing and communication approaches for sensor nodes are essential. Recently, there has been increasing interest in using unmanned aerial vehicles (UAVs) as portable data collectors for ground sensor nodes in WSNs. Several approaches to efficient communication between sensor nodes and the fusion center are investigated in this thesis. Because processing, sensing range, transmission bandwidth, and energy consumption are always limited, it is beneficial not to use all the information available at each sensor node, in order to prolong its lifespan and reduce communication costs. To address this problem, efficient measurement quantization techniques are first proposed for a single fusion center and multiple sensors, with dynamic bit distribution among all the sensors and within the measurement elements. The problem is then expanded to multiple fusion centers, and a novel algorithm is proposed to associate sensors with fusion centers. The bandwidth distribution for targets monitored by several sensors is addressed, as well as how to exploit the situation in which sensors lie within the coverage radius of multiple fusion centers to share the targets between them. Finally, performance-bounded data collection algorithms are proposed in which the necessary accuracy for each target is specified. An algorithm is proposed to determine the minimum number of data collectors needed and their initial placement, and a coverage path planning method is developed for the case where there are fewer fixed data collectors than regions to collect data from. Since the optimal solution has enormous computational requirements and is not realistic for real-time online implementation, approximate algorithms are proposed for the multi-objective integer optimization problems. To assess each proposed algorithm's effectiveness, many simulated scenarios are used together with baselines and simple existing methods. / Thesis / Doctor of Philosophy (PhD)
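As a concrete illustration of dynamic bit distribution among sensors, the sketch below uses a greedy rule built on the high-rate approximation that each additional quantization bit roughly quarters a measurement's mean squared error; it is a textbook-style stand-in, not the thesis's exact algorithm.

```python
import numpy as np

def greedy_bit_allocation(variances, total_bits):
    """Greedy dynamic bit allocation across sensor measurements: repeatedly
    give the next bit to the measurement whose quantization error would
    drop the most, assuming each extra bit quarters that measurement's MSE."""
    bits = np.zeros(len(variances), dtype=int)
    mse = np.asarray(variances, dtype=float)   # error with 0 bits ~ variance
    for _ in range(total_bits):
        gain = mse - mse / 4.0                 # MSE reduction from one more bit
        i = int(np.argmax(gain))
        bits[i] += 1
        mse[i] /= 4.0
    return bits

# Noisier measurements receive more of the shared bit budget.
print(greedy_bit_allocation([4.0, 1.0, 0.25], total_bits=8))
```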
215

Dynamic Classification Using the Adaptive Competitive Algorithm

Deldadehasl, Maryam 01 December 2023 (has links) (PDF)
The Vector Quantization (VQ) model offers a powerful approach to data clustering. Its design combines concepts from machine learning and dynamical systems theory to classify input data into distinct groups, and the model evolves over time to better match the distribution of the input data. This adaptive feature is a strength of the model, as it allows the cluster centers to shift according to the input patterns, effectively quantizing the data distribution. It is a gradient dynamical system that uses the energy function V as its Lyapunov function and thus possesses convergence and stability properties. These characteristics make the VQ model a promising tool for complex data analysis tasks, including those encountered in machine learning, data mining, and pattern recognition. In this study, we have applied the dynamic model to the "Breast Cancer Wisconsin Diagnostic" dataset, a comprehensive collection of features derived from digitized images of fine needle aspirates (FNA) of breast masses. This dataset, comprising various diagnostic measurements related to breast cancer, poses a unique challenge for clustering due to its high dimensionality and the critical nature of its application in medical diagnostics. By employing the model, we aim to demonstrate its efficacy in handling complex, multidimensional data, especially in the realm of medical pattern recognition and data mining. This integration not only highlights the model's versatility across domains but also showcases its potential to contribute significantly to medical diagnostics, particularly in breast cancer identification and classification.
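A minimal sketch of the adaptive competitive update is given below: the winning cluster center moves a small step toward each input, which is gradient descent on the quantization energy; the learning rate, initialization, and synthetic stand-in data are illustrative assumptions.

```python
import numpy as np

def competitive_update(centers, x, lr=0.05):
    """One step of adaptive competitive learning: the winning cluster center
    moves toward the input, descending the quantization energy
    V = 0.5 * min_k ||x - c_k||^2 over the data distribution."""
    k = np.argmin(((centers - x) ** 2).sum(axis=1))  # winner-take-all
    centers[k] += lr * (x - centers[k])              # move winner toward x
    return centers

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))                     # stand-in for FNA features
centers = data[rng.choice(len(data), 3, replace=False)].copy()
for x in data:                                       # one online pass
    competitive_update(centers, x)
print(centers)                                       # adapted cluster centers
```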
216

Low-delay sensing and transmission in wireless sensor networks

Karlsson, Johannes January 2008 (has links)
With the increasing popularity and relevance of ad-hoc wireless sensor networks, cooperative transmission is more relevant than ever. In this thesis, we consider methods for optimizing cooperative transmission schemes in wireless sensor networks. We are particularly interested in communication schemes for delay-critical applications, such as networked control, and propose suitable candidates for joint source-channel coding schemes. We show that, in many cases, there are significant gains if the parts of the system are jointly optimized for the current source and channel. We focus especially on two means of cooperative transmission, namely distributed source coding and relaying. In the distributed source coding case, we consider transmission of correlated continuous sources and propose an algorithm for designing simple and energy-efficient sensor nodes; in particular, the binary symmetric channel and the additive white Gaussian noise channel are studied. The system works on a sample-by-sample basis, yielding a very low encoding complexity at an insignificant delay. Due to the source correlation, the resulting quantizers use the same indices for several separated intervals in order to reduce the quantization distortion. For the case of relaying, we study the transmission of a continuous Gaussian source and the transmission of a uniformly distributed discrete source. In both situations, we propose algorithms to design low-delay source-channel and relay mappings. We show that there can be significant power savings if the optimized systems are used instead of more traditional systems. By studying the structure of the optimized source-channel and relay mappings, we provide useful insights into how the optimized systems work. Interestingly, the design algorithm generally produces relay mappings with a structure that resembles Wyner-Ziv compression.
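The index-reuse behavior described above can be sketched as Wyner-Ziv-style binning: a fine scalar quantizer's cells are mapped onto a small set of transmitted indices, and the decoder resolves the resulting ambiguity with a correlated observation. The step size, index count, and search range below are illustrative assumptions, not the optimized mappings from the thesis.

```python
import numpy as np

def binned_encode(x, step=0.5, num_indices=4):
    """Scalar quantization followed by index reuse (binning): separated
    quantization cells share the same transmitted index."""
    cell = int(np.floor(x / step))        # fine quantization cell
    return cell % num_indices             # reuse indices across cells

def binned_decode(index, side_info, step=0.5, num_indices=4):
    """Pick, among all cells carrying this index, the one whose center
    is closest to the correlated side information."""
    candidates = [c for c in range(-20, 20) if c % num_indices == index]
    centers = np.array([(c + 0.5) * step for c in candidates])
    return centers[np.argmin(np.abs(centers - side_info))]

x = 1.3                                   # source sample
y = x + 0.1                               # correlated observation at decoder
idx = binned_encode(x)                    # only 2 bits are sent
print(idx, binned_decode(idx, y))         # reconstruction near 1.3
```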
217

L2 Optimized Predictive Image Coding with L∞ Bound

Chuah, Sceuchin 04 1900 (has links)
In many scientific, medical and defense applications of image/video compression, an l∞ error bound is required. However, pure l∞-optimized image coding, colloquially known as near-lossless image coding, is prone to structured errors such as contours and speckles if the bit rate is not sufficiently high; moreover, previous l∞-based image coding methods suffer from poor rate control. In contrast, the l2 error metric aims for average fidelity and hence preserves the subtlety of smooth waveforms better than the l∞ error metric, and it offers fine granularity in rate control; but pure l2-based image coding methods (e.g., JPEG 2000) cannot bound individual errors as l∞-based methods can. This thesis presents a new compression approach to retain the benefits and circumvent the pitfalls of the two error metrics. / Master of Applied Science (MASc)
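The core mechanism of l∞-bounded predictive coding can be shown in a few lines: quantizing the prediction residual with step 2Δ+1 guarantees that every reconstructed sample differs from the original by at most Δ. The sketch below uses a trivial previous-sample predictor for illustration; the thesis's predictor and rate control are more sophisticated.

```python
def near_lossless_encode(signal, delta=2):
    """Predictive coding with a hard l-infinity bound: quantizing the
    prediction residual with step 2*delta + 1 keeps every reconstructed
    sample within +/- delta of the original."""
    step = 2 * delta + 1
    recon_prev, indices = 0, []
    for s in signal:
        residual = int(s) - recon_prev
        q = (residual + delta) // step if residual >= 0 else -((-residual + delta) // step)
        indices.append(q)
        recon_prev = recon_prev + q * step       # decoder-side reconstruction
    return indices

def near_lossless_decode(indices, delta=2):
    step, recon, out = 2 * delta + 1, 0, []
    for q in indices:
        recon += q * step
        out.append(recon)
    return out

x = [10, 12, 11, 30, 29, 5]
rec = near_lossless_decode(near_lossless_encode(x, delta=2), delta=2)
print(max(abs(a - b) for a, b in zip(x, rec)))   # always <= 2
```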
218

Design of Keyword Spotting System Based on Segmental Time Warping of Quantized Features

Karmacharya, Piush January 2012 (has links)
Keyword spotting in general means identifying a keyword in a verbal or written document. In this research a novel approach to designing a simple spoken keyword spotting/recognition system based on template matching is proposed, which differs from the Hidden Markov Model based systems that are most widely used today. The system can be used equally efficiently on any language as it does not rely on an underlying language model or grammatical constraints. The proposed method for keyword spotting is based on a modified version of classical Dynamic Time Warping, which has been a primary method for measuring the similarity between two sequences varying in time. For processing, a speech signal is divided into small stationary frames, and each frame is represented by a quantized feature vector. Both the keyword and the speech utterance are represented in terms of 1-dimensional codebook indices. The utterance is divided into segments, and the warped distance is computed for each segment and compared against the test keyword. A distortion score for each segment is computed as a likelihood measure of the keyword. The proposed algorithm is designed to take advantage of multiple instances of the test keyword (if available) by merging the scores for all keywords used. The training method for the proposed system is completely unsupervised, i.e., it requires neither a language model nor a phoneme model for keyword spotting. Prior unsupervised training algorithms were based on computing Gaussian posteriorgrams, making the training process complex, but the proposed algorithm requires minimal training data, and the system can also be trained to perform in a different environment (language, noise level, recording medium, etc.) by re-training the original cluster on additional data. Techniques for designing a model keyword from multiple instances of the test keyword are discussed. System performance over variations of different parameters, such as the number of clusters and the number of keyword instances available, was studied in order to optimize the speed and accuracy of the system. The system performance was evaluated for fourteen different keywords from the CallHome and Switchboard speech corpora. Results varied for different keywords, and a maximum accuracy of 90% was obtained, which is comparable to other methods using the same time warping algorithms on Gaussian posteriorgrams. Results are compared for different parameter variations, with suggestions of possible improvements. / Electrical and Computer Engineering
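For reference, a generic sketch of the underlying matching step follows: classical dynamic time warping between two quantized index sequences, with a length-normalized score that could be thresholded per segment. The codeword distance matrix here is a placeholder (|i - j|); in practice it would hold distances between the actual codebook vectors. This shows the general technique, not the thesis's segmental variant.

```python
import numpy as np

def dtw_distance(a, b, dist):
    """Classical dynamic time warping between two index sequences a and b,
    where dist[i, j] is the distortion between codewords i and j."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist[a[i - 1], b[j - 1]]
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)                   # length-normalized score

# Keyword spotting: slide the keyword template over utterance segments and
# flag segments whose normalized DTW score falls below a threshold.
codeword_dist = np.abs(np.subtract.outer(np.arange(64), np.arange(64))).astype(float)
keyword = [3, 3, 8, 12, 12, 40]                # quantized keyword template
segment = [3, 8, 8, 12, 40, 40]                # candidate utterance segment
print(dtw_distance(keyword, segment, codeword_dist))
```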
219

Speech Coder using Line Spectral Frequencies of Cascaded Second Order Predictors

Namburu, Visala 14 November 2001 (has links)
A major objective in speech coding is to represent speech with as few bits as possible. Usual transmission parameters include autoregressive parameters, pitch parameters, excitation signals and excitation gains. The pitch predictor makes these coders sensitive to channel errors. Aiming for robustness to channel errors, we do not use pitch prediction and compensate for its absence with a better representation of the excitation signal. We propose a new speech coding approach, Vector Sum Excited Cascaded Linear Prediction (VSECLP), based on code excited linear prediction. We implement forward linear prediction using five cascaded second-order sections, parameterized in terms of line spectral frequencies, in place of the conventional tenth-order filter. The line spectral frequency parameters estimated by the Direct Line Spectral Frequency (DLSF) adaptation algorithm are closer to the true values than those estimated by the Cascaded Recursive Least Squares - Subsection algorithm. A simplified version of DLSF is proposed to further reduce computational complexity. Split vector quantization is used to quantize the line spectral frequency parameters, and vector sum codebooks to quantize the excitation signals. The effect of an increased number of bits and of different split combinations on reconstructed speech quality and transmission rate is analyzed by testing VSECLP on the TIMIT database. Quantizing the excitation vectors using the discrete cosine transform resulted in a segmental signal-to-noise ratio of 4 dB at 20.95 kbps, whereas the same quality was obtained at 9.6 kbps using vector sum codebooks. / Master of Science
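To illustrate the cascade structure, the sketch below multiplies five second-order prediction-error polynomials into a single tenth-order direct-form filter. The section coefficients are illustrative, not from the thesis; for a conjugate pole pair at radius r and angle theta, a1 = -2*r*cos(theta), and the section angle is closely related to its line spectral frequencies.

```python
import numpy as np

def cascade_to_direct_form(sections):
    """Combine cascaded second-order prediction-error filters into one
    higher-order filter by polynomial multiplication: five sections of
    the form 1 + a1*z^-1 + a2*z^-2 yield a tenth-order polynomial."""
    poly = np.array([1.0])
    for a1, a2 in sections:
        poly = np.convolve(poly, np.array([1.0, a1, a2]))
    return poly

# Five illustrative second-order sections (a1, a2) with a2 = r^2.
sections = [(-1.6, 0.81), (-1.2, 0.64), (-0.5, 0.49), (0.3, 0.36), (0.9, 0.25)]
A = cascade_to_direct_form(sections)
print(len(A) - 1, A)    # order 10 and its direct-form coefficients
```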
220

Experiments in Image Segmentation for Automatic US License Plate Recognition

Diaz Acosta, Beatriz 09 July 2004 (has links)
License plate recognition/identification (LPR/I) applies image processing and character recognition technology to identify vehicles by automatically reading their license plates. In the United States, however, each state has its own standard-issue plates, plus several optional styles, which are referred to as special license plates or varieties. There is a clear absence of standardization, and multi-colored, complex backgrounds are becoming more frequent in license plates. Commercially available optical character recognition (OCR) systems generally fail when confronted with textured or poorly contrasted backgrounds, therefore creating the need for proper image segmentation prior to classification. The image segmentation problem in LPR is examined in two stages: license plate region detection and license plate character extraction from the background. Three different approaches for license plate detection in a scene are presented: region distance from eigenspace, border location by edge detection and the Hough transform, and text detection by spectral analysis. The experiments for character segmentation involve the RGB, HSV/HSI and 1976 CIE L*a*b* color spaces as well as their Karhunen-Loève transforms. The segmentation techniques applied include multivariate hierarchical agglomerative clustering and minimum-variance color quantization. The trade-off between accuracy and computational expense is used to select a final reliable algorithm for license plate detection and character segmentation. The spectral analysis approach, together with color quantization in the K-L-transformed L*a*b* space, is found experimentally to be the best alternative for the two identified image segmentation stages for US license plate recognition. / Master of Science
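Minimum-variance color quantization is essentially k-means in a color space: pixel colors are clustered and each pixel is replaced by its cluster centroid, so plate characters and background collapse to a small palette. The sketch below is a generic illustration; the random stand-in data, k, and iteration count are assumptions, and real input would be (possibly K-L-transformed) L*a*b* pixel values.

```python
import numpy as np

def color_quantize(pixels, k=4, iters=20, seed=0):
    """Minimum-variance color quantization via k-means: assign each pixel
    color to its nearest centroid, then recompute centroids, repeatedly."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), k, replace=False)]
    for _ in range(iters):
        d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                 # nearest centroid per pixel
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels, centers

pixels = np.random.default_rng(1).uniform(0, 100, size=(5000, 3))
labels, centers = color_quantize(pixels, k=4)
print(centers)   # the 4-color palette used to re-render the plate image
```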
