Global ETD Search

121	Making music through real-time voice timbre analysis : machine learning and timbral control Stowell, Dan January 2010 (has links) People can achieve rich musical expression through vocal sound; see for example human beatboxing, which achieves a wide timbral variety through a range of extended techniques. Yet the vocal modality is under-exploited as a controller for music systems. If we can analyse a vocal performance suitably in real time, then this information could be used to create voice-based interfaces with the potential for intuitive and fulfilling levels of expressive control. Conversely, many modern techniques for music synthesis do not imply any particular interface. Should a given parameter be controlled via a MIDI keyboard, or a slider/fader, or a rotary dial? Automatic vocal analysis could provide a fruitful basis for expressive interfaces to such electronic musical instruments. The principal questions in applying vocal-based control are how to extract musically meaningful information from the voice signal in real time, and how to convert that information suitably into control data. In this thesis we address these questions, with a focus on timbral control, and in particular we develop approaches that can be used with a wide variety of musical instruments by applying machine learning techniques to automatically derive the mappings between expressive audio input and control output. The vocal audio signal is construed to include a broad range of expression, in particular encompassing the extended techniques used in human beatboxing. The central contribution of this work is the application of supervised and unsupervised machine learning techniques to automatically map vocal timbre to synthesiser timbre and controls. 
Component contributions include a delayed decision-making strategy for low-latency sound classification, a regression-tree method to learn associations between regions of two unlabelled datasets, a fast estimator of multidimensional differential entropy and a qualitative method for evaluating musical interfaces based on discourse analysis. 300.285 Electronic Engineering
122	Surveillance centric coding Akram, Muhammad January 2011 (has links) The research work presented in this thesis focuses on the development of techniques specific to surveillance videos for efficient video compression with higher processing speed. The Scalable Video Coding (SVC) techniques are explored to achieve higher compression efficiency. The framework of SVC is modified to support Surveillance Centric Coding (SCC). Motion estimation techniques specific to surveillance videos are proposed in order to speed up the compression process of the SCC. The main contributions of the research work presented in this thesis are divided into two groups: (i) Efficient Compression and (ii) Efficient Motion Estimation. The paradigm of Surveillance Centric Coding (SCC) is introduced, in which coding aims to achieve bit-rate optimisation and adaptation of surveillance videos for storage and transmission purposes. In the proposed approach the SCC encoder communicates with the Video Content Analysis (VCA) module that detects events of interest in video captured by the CCTV. Bit-rate optimisation and adaptation are achieved by exploiting the scalability properties of the employed codec. Time segments containing events relevant to the surveillance application are encoded using high spatio-temporal resolution and quality, while the portions that are irrelevant from the surveillance standpoint are encoded at low spatio-temporal resolution and/or quality. Thanks to the scalability of the resulting compressed bit-stream, additional bit-rate adaptation is possible, for instance for transmission purposes. Experimental evaluation showed that a significant reduction in bit-rate can be achieved by the proposed approach without loss of information relevant to surveillance applications. In addition to a more optimal compression strategy, novel approaches to performing efficient motion estimation specific to surveillance videos are proposed and implemented with experimental results. 
A real-time background subtractor is used to detect the presence of any motion activity in the sequence. Different approaches for selective motion estimation (GOP based, Frame based and Block based) are implemented. In the first, motion estimation is performed for the whole group of pictures (GOP) only when a moving object is detected in any frame of the GOP. In the Frame based approach, each frame is tested for motion activity and, consequently, for selective motion estimation. The selective motion estimation approach is further explored at a lower level as Block based selective motion estimation. Experimental evaluation showed that a significant reduction in computational complexity can be achieved by applying the proposed strategy. In addition to selective motion estimation, a tracker based motion estimation and a fast full search using multiple reference frames have been proposed for surveillance videos. Extensive testing on different surveillance videos shows the benefits of applying the proposed approaches to achieve the goals of the SCC. 502.85 Electronic Engineering
123	Sparse approximation and dictionary learning with applications to audio signals Barchiesi, Daniele January 2013 (has links) Over-complete transforms have recently become the focus of a wealth of research in signal processing, machine learning, statistics and related fields. Their great modelling flexibility makes it possible to find sparse representations and approximations of data that in turn prove to be very efficient in a wide range of applications. Sparse models express signals as linear combinations of a few basis functions called atoms taken from a so-called dictionary. Finding the optimal dictionary from a set of training signals of a given class is the objective of dictionary learning and the main focus of this thesis. The experimental evidence presented here focuses on the processing of audio signals, and the role of sparse algorithms in audio applications is accordingly highlighted. The first main contribution of this thesis is the development of a pitch-synchronous transform where the frame-by-frame analysis of audio data is adapted so that each frame analysing periodic signals contains an integer number of periods. This algorithm presents a technique for adapting transform parameters to the audio signal to be analysed. It is shown to improve the sparsity of the representation compared to a non-pitch-synchronous approach and is further evaluated in the context of source separation by binary masking. A second main contribution is the development of a novel model and relative algorithm for dictionary learning of convolved signals, where the observed variables are sparsely approximated by the atoms contained in a convolved dictionary. An algorithm is devised to learn the impulse response applied to the dictionary, and experimental results on synthetic data show the superior approximation performance of the proposed method compared to a state-of-the-art dictionary learning algorithm. 
Finally, a third main contribution is the development of methods for learning dictionaries that are both well adapted to a training set of data and mutually incoherent. Two novel algorithms, namely the incoherent k-svd and the iterative projections and rotations (ipr) algorithm, are introduced and compared to different techniques published in the literature in a sparse approximation context. The ipr algorithm in particular is shown to outperform the benchmark techniques in learning very incoherent dictionaries while maintaining a good signal-to-noise ratio of the representation. 621.384 Electronic Engineering
124	Digital signal processing for the detection of hidden objects using an FMCW radar Liau, Teh-Fu January 1987 (has links) This thesis deals with the detection of hidden objects using a short-range frequency-modulated continuous wave (FMCW) radar. The detection is carried out by examining the estimated Power Spectral Density (PSD) functions of sampled returns, the peaks of which theoretically correspond to the reflecting surfaces of hidden objects. Fourier and non-Fourier PSD estimation algorithms are applied to the radar returns to extract information on the hidden surfaces. The Fourier methods used are Direct, Blackman-Tukey, Bartlett, and Smoothed Periodograms. The different PSDs are compared, and the validity of each PSD is then discussed. The study is new for this type of radar and the results are used as references for other PSD estimations. Non-Fourier methods offer many advantages. Firstly, the Autoregressive (AR) Process is used for this particular application. As well as PSDs, the noise spectra are also produced to show the performance of the chosen models. An alternative approach to the conventional forward-backward residuals (e.g. Burg's method) or autocorrelation and covariance methods (such as those used in speech analysis) is introduced in this thesis. The stability and good resolution of the PSDs are obtained by a better estimation of the autocovariance coefficients (ACF) from the data available: averaging two p-shifted ACFs calculated by the covariance method. Once the covariance coefficients are found, the Levinson-Durbin recursive algorithm is used to get the model parameters and the PSDs. Two other non-conventional methods are also attempted to show the image of hidden objects. They are the Pisarenko Harmonic Decomposition method and Prony energy spectrum density estimation. In addition to the one-dimensional processing stated above, this thesis extends it to two-dimensional cases, which give more information on the shape of hidden objects. 
621.3848 Electronic Engineering
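The Levinson-Durbin step named in the abstract above can be illustrated with a minimal sketch. This is the textbook recursion, not the thesis's own implementation: the function name `levinson_durbin`, the sign convention (a[0] = 1 for the prediction-error filter) and the example autocovariance values are assumptions made for illustration, and the averaged p-shifted ACF estimate described in the abstract is not reproduced here.

```python
def levinson_durbin(r, order):
    """Textbook Levinson-Durbin recursion (illustrative sketch).

    r     : autocovariance values r[0..order] (assumed already estimated)
    order : AR model order p
    Returns (a, err), where a[0] = 1 and a[1..p] are the coefficients of the
    prediction-error filter, and err is the final prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        # Correlation between the current prediction error and lag m
        acc = r[m]
        for k in range(1, m):
            acc += a[k] * r[m - k]
        reflection = -acc / err
        # Update coefficients of order m from those of order m - 1
        new_a = a[:]
        for k in range(1, m):
            new_a[k] = a[k] + reflection * a[m - k]
        new_a[m] = reflection
        a = new_a
        err *= 1.0 - reflection * reflection
    return a, err


# Hypothetical autocovariance of an AR(1) process with coefficient 0.5
a, err = levinson_durbin([1.0, 0.5, 0.25], 2)
print(a, err)
```

For this made-up input the second-order coefficient comes out as zero, as expected when the data follow a first-order model. The recursion assumes a positive prediction error at every stage, so a degenerate autocovariance sequence would need extra handling.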
125	Low attenuation microwave waveguides Al-Hariri, Ali Mohammed Bakir January 1974 (has links) An investigation of the dispersion and attenuation characteristics of cylindrical structures supporting guided electromagnetic waves with low attenuation is described. The object of the investigation is to understand how the cross-sectional shape and the nature of the boundary conditions affect the propagation characteristics. Attention is directed towards structures supporting the least number of propagating modes under the conditions which yield low attenuation over a reasonable bandwidth. Elliptical waveguides with both smooth walls and corrugated walls are studied in detail. This reveals errors in previous theories, which are corrected. Some aspects of corrugated rectangular and circular waveguides are considered. Potential low attenuation waveguides such as the dielectric lined and dielectric waveguides are evaluated. 621.3813 Electronic Engineering
126	Non-intrusive measurement in packet networks and its applications Ming, Leung Chi January 2004 (has links) Network measurement is becoming increasingly important as a means to assess the performance of packet networks. Network performance can involve different aspects such as availability, link failure detection etc., but in this thesis, we will focus on Quality of Service (QoS). Among the metrics used to define QoS, we are particularly interested in end-to-end delay performance. Recently, the adoption of Service Level Agreements (SLA) between network operators and their customers has become a major driving force behind QoS measurement: measurement is necessary to produce evidence of fulfilment of the requirements specified in the SLA. Many attempts to do QoS based packet level measurement have been based on Active Measurement, in which the properties of the end-to-end path are tested by adding testing packets generated from the sending end. The main drawback of active probing is its intrusive nature, which causes extra burden on the network and has been shown to distort the measured condition of the network. The other category of network measurement is known as Passive Measurement. In contrast to Active Measurement, there are no testing packets injected into the network, therefore no intrusion is caused. The proposed applications using Passive Measurement are currently quite limited. But Passive Measurement may offer the potential for an entirely different perspective compared with Active Measurement. In this thesis, the objective is to develop a measurement methodology for the end-to-end delay performance based on Passive Measurement. We assume that the nodes in a network domain are accessible, for example, a network domain operated by a single network operator. The novel idea is to estimate the local per-hop delay distribution based on a hybrid approach (model and measurement-based). 
With this approach, the storage requirement for measurement data can be greatly alleviated and the overhead put in each local node can be minimized, so maintaining the fast switching operation in a local switcher or router. Per-hop delay distributions have been widely used to infer QoS at a single local node. However, the end-to-end delay distribution is more appropriate when quantifying delays across an end-to-end path. Our approach is to capture every local node's delay distribution, and then the end-to-end delay distribution can be obtained by convolving the estimated delay distributions. In this thesis, our algorithm is examined by comparing the proximity of the actual end-to-end delay distribution with the estimated one obtained by our measurement method under various conditions, e.g. in the presence of Markovian or Power-law traffic. Furthermore, the comparison between Active Measurement and our scheme is also studied. Network operators may find our scheme useful when measuring the end-to-end delay performance. As stated earlier, our scheme has no intrusive effect. Furthermore, the measurement result in the local node can be re-used to deduce other paths' end-to-end delay behaviour as long as this local node is included in the path. Thus our scheme is more scalable compared with active probing. 621.38216 Electronic Engineering
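The convolution step described in the abstract above can be sketched as follows. This is a minimal illustration, not the thesis's method: it assumes the per-hop delays are independent and discretised into equal-width bins, and the function names and the example distributions are hypothetical.

```python
def convolve_pmfs(a, b):
    """Discrete convolution of two probability mass functions.

    a[i] is the probability that one hop adds i delay units, and likewise b[j].
    The result at index n is the probability that the two hops add n units,
    which holds only if the two delays are independent (an assumption here).
    """
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


def end_to_end_delay_pmf(per_hop_pmfs):
    """Convolve the per-hop delay distributions along a path."""
    result = [1.0]  # zero delay with probability 1 before any hop
    for pmf in per_hop_pmfs:
        result = convolve_pmfs(result, pmf)
    return result


# Hypothetical per-hop delay distributions over 0, 1, 2 delay units
hop_a = [0.5, 0.5]
hop_b = [0.25, 0.5, 0.25]
e2e = end_to_end_delay_pmf([hop_a, hop_b])
print(e2e)  # [0.125, 0.375, 0.375, 0.125]
```

Because each node's estimated distribution is an independent input to the convolution, the same per-hop estimate can be reused for any path that traverses that node, which matches the reuse argument made in the abstract.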
127	Motion scalability for video coding with flexible spatio-temporal decompositions Mrak, Marta January 2007 (has links) The research presented in this thesis aims to extend the scalability range of the wavelet-based video coding systems in order to achieve fully scalable coding with a wide range of available decoding points. Since the temporal redundancy regularly comprises the main portion of the global video sequence redundancy, the techniques that can be generally termed motion decorrelation techniques have a central role in the overall compression performance. For this reason the scalable motion modelling and coding are of utmost importance, and specifically, in this thesis possible solutions are identified and analysed. The main contributions of the presented research are grouped into two interrelated and complementary topics. Firstly, a flexible motion model with a rate-optimised estimation technique is introduced. The proposed motion model is based on tree structures and allows the high adaptability needed for layered motion coding. The flexible structure for motion compensation allows for optimisation at different stages of the adaptive spatio-temporal decomposition, which is crucial for scalable coding that targets decoding on different resolutions. By utilising an adaptive choice of wavelet filterbank, the model enables high compression based on efficient mode selection. Secondly, solutions for scalable motion modelling and coding are developed. These solutions are based on precision limiting of motion vectors and creation of a layered motion structure that describes hierarchically coded motion. The solution based on precision limiting relies on layered bit-plane coding of motion vector values. The second solution builds on recently established techniques that impose scalability on a motion structure. 
The new approach is based on two major improvements: the evaluation of distortion in temporal subbands and motion search in temporal subbands that finds the optimal motion vectors for the layered motion structure. Exhaustive tests on the rate-distortion performance in demanding scalable video coding scenarios show the benefits of applying both the developed flexible motion model and the various solutions for scalable motion coding. 006.696 Electronic Engineering
128	Robust signatures for 3D face registration and recognition Nair, Prathap M. January 2010 (has links) Biometric authentication through face recognition has been an active area of research for the last few decades, motivated by its application-driven demand. The popularity of face recognition, compared to other biometric methods, is largely due to its minimum requirement of subject co-operation, relative ease of data capture and similarity to the natural way humans distinguish each other. 3D face recognition has recently received particular interest since three-dimensional face scans eliminate or reduce important limitations of 2D face images, such as illumination changes and pose variations. In fact, three-dimensional face scans are usually captured by scanners through the use of a constant structured-light source, making them invariant to environmental changes in illumination. Moreover, a single 3D scan also captures the entire face structure and allows for accurate pose normalisation. However, one of the biggest challenges that still remains in three-dimensional face scans is the sensitivity to large local deformations due to, for example, facial expressions. Due to the nature of the data, deformations bring about large changes in the 3D geometry of the scan. In addition to this, 3D scans are also characterised by noise and artefacts such as spikes and holes, which are uncommon with 2D images and require a pre-processing stage that is specific to the scanner used to capture the data. The aim of this thesis is to devise a face signature that is compact in size and overcomes the above-mentioned limitations. We investigate the use of facial regions and landmarks towards a robust and compact face signature, and we study, implement and validate a region-based and a landmark-based face signature. 
Combinations of regions and landmarks are evaluated for their robustness to pose and expressions, while the matching scheme is evaluated for its robustness to noise and data artefacts. 005.3 Electronic Engineering
129	Automated camera ranking and selection using video content and scene context Daniyal, Fahad M. January 2012 (has links) When observing a scene with multiple cameras, an important problem to solve is to automatically identify “what camera feed should be shown and when?” The answer to this question is of interest for a number of applications and scenarios ranging from sports to surveillance. In this thesis we present a framework for the ranking of each video frame and camera across time and the camera network, respectively. This ranking is then used for automated video production. In the first stage, information from each camera view and from the objects in it is extracted and represented in a way that allows for object- and frame-ranking. First, objects are detected and ranked within and across camera views. This ranking takes into account both visible and contextual information related to the object. Then content ranking is performed based on the objects in the view and camera-network level information. We propose two novel techniques for content ranking, namely Routing Based Ranking (RBR) and Multivariate Gaussian based Ranking (MVG). In RBR we use a rule based framework where weighted fusion of object and frame level information takes place, while in MVG the rank is estimated as a multivariate Gaussian distribution. Through experimental and subjective validation we demonstrate that the proposed content ranking strategies allow the identification of the best camera at each time. The second part of the thesis focuses on the automatic generation of N-to-1 videos based on the ranked content. We demonstrate that in such production settings it is undesirable to have frequent inter-camera switching. This motivates the need for a compromise between selecting the best camera most of the time and minimising frequent inter-camera switching, and we demonstrate that state-of-the-art techniques for this task are inadequate and fail in dynamic scenes. 
We propose three novel methods for automated camera selection. The first method (¡go f ) performs a joint optimization of a cost function that depends on both the view quality and inter-camera switching so that a pleasing best-view video sequence can be composed. The other two methods (¡dbn and ¡util) include the selection decision into the ranking-strategy. In ¡dbn we model the best-camera selection as a state sequence via Directed Acyclic Graphs (DAG) designed as a Dynamic Bayesian Network (DBN), which encodes the contextual knowledge about the camera network and employs the past information to minimize the inter-camera switches. In comparison ¡util utilizes the past as well as the future information in a Partially Observable Markov Decision Process (POMDP) where the camera-selection at a certain time is influenced by the past information and its repercussions in the future. The performance of the proposed approach is demonstrated on multiple real and synthetic multi-camera setups. We compare the proposed architectures with various baseline methods with encouraging results. The performance of the proposed approaches is also validated through extensive subjective testing. 006.3 Electronic Engineering
130	Automatic transcription of polyphonic music exploiting temporal evolution Benetos, Emmanouil January 2012 (has links) Automatic music transcription is the process of converting an audio recording into a symbolic representation using musical notation. It has numerous applications in music information retrieval, computational musicology, and the creation of interactive systems. Even for expert musicians, transcribing polyphonic pieces of music is not a trivial task, and while the problem of automatic pitch estimation for monophonic signals is considered to be solved, the creation of an automated system able to transcribe polyphonic music without setting restrictions on the degree of polyphony and the instrument type still remains open. In this thesis, research on automatic transcription is performed by explicitly incorporating information on the temporal evolution of sounds. First efforts address the problem by focusing on signal processing techniques and by proposing audio features utilising temporal characteristics. Techniques for note onset and offset detection are also utilised for improving transcription performance. Subsequent approaches propose transcription models based on shift-invariant probabilistic latent component analysis (SI-PLCA), modeling the temporal evolution of notes in a multiple-instrument case and supporting frequency modulations in produced notes. Datasets and annotations for transcription research have also been created during this work. Proposed systems have been privately as well as publicly evaluated within the Music Information Retrieval Evaluation eXchange (MIREX) framework. Proposed systems have been shown to outperform several state-of-the-art transcription approaches. Developed techniques have also been employed for other tasks related to music technology, such as for key modulation detection, temperament estimation, and automatic piano tutoring. 
Finally, proposed music transcription models have also been utilized in a wider context, namely for modeling acoustic scenes. 621.382 Electronic Engineering
