About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
21

Automatic annotation of musical audio for interactive applications

Brossier, Paul M. January 2006 (has links)
As machines become increasingly portable and part of our everyday lives, it becomes apparent that developing interactive and ubiquitous systems is an important aspect of new music applications created by the research community. We are interested in developing a robust layer for the automatic annotation of audio signals, to be used in various applications, from music search engines to interactive installations, and in various contexts, from embedded devices to audio content servers. We propose adaptations of existing signal processing techniques to a real-time context. Amongst these annotation techniques, we concentrate on low- and mid-level tasks such as onset detection, pitch tracking, tempo extraction and note modelling. We present a framework to extract these annotations and evaluate the performance of different algorithms. The first task is to detect onsets and offsets in audio streams within short latencies. The segmentation of audio streams into temporal objects enables various manipulations and the analysis of metrical structure. The evaluation of different algorithms and their adaptation to real time are described. We then tackle the problem of fundamental frequency estimation, again trying to reduce both the delay and the computational cost. Different algorithms are implemented for real time and evaluated on monophonic recordings and complex signals. Spectral analysis can be used to label the temporal segments; we also approach the estimation of higher-level descriptions. Techniques for the modelling of note objects and the localisation of beats are implemented and discussed. Applications of our framework include live and interactive music installations and, more generally, tools for composers and sound engineers. Speed optimisations may bring significant improvements to various automated tasks, such as automatic classification and recommendation systems. We describe the design of our software solution, for our research purposes and in view of its integration within other systems.
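Onset detection of the kind described above is often built from a spectral-flux detection function followed by peak picking. The sketch below is an illustrative toy version of that general idea, not the algorithm from the thesis; the frame data and threshold are assumptions:

```python
def spectral_flux(frames):
    """Half-wave-rectified spectral flux between consecutive magnitude frames."""
    flux = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        flux.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return flux

def pick_onsets(flux, threshold):
    """Report frames where the flux is a local maximum above a fixed threshold."""
    return [i for i in range(1, len(flux) - 1)
            if flux[i] > threshold and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]]

# A burst of spectral energy at frame 2 is reported as an onset.
frames = [[0, 0], [0, 0], [5, 5], [5, 5], [0, 0]]
onsets = pick_onsets(spectral_flux(frames), threshold=1.0)
```

A real-time variant would compute the flux frame by frame and pick peaks over a short sliding window, trading a few frames of latency for robustness.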
22

The National Gramophonic Society

Morgan, Nicholas Travers January 2013 (has links)
No description available.
23

Perceptual mixing for musical production

Terrell, Michael John January 2012 (has links)
A general model of music mixing is developed, which enables a mix to be evaluated as a set of acoustic signals. A second model describes the mixing process as an optimisation problem, in which the errors are evaluated by comparing sound features of a mix with those of a reference mix, and the parameters are the controls on the mixing console. Initial focus is placed on live mixing, where the practical issues of live acoustic sources, multiple listeners, and acoustic feedback increase the technical burden on the mixing engineer. Using the two models, a system is demonstrated that takes reference mixes as input and automatically sets the controls on the mixing console to recreate their objective, acoustic sound features for all listeners, taking into account the practical issues outlined above. This reduces the complexity of mixing live music to that of recorded music, and unifies future mixing research. Sound features evaluated from audio signals alone are shown to be unsuitable for describing a mix, because they do not incorporate the effects of listening conditions or masking interactions between sounds. Psychophysical test methods are employed to develop a new perceptual sound feature, termed the loudness balance, which is the first loudness feature to be validated for musical sounds. A novel perceptual mixing system is designed, which allows users to directly control the loudness balance of the sounds they are mixing, for both live and recorded music, and which can be extended to incorporate other perceptual features. The perceptual mixer is also employed as an analytical tool to allow direct measurement of mixing best practice and to provide fully automatic mixing functionality, and is shown to be an improvement over current heuristic models. Based on the conclusions of the work, a framework for future automatic mixing is provided, centred on perceptual sound features that are validated using psychophysical methods.
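For illustration only, here is a crude signal-level sketch of a level balance and of setting gains to match a reference balance. It uses RMS level in dB as a stand-in for loudness; the loudness balance validated in the thesis is a perceptual feature requiring a psychophysical model, not an RMS computation:

```python
import math

def rms(signal):
    """Root-mean-square level of a sampled signal."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def level_balance(sources):
    """Level of each source relative to the first, in dB (a crude signal-level
    stand-in for a perceptual loudness balance)."""
    ref = rms(sources[0])
    return [20 * math.log10(rms(s) / ref) for s in sources]

def match_gains(sources, target_balance_db):
    """Gains that reproduce a target balance (in dB relative to source 0)."""
    ref = rms(sources[0])
    return [10 ** (t / 20) * ref / rms(s)
            for s, t in zip(sources, target_balance_db)]
```

Framed this way, mixing to a reference becomes an optimisation over console gains, which is the shape of the problem the second model above describes.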
24

Information-theoretic measures of predictability for music content analysis

Foster, Peter January 2014 (has links)
This thesis is concerned with determining similarity in musical audio, for the purpose of applications in music content analysis. With the aim of determining similarity, we consider the problem of representing temporal structure in music. To represent temporal structure, we propose to compute information-theoretic measures of predictability in sequences. We apply our measures to track-wise representations obtained from musical audio; thereafter we consider the obtained measures as predictors of musical similarity. We demonstrate that our approach benefits music content analysis tasks based on musical similarity. For the intermediate-specificity task of cover song identification, we compare contrasting discrete-valued and continuous-valued measures of pairwise predictability between sequences. In the discrete case, we devise a method for computing the normalised compression distance (NCD) which accounts for correlation between sequences. We observe that our measure improves average performance over NCD for sequential compression algorithms. In the continuous case, we propose to compute information-based measures as statistics of the prediction error between sequences. Evaluated on 300 Jazz standards and on the Million Song Dataset, we observe that continuous-valued approaches outperform discrete-valued approaches. Further, we demonstrate that continuous-valued measures of predictability may be combined to improve performance with respect to baseline approaches. Using a filter-and-refine approach, we demonstrate state-of-the-art performance on the Million Song Dataset. For the low-specificity tasks of similarity rating prediction and song year prediction, we propose descriptors based on computing track-wise compression rates of quantised audio features, using multiple temporal resolutions and quantisation granularities. We evaluate our descriptors using a dataset of 15 500 track excerpts of Western popular music, for which we have 7 800 web-sourced pairwise similarity ratings. Combined with bag-of-features descriptors, we obtain performance gains of 31.1% and 10.9% for similarity rating prediction and song year prediction respectively. For both tasks, analysis of selected descriptors reveals that representing features at multiple time scales benefits prediction accuracy.
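The standard normalised compression distance has a simple definition; the sketch below uses zlib as the sequential compressor. This is the baseline NCD only, not the correlation-aware variant devised in the thesis:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed length under a chosen compressor."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Two similar sequences compress well when concatenated, giving a distance near zero; dissimilar sequences yield a distance closer to one.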
25

Supervised dictionary learning for action recognition and localization

Kumar, B. G. Vijay January 2012 (has links)
Image sequences with humans and human activities are everywhere. With the amount of produced and distributed data increasing at an unprecedented rate, there has been much interest in building systems that can understand and interpret visual data, and in particular detect and recognise human actions. Dictionary-based approaches learn a dictionary from descriptors extracted from the videos in a first stage, and a classifier or a detector in a second stage. The major drawback of such an approach is that the dictionary is learned in an unsupervised manner, without considering the task (classification or detection) that follows it. In this work we develop task-dependent (supervised) dictionaries for action recognition and localization, i.e., dictionaries that are best suited for the subsequent task. In the first part of the work, we propose a supervised max-margin framework for linear and non-linear Non-Negative Matrix Factorization (NMF). To achieve this, we impose max-margin constraints within the formulation of NMF and simultaneously solve for the classifier and the dictionary. The dictionary (basis matrix) thus obtained maximizes the margin of the classifier in the low-dimensional space (in the linear case) or in the high-dimensional feature space (in the non-linear case). In the second part of the work, we develop methodologies for action localization. We first propose a dictionary weighting approach, where we learn local and global weights for the dictionary by considering the localization information of the training sequences. We then extend this approach to learn a task-dependent dictionary for action localization that incorporates the localization information of the training sequences into dictionary learning. The results on publicly available datasets show that the performance of the system is improved by using the supervised information while learning the dictionary.
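As background for readers unfamiliar with NMF, here is a minimal plain (unsupervised) factorisation using the classical multiplicative updates; the supervised max-margin formulation in the thesis additionally couples these updates to a classifier, which is not shown here. All values are illustrative:

```python
def matmul(A, B):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, rank, iters=200, eps=1e-9):
    """Plain NMF (V ~ W H, all entries non-negative) via the classical
    multiplicative updates, with a deterministic positive initialisation."""
    n, m = len(V), len(V[0])
    W = [[1.0 + 0.1 * ((i + j) % 3) for j in range(rank)] for i in range(n)]
    H = [[1.0 + 0.1 * ((i * j) % 3) for j in range(m)] for i in range(rank)]
    for _ in range(iters):
        # update H: H <- H * (W^T V) / (W^T W H)
        WH, Wt = matmul(W, H), transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, WH)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # update W: W <- W * (V H^T) / (W H H^T)
        WH, Ht = matmul(W, H), transpose(H)
        num, den = matmul(V, Ht), matmul(WH, Ht)
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H
```

On a rank-1 matrix the reconstruction converges essentially exactly, which is a quick sanity check on the update rules.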
26

Enhanced target detection in CCTV network system using colour constancy

Soori, Umair January 2014 (has links)
The focus of this research is to study how targets can be more faithfully detected in a multi-camera CCTV network system using spectral features for detection. The objective of the work is to develop colour constancy (CC) methodology to help maintain the spectral features of the scene in a constant, stable state, irrespective of variable illuminations and camera calibration issues. Unlike previous work in the field of target detection, two versions of CC algorithms have been developed during the course of this work which are capable of maintaining colour constancy for every image pixel in the scene: 1) a method termed Enhanced Luminance Reflectance CC (ELRCC), which consists of a pixel-wise sigmoid function for adaptive dynamic range compression; 2) the Enhanced Target Detection and Recognition Colour Constancy (ETDCC) algorithm, which employs a bidirectional pixel-wise non-linear transfer function (PWNLTF), a centre-surround luminance enhancement and a Grey Edge white balancing routine. The effectiveness of target detection for all developed CC algorithms has been validated using the multi-camera ‘Imagery Library for Intelligent Detection Systems’ (iLIDS), ‘Performance Evaluation of Tracking and Surveillance’ (PETS) and ‘Ground Truth Colour Chart’ (GTCC) datasets. It is shown that the developed CC algorithms have enhanced target detection efficiency by over 175% compared with detection without CC enhancement. The contribution of this research has been one journal paper published in Optical Engineering, together with three conference papers on the subject of the research.
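To illustrate the white-balancing component only: the Grey-World assumption scales each channel so that the scene average is achromatic; the Grey Edge routine mentioned above applies the same idea to image derivatives. This toy sketch is not the ELRCC or ETDCC algorithm:

```python
def grey_world_gains(pixels):
    """Grey-World white balance: per-channel gains that make each RGB
    channel's mean equal to the overall mean intensity."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    grey = sum(means) / 3.0
    return [grey / m for m in means]

def apply_gains(pixels, gains):
    """Apply per-channel gains to a list of RGB pixels."""
    return [[p[c] * gains[c] for c in range(3)] for p in pixels]
```

After correction, the per-channel means coincide, so a cast introduced by the illuminant (or by per-camera calibration drift) is removed before features are compared across cameras.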
27

Tracking moving objects in surveillance video

Dunne, Peter John January 2012 (has links)
The thesis looks at approaches to the detection and tracking of potential objects of interest in surveillance video. The aim was to investigate and develop methods that might be suitable for eventual application through embedded software, running on a fixed-point processor, in analytics-capable cameras. The work considers common approaches to object detection and representation, seeking out those that offer the necessary computational economy and the potential to cope with constraints such as a low frame rate, due to possibly limited processor time, or the weak chromatic content that can occur in some typical surveillance contexts. The aim is probabilistic tracking of objects rather than simple concatenation of frame-by-frame detections. This involves recursive Bayesian estimation. The particle filter is a technique for implementing such a recursion, and so it is examined in the context of both single-target and combined multi-target tracking. A detailed examination of the operation of the single-target tracking particle filter shows that objects can be tracked successfully using a relatively simple structured grey-scale histogram representation. It is shown that basic components of the particle filter can be simplified without loss of tracking quality. An analysis brings out the relationships between commonly used target representation distance measures and shows that, in the context of the particle filter, there is little to choose between them. With the correct choice of parameters, the simplest and most computationally economical distance measure performs well; the work shows how to make that choice. Similarly, it is shown that a simple measurement likelihood function can be used in place of the more ubiquitous Gaussian. The important step of target state estimation is examined: the standard weighted-mean approach is rejected, a recently proposed maximum a posteriori approach is shown to be unsuitable in the context of the work, and a practical alternative is developed. Two methods are presented for tracker initialisation: one is a simplification of an existing published method, the other is a novel approach. The aim is to detect trackable objects as they enter the scene, extract trackable features, then actively follow those features through subsequent frames. The multi-target tracking problem is then posed as one of managing multiple independent trackers.
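A bootstrap particle filter for a 1-D state gives the flavour of the recursion discussed above. Note that this toy sketch uses a Gaussian likelihood and the standard weighted-mean state estimate, both of which the thesis examines critically (and, in the latter case, rejects for its application); all parameter values are assumptions:

```python
import math
import random

def particle_filter(measurements, n_particles=500, proc_std=0.5, meas_std=1.0, seed=1):
    """Bootstrap particle filter for a 1-D random-walk state: predict by
    diffusing particles, weight by a Gaussian measurement likelihood,
    estimate by the weighted mean, then resample."""
    rng = random.Random(seed)
    particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        # predict: random-walk motion model
        particles = [p + rng.gauss(0.0, proc_std) for p in particles]
        # weight: Gaussian measurement likelihood
        weights = [math.exp(-0.5 * ((z - p) / meas_std) ** 2) for p in particles]
        total = sum(weights) or 1.0
        weights = [w / total for w in weights]
        # estimate: weighted mean of the particle cloud
        estimates.append(sum(p * w for p, w in zip(particles, weights)))
        # resample: multinomial draw proportional to the weights
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

In an embedded fixed-point setting, the likelihood and resampling steps are where the simplifications the thesis investigates pay off.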
28

Linking music metadata

Macrae, Robert January 2012 (has links)
The internet has facilitated music metadata production and distribution on an unprecedented scale. A contributing factor to this data deluge is a change in the authorship of this data, from the expert few to the untrained crowd. The resulting unordered flood of imperfect annotations provides challenges and opportunities in identifying accurate metadata and linking it to the music audio in order to provide a richer listening experience. We advocate novel adaptations of Dynamic Programming for music metadata synchronisation, ranking and comparison. This thesis introduces Windowed Time Warping and Greedy, Constrained On-Line Time Warping for synchronisation, and the Concurrence Factor for automatically ranking metadata. We begin by examining the availability of various music metadata on the web. We then review Dynamic Programming methods for aligning and comparing two source sequences, whilst presenting novel, specialised adaptations for efficient, real-time synchronisation of music and metadata that improve on existing algorithms in speed and accuracy. The Concurrence Factor, which measures the degree to which an annotation of a song agrees with its peers, is proposed in order to utilise the wisdom of the crowd to establish a ranking system. This attribute uses a combination of the standard Dynamic Programming methods Levenshtein Edit Distance, Dynamic Time Warping, and Longest Common Subsequence to compare annotations. We present a synchronisation application applying the aforementioned methods, as well as a tablature-parsing application for mining and analysing guitar tablatures from the web. We evaluate the Concurrence Factor as a ranking system on a large-scale collection of guitar tablatures and lyrics, showing a correlation with accuracy that is superior to existing methods currently used in internet search engines, which are based on popularity and human ratings.
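Of the three Dynamic Programming methods named above, Levenshtein edit distance is the simplest to sketch; the peer-agreement scorer below is a hypothetical illustration in the spirit of the Concurrence Factor, not its published definition:

```python
def levenshtein(a, b):
    """Levenshtein edit distance by dynamic programming (rolling row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def concurrence(annotations):
    """Hypothetical peer-agreement score: mean normalised edit similarity
    of each annotation to all the others. Annotations that agree with the
    crowd score higher; outliers score lower."""
    scores = []
    for i, a in enumerate(annotations):
        sims = [1 - levenshtein(a, b) / max(len(a), len(b), 1)
                for j, b in enumerate(annotations) if j != i]
        scores.append(sum(sims) / len(sims))
    return scores
```

Ranking by agreement with peers needs no ground truth, which is what makes it usable on web-sourced tablatures and lyrics.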
29

Motion prediction and interaction localisation of people in crowds

Mazzon, Riccardo January 2013 (has links)
The ability to analyse and predict the movement of people in crowded scenarios can be of fundamental importance for tracking across multiple cameras and for interaction localisation. In this thesis, we propose a person re-identification method that takes into account the spatial location of cameras, using a plan of the locale and the potential paths people can follow in the unobserved areas. These potential paths are generated using two models. In the first, people’s trajectories are constrained to pass through a set of areas of interest (landmarks) in the site. In the second, we integrate a goal-driven approach into the Social Force Model (SFM), initially introduced for crowd simulation. SFM models the desire of people to reach specific interest points (goals) in a site, such as exits, shops, seats and meeting points, while avoiding walls and barriers. Trajectory propagation creates the possible re-identification candidates, on which association of people across cameras is performed using the spatial location of the candidates and appearance features extracted around a person’s head. We validate the proposed method in a challenging scenario from London Gatwick airport and compare it to state-of-the-art person re-identification methods. Moreover, we perform detection and tracking of interacting people in a framework based on the SFM that analyses people’s trajectories. The method embeds plausible human behaviours to predict interactions in a crowd by iteratively minimising the error between predictions and measurements. We model people approaching a group, and restrict the group formation based on the relative velocity of candidate group members. The detected groups are then tracked by linking their centres of interaction over time using a buffered graph-based tracker. We show how the proposed framework outperforms existing group localisation techniques on three publicly available datasets.
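The Social Force Model's goal-driven term can be sketched as a relaxation towards a desired velocity pointing at the goal, plus exponential repulsion from nearby people. The step below is a generic textbook-style SFM integration with illustrative parameter values, not the calibrated model used in the thesis:

```python
import math

def sfm_step(pos, vel, goal, others, dt=0.1, tau=0.5, v0=1.3, a=2.0, b=0.3):
    """One Euler step of a goal-driven Social Force Model: relax towards
    the desired velocity (unit vector to the goal times preferred speed v0)
    and add exponential repulsion from other people. Parameter values are
    illustrative assumptions."""
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(dx, dy) or 1e-9
    des = (v0 * dx / dist, v0 * dy / dist)
    # driving force: relaxation towards the desired velocity
    fx = (des[0] - vel[0]) / tau
    fy = (des[1] - vel[1]) / tau
    # repulsive force from each other person
    for ox, oy in others:
        rx, ry = pos[0] - ox, pos[1] - oy
        r = math.hypot(rx, ry) or 1e-9
        mag = a * math.exp(-r / b)
        fx += mag * rx / r
        fy += mag * ry / r
    vel = (vel[0] + fx * dt, vel[1] + fy * dt)
    pos = (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)
    return pos, vel
```

Iterating such steps from a person's last observed position towards candidate goals is what propagates trajectories through the unobserved areas between cameras.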
30

Multi-target tracking and performance evaluation on videos

Poiesi, Fabio January 2014 (has links)
Multi-target tracking is the process of extracting the object motion patterns of interest from a scene. Motion patterns are often described through metadata representing object locations and shape information. In the first part of this thesis we discuss the state-of-the-art methods aimed at accomplishing this task on monocular views, and also analyse the methods for evaluating their performance. The second part of the thesis describes our research contributions to these topics. We begin by presenting a method for multi-target tracking based on track-before-detect (MT-TBD), formulated as a particle filter. The novelty lies in the inclusion of the target identity (ID) in the particle state, which enables the algorithm to deal with an unknown and unlimited number of targets. We propose a probabilistic model of particle birth and death based on Markov Random Fields, which allows us to overcome the problem of mixing the IDs of close targets. We then propose three evaluation measures that take into account target-size variations, combine accuracy and cardinality errors, quantify long-term tracking accuracy at different accuracy levels, and evaluate ID changes relative to the duration of the track in which they occur. This set of measures does not require pre-setting of parameters and allows one to evaluate tracking performance holistically, in an application-independent manner. Lastly, we present a framework for multi-target localisation applied to scenes with a high density of compact objects. Candidate target locations are initially generated by extracting object features from intensity maps using an iterative method based on a gradient-climbing technique and an isocontour slicing approach. A graph-based data association method for multi-target tracking is then applied to link valid candidate target locations over time and to discard those that are spurious. This method can deal with point targets having indistinguishable appearance and unpredictable motion. MT-TBD is evaluated and compared with state-of-the-art methods on real-world surveillance videos.
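To make the idea of combining accuracy and cardinality errors concrete, here is a toy per-frame measure for 1-D target positions, loosely in the spirit of OSPA-style metrics; it is not one of the three measures proposed in the thesis:

```python
def frame_error(gt, est, cutoff=1.0):
    """Toy per-frame tracking error combining localisation and cardinality:
    greedily match estimates to ground-truth positions within a cutoff
    distance, then average the matched distances together with a cutoff
    penalty for each unmatched (missed or spurious) object."""
    matched = []
    rem = list(est)
    for g in gt:
        best = min(rem, key=lambda e: abs(e - g), default=None)
        if best is not None and abs(best - g) <= cutoff:
            matched.append(abs(best - g))
            rem.remove(best)
    penalty = cutoff * (len(gt) - len(matched) + len(rem))
    n = max(len(gt), len(est)) or 1
    return (sum(matched) + penalty) / n
```

A perfect estimate scores 0; a missed or spurious target contributes the full cutoff, so localisation and cardinality errors share one scale without per-application parameter tuning.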
