  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Diarization, Localization and Indexing of Meeting Archives

Vajaria, Himanshu 21 February 2008
This dissertation documents the research performed on the topics of localization, diarization and indexing in meeting archives. It surveys existing work in these areas, identifies opportunities for improvements and proposes novel solutions for each of these problems. The framework resulting from this dissertation enables various kinds of queries such as identifying the participants of a meeting, finding all meetings for a particular participant, locating a particular individual in the video and finding all instances of speech from a particular individual. Also, since the proposed solutions are computationally efficient, require no training and use little domain knowledge, they can be easily ported to other domains of multimedia analysis. Speaker diarization involves determining the number of distinct speakers and identifying the durations when they spoke in an audio recording. We propose novel solutions for the segmentation and clustering sub-tasks, based on graph spectral clustering. The resulting system yields a diarization error rate of around 20%, a relative improvement of 16% over the current popular diarization technique which is based on hierarchical clustering. The most significant contribution of this work lies in performing speaker localization using only a single camera and a single microphone by exploiting long-term audio-visual co-occurrence. Our novel computational model allows identifying regions in the image belonging to the speaker even when the speaker's face is non-frontal and even when the speaker is only partially visible. This approach results in a hit ratio of 73.8% compared to an MI-based approach which results in a hit ratio of 52.6%, which illustrates its suitability in the meeting domain. The third problem addresses indexing meeting archives to enable retrieving all segments from the archive during which a particular individual speaks, in a query-by-example framework.
By performing audio-visual association and clustering, a target cluster is generated per individual that contains multiple multimodal samples for that individual to which a query sample is matched. The use of multiple samples results in a retrieval precision of 92.6% at 90% recall compared to a precision of 71% at the same recall, achieved by a unimodal unisample system.
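As a rough illustration of the graph spectral clustering idea mentioned in the abstract above, the following sketch splits a small affinity graph into two groups using the sign of the Fiedler vector of the graph Laplacian. It is not the dissertation's method: the function name `fiedler_partition`, the toy affinity matrix, and the use of shifted power iteration are all assumptions made for this example, and a real diarization system would build affinities from audio features and handle more than two speakers.

```python
import math


def fiedler_partition(weights, iterations=500):
    """Two-way spectral partition of a graph given a symmetric affinity matrix.

    Hypothetical helper for illustration only. It approximates the Fiedler
    vector (eigenvector of the second-smallest eigenvalue of L = D - W)
    by power iteration on (shift * I - L), projecting out the constant
    vector at every step, then labels nodes by the sign of each entry.
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    # By Gershgorin's theorem, 2 * max degree bounds the eigenvalues of L.
    shift = 2.0 * max(degrees)
    # Deterministic, zero-mean start vector.
    v = [float(i) - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iterations):
        # w = (shift * I - D + W) v
        w = [
            (shift - degrees[i]) * v[i]
            + sum(weights[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        # Remove the component along the constant eigenvector.
        mean = sum(w) / n
        w = [x - mean for x in w]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return [0 if x < 0 else 1 for x in v]


# Toy affinity matrix (an assumption): two tight groups of three nodes,
# weakly connected to each other.
affinity = [
    [0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 0.01) for j in range(6)]
    for i in range(6)
]
labels = fiedler_partition(affinity)
```

In this toy case the nodes in each tightly connected group receive the same label; which group is called 0 and which is called 1 depends only on the sign of the recovered vector.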
2

Indexing presentations using multiple media streams

Ruddarraju, Ravikrishna 15 August 2006
This thesis presents novel techniques to index multiple media streams in a digitally captured presentation. These media streams are related by the common content in a presentation. We use relevance curves to represent these relationships. These relevance curves are generated by using a mix of text processing techniques and distance measures for sparse vocabularies. These techniques are used to automatically detect slide boundaries in a presentation. Accuracy of detecting these boundaries is evaluated as a function of word error rates.
3

Detecting Group Turns of Speaker Groups in Meeting Room Conversations Using Audio-Video Change Scale-Space

Krishnan, Ravikiran 30 June 2010
Automatic analysis of conversations is important for extracting high-level descriptions of meetings. In this work, as an alternative to linguistic approaches, we develop a novel, purely bottom-up representation, constructed from both audio and video signals, that helps us characterize and build a rich description of the content at multiple temporal scales. Nonverbal communication plays an important role in describing information about the communication and the nature of the conversation. We consider simple audio and video features to extract these changes in conversation. In order to detect these changes, we consider the evolution of the detected change, using the Bayesian Information Criterion (BIC) at multiple temporal scales to build an audio-visual change scale-space. Peaks detected in this representation yield group turn based conversational changes at different temporal scales. We use the NIST Meeting Room corpus to test our approach. Four clips of eight minutes are extracted from this corpus at random, and the other ten are extracted starting 90 seconds after the start of the entire video in the corpus. A single microphone and a single camera are used from the dataset. The group turns detected in this test gave an overall detection result, when compared with different thresholds with fixed group turn scale range, of 82%, and a best result of 91% for a single video. Conversation overlaps, changes and their inferred models offer an intermediate-level description of meeting videos that is useful in summarization and indexing of meetings. Since the proposed solutions are computationally efficient, require no training and use little domain knowledge, they can be easily added as a feature to other multimedia analysis techniques.
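To make the BIC-based change detection mentioned in the abstract above more concrete, here is a minimal single-scale sketch on a one-dimensional feature sequence. It uses the commonly cited Delta-BIC test that compares one Gaussian for the whole window against two Gaussians split at a candidate point. The thesis applies BIC at multiple temporal scales to audio-visual features; the function names, the `penalty_weight` and `margin` parameters, and the synthetic signal here are assumptions for illustration only.

```python
import math


def variance(values):
    """Maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(values) / len(values)
    return max(sum((x - mean) ** 2 for x in values) / len(values), 1e-12)


def delta_bic(window, split, penalty_weight=1.0):
    """Delta-BIC for splitting a 1-D window at index `split`.

    Positive values favor the two-Gaussian model, i.e. a change at `split`.
    """
    n = len(window)
    left, right = window[:split], window[split:]
    gain = 0.5 * (
        n * math.log(variance(window))
        - len(left) * math.log(variance(left))
        - len(right) * math.log(variance(right))
    )
    # A 1-D Gaussian has two parameters (mean, variance), so the extra
    # model costs 0.5 * 2 * log(n) = log(n), scaled by penalty_weight.
    penalty = penalty_weight * math.log(n)
    return gain - penalty


def best_change_point(window, margin=5, penalty_weight=1.0):
    """Return (index, score) of the best split, or (None, score) if no
    split has a positive Delta-BIC."""
    scores = {
        t: delta_bic(window, t, penalty_weight)
        for t in range(margin, len(window) - margin + 1)
    }
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0 else (None, scores[best])


# Synthetic feature sequence (an assumption): the mean jumps at index 50.
signal = [0.1 * math.sin(i) for i in range(50)] + [
    5.0 + 0.1 * math.sin(i) for i in range(50, 100)
]
change_index, change_score = best_change_point(signal)
```

Repeating such a test with windows of different lengths would give one score curve per temporal scale, which is the kind of stacked representation a change scale-space suggests; the exact construction used in the thesis is not specified in the abstract.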
4

Multimedia Analysis Over 3G Wireless Interface

Tay, Jeremy Yee Chiat January 2003
Recent rapid advancements in mobile communication and emerging demands for complicated multimedia content and services over mobile systems have caused a dramatic increase in research interest in this area. Among the topics covering multimedia service performance over the wireless interface, the quality of received multimedia content is an important issue. With the increase of visual media in mobile services, user opinion acquired through perception of received image quality will play an increasingly important role in determining the effectiveness of such services. The work documented in this thesis is motivated by the general lack of published work on software test beds for Third Generation Mobile Network (3G) and in particular for investigating mobile environment multimedia quality degradation. A 3G multimedia quality analysis system is presented, subjecting the input multimedia stream to the simulated 3G radio activities and measuring its degradation in terms of human perception. This approach takes a new and different model of multimedia quality measurement in a wireless communication domain, showing the possibility of a more effective approach that can be applied in many cases for assisting service quality assurance research across this area. The development of this software system is covered in detail together with in-depth analysis of multimedia image quality over a simulated 3G radio interface. Universal Mobile Telecommunications System (UMTS) is the 3G standard chosen for study in this work. The suggested test bed simulates a single Frequency Division Duplex (FDD) downlink UMTS Terrestrial Radio Access (UTRA) channel, where the received media's image analysis is performed using a Human Vision System (HVS) based image quality metric. The system aims to provide a multipurpose and versatile multimedia 3G test bed for use in testing of various solutions for protecting multimedia data across a 3G radio interface.
Furthermore, it produces effective human vision oriented feedback on visual media degradation, providing a new and efficient method to address effectiveness of solutions in multimedia delivery over a mobile environment. This thesis shows the ability of HVS-based image quality metric in analyzing degradation of visual media over a noisy mobile environment. This presents a novel direction in the area of telecommunication service multimedia quality analysis, with potential user quality perception being considered on top of data or signal-based error measurements. With such a new approach, development of multimedia protection solutions can be made more effective. Effective feedback provided by considering quality measurement with strong correlation to human perception allows close analysis of user visual discrimination across an image. An example of the usefulness of this information is especially visible if considering development of a content-based multimedia data protective system that provides different levels of protection, depending on the importance of visual media. An apparent potential application of this thesis is in the testing of a multimedia/image protection protocol in a downlink channel. Future work might aim to extend the current system by adding network level traffic simulations and further addition of dynamic network control components, further considering network traffic conditions.
5

Visual Interactive Labeling of Large Multimedia News Corpora

Han, Qi, John, Markus, Kurzhals, Kuno, Messner, Johannes, Ertl, Thomas 25 January 2019
The semantic annotation of large multimedia corpora is essential for numerous tasks. Be it for the training of classification algorithms, efficient content retrieval, or for analytical reasoning, appropriate labels are often the first necessity before automatic processing becomes efficient. However, manual labeling of large datasets is time-consuming and tedious. Hence, we present a new visual approach for labeling and retrieval of reports in multimedia news corpora. It combines automatic classifier training based on caption text from news reports with human interpretation to ease the annotation process. In our approach, users can initialize labels with keyword queries and iteratively annotate examples to train a classifier. The proposed visualization displays representative results in an overview that allows users to follow different annotation strategies (e.g., active learning) and assess the quality of the classifier. Based on a usage scenario, we demonstrate the successful application of our approach. Therein, users label several topics which interest them and retrieve related documents with high confidence from three years of news reports.
