  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Improvement of Sound Source Localization for a Binaural Robot of Spherical Head with Pinnae / 耳介付球状頭部を持つ両耳聴ロボットのための音源定位の高性能化

Kim, Ui-Hyun 24 September 2013
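Kyoto University / 0048 / Doctorate by coursework (new system) / Doctor of Informatics / Degree No. Kō 17928 / Informatics Doctorate No. 510 / 新制||情||90 (University Library) / 30748 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Prof. Hiroshi Okuno, Prof. Tatsuya Kawahara, Prof. Akihiro Yamamoto / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM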
12

Speech Detection Using Gammatone Features And One-class Support Vector Machine

Cooper, Douglas 01 January 2013
A network gateway is a mechanism that provides protocol translation and/or validation of network traffic using the metadata contained in network packets. For media applications such as Voice-over-IP, the portion of the packets containing speech data cannot be verified and can provide a means of maliciously transporting code or sensitive data undetected. One solution to this problem is Voice Activity Detection (VAD). Many VADs rely on time-domain features and simple thresholds for efficient speech detection; however, this reveals little about the signal being passed. More sophisticated methods employ machine learning algorithms, but train on specific noises intended for a target environment. Validating speech under a variety of unknown conditions must be possible, as must differentiating between speech and nonspeech data embedded within the packets. A real-time speech detection method is proposed that relies only on a clean speech model for detection. Through the use of Gammatone filter bank processing, the cepstrum and several frequency-domain features are used to train a One-Class Support Vector Machine, which provides a clean-speech model irrespective of environmental noise. A Wiener filter is used to provide improved operation in harsh noise environments. Greater than 90% detection accuracy is achieved for clean speech, with approximately 70% accuracy for SNR as low as 5 dB.
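The "time-domain features and simple thresholds" baseline that the abstract contrasts itself with can be illustrated with a short sketch. This is not the thesis's method (which uses Gammatone features and a One-Class SVM); the function names, the 160-sample frame length, the 8 kHz sampling rate, and the 0.01 energy threshold are all assumptions chosen for illustration.

```python
import math

def frame_energies(samples, frame_len):
    """Split samples into non-overlapping frames and return each frame's mean energy."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def energy_vad(samples, frame_len=160, threshold=0.01):
    """Mark a frame as active (True) when its mean energy exceeds the threshold.

    Both defaults are illustrative assumptions, not values from the thesis.
    """
    return [e > threshold for e in frame_energies(samples, frame_len)]

# Synthetic input: one frame of silence followed by one frame of a 200 Hz tone
# sampled at an assumed 8 kHz.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
decisions = energy_vad(silence + tone)
```

A detector like this only reports that a frame is loud, not that it contains speech, which is the limitation the abstract points out.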
13

Saliency-directed prioritization of visual data in wireless surveillance networks

Mehmood, Irfan, Sajjad, M., Ejaz, W., Baik, S.W. 18 July 2019
In wireless visual sensor networks (WVSNs), streaming all imaging data is impractical due to resource constraints. Moreover, the sheer volume of surveillance videos inhibits the ability of analysts to extract actionable intelligence. In this work, an energy-efficient image prioritization framework is presented to cope with the fragility of traditional WVSNs. The proposed framework selects semantically relevant information before it is transmitted to a sink node. This is based on salient motion detection, which works on the principle of human cognitive processes. Each camera node estimates the background by a bootstrapping procedure, thus increasing the efficiency of salient motion detection. Based on the salient motion, each sensor node is classified as being high or low priority. This classification is dynamic, such that camera nodes toggle between high-priority and low-priority status depending on the coverage of the region of interest. High-priority camera nodes are allowed to access reliable radio channels to ensure the timely and reliable transmission of data. We compare the performance of this framework with other state-of-the-art methods for both single and multi-camera monitoring. The results demonstrate the usefulness of the proposed method in terms of salient event coverage and reduced computational and transmission costs, as well as in helping analysts find semantically relevant visual information. / Supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2013R1A1A2012904).
14

What, When, and Where Exactly? Human Activity Detection in Untrimmed Videos Using Deep Learning

Rahman, Md Atiqur 06 December 2023
Over the past decade, there has been an explosion in the volume of video data, including internet videos and surveillance camera footage. These videos often feature extended durations with unedited content, predominantly filled with background clutter, while the relevant activities of interest occupy only a small portion of the footage. Consequently, there is a compelling need for advanced processing techniques to automatically analyze this vast reservoir of video data, specifically with the goal of identifying the segments that contain the events of interest. Given that humans are the primary subjects in these videos, comprehending human activities plays a pivotal role in automated video analysis. This thesis seeks to tackle the challenge of detecting human activities from untrimmed videos, aiming to classify and pinpoint these activities both in their spatial and temporal dimensions. To achieve this, we propose a modular approach. We begin by developing a temporal activity detection framework, and then progressively extend the framework to support activity detection in the spatio-temporal dimension. To perform temporal activity detection, we introduce an end-to-end trainable deep learning model leveraging 3D convolutions. Additionally, we propose a novel and adaptable fusion strategy to combine both the appearance and motion information extracted from a video, using RGB and optical flow frames. Importantly, we incorporate the learning of this fusion strategy into the activity detection framework. Building upon the temporal activity detection framework, we extend it by incorporating a spatial localization module to enable activity detection both in space and time in a holistic end-to-end manner. To accomplish this, we leverage shared spatio-temporal feature maps to jointly optimize both spatial and temporal localization of activities, thus making the entire pipeline more effective and efficient. 
Finally, we introduce several novel techniques for modeling actor motion, specifically designed for efficient activity recognition. This is achieved by harnessing 2D pose information extracted from video frames and then representing human motion through bone movement, bone orientation, and body joint positions. Our experimental evaluations, conducted using benchmark datasets, showcase the effectiveness of the proposed temporal and spatio-temporal activity detection methods when compared to the current state-of-the-art methods. Moreover, the proposed motion representations excel in both performance and computational efficiency. Ultimately, this research shall pave the way forward towards imbuing computers with social visual intelligence, enabling them to comprehend human activities in any given time and space, opening up exciting possibilities for the future.
15

Activisee: Discrete and Continuous Activity Detection for Human Users Through Wearable Sensor-Augmented Glasses

Raychoudhury, Mrittika 03 January 2023
No description available.
16

Development Of An Advanced Step Counting Algorithm With Integrated Activity Detection For Free Living Environments

Dolan, Paige M 01 June 2024
Physical activity plays a crucial role in maintaining overall health and reducing the risk of various chronic diseases. Step counting has emerged as a popular method for assessing physical activity levels, given its simplicity and ease of use. However, accurately measuring step counts in free-living environments presents significant challenges, with most activity trackers exhibiting a percent error above 20%. This study aims to address these challenges by creating a machine learning algorithm that leverages activity labels to improve step count accuracy in real-world conditions. Two approaches to balancing data were used: one employed a simpler oversampling technique, while the other adopted a more nuanced approach involving the removal of outliers. Models 1 and 2 were trained on each of these uniquely balanced datasets. Model 1 performed much better than Model 2 on testing datasets, but both achieved better than 20% error on new datasets, indicating their potential for more accurate step counting in real-world conditions. Despite challenges such as data imbalance, the study demonstrated the viability of using activity labels to enhance step counting accuracy. Future research should focus on addressing data imbalances and exploring more advanced machine learning techniques for more reliable activity monitoring.
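The two quantities the abstract relies on, percent error of a step count and a "simpler oversampling technique" for balancing classes, can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not define its error formula or its oversampling procedure, so the absolute-percent-error definition, the random duplication of minority-class rows, and all names and sample values here are hypothetical.

```python
import random
from collections import Counter

def percent_error(estimated_steps, true_steps):
    """Absolute percent error of an estimated step count (assumed definition)."""
    if true_steps <= 0:
        raise ValueError("true_steps must be positive")
    return 100.0 * abs(estimated_steps - true_steps) / true_steps

def random_oversample(samples, seed=0):
    """Duplicate rows of minority classes at random until every class
    has as many rows as the largest one.

    `samples` is a list of (activity_label, feature_row) pairs.
    """
    rng = random.Random(seed)
    by_label = {}
    for label, row in samples:
        by_label.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_label.values())
    balanced = []
    for label, rows in by_label.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        balanced.extend((label, row) for row in rows + extra)
    return balanced

# Hypothetical data: a tracker reporting 1150 steps against 1000 true steps,
# and an imbalanced set of activity-labelled feature rows.
error = percent_error(1150, 1000)
data = [("walk", [0.9]), ("walk", [1.1]), ("walk", [1.0]), ("run", [2.3])]
class_counts = Counter(label for label, _ in random_oversample(data))
```

Random duplication is the simplest balancing choice; it adds no new information, which is one reason the abstract's second approach and its call for future work on data imbalance are plausible next steps.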
17

Voice Activity Detection / Voice Activity Detection

Ent, Petr January 2009
This thesis deals with the use of support vector machines in voice activity detection. The first part examines various types of features, their extraction and processing, and identifies the combination that gives the best results. The second part presents the voice activity detection system itself and the tuning of its parameters. Finally, the results are compared with two other systems based on different principles. The ERT broadcast news database was used for testing and tuning. The comparison between systems was then performed on the database from the NIST06 Rich Test Evaluations.
18

Identificação de atividade de voz baseada em vídeo / Video-based voice activity identification

Scott, Dario 30 March 2010
Made available in DSpace on 2015-03-05T14:01:22Z (GMT). No. of bitstreams: 0 Previous issue date: 30 / Hewlett-Packard Brasil Ltda / Currently, there are several works with many different approaches to digital image processing for voice activity detection (VAD). Its applications span different areas, such as voice commands in vehicles and videoconferencing. The motivation of this work is to build an algorithm that contributes to the improvement of image processing techniques applied to voice activity detection in video. The problem has already been addressed with a great diversity of approaches. However, the focus of this work lies in finding alternatives to improve the extraction of a skin and non-skin color model and, from there, to derive a classifier that identifies speech activity more accurately. Existing algorithms for face detection and lip classification were used and improved. Through the creation of patches under the eyes, a model was created to determine the individual characteristics of skin color using the mean and standard deviation of the pixels of the patches and the mouth area. The results are presented based on two approaches.
19

Routine activity extraction from local alignments in mobile phone context data

Moritz, Rick 05 February 2014
Humans are creatures of habit, often developing a routine for their day-to-day life. We propose a way to identify routine as regularities extracted from the context data of mobile phones. We choose Lecroq et al.'s existing state-of-the-art algorithm as the basis for a set of modifications that render it suitable for the task. Our approach searches for alignments in sequences of n-tuples of context data, which correspond to the user traces of routine activity. Our key enhancements to this algorithm are exploiting the sequential nature of the data and an early maximisation approach. We develop a generator of context-like data to allow us to evaluate our approach. Additionally, we collect and manually annotate a mobile phone context dataset to facilitate the evaluation of our algorithm. The results allow us to validate the concept of our approach.
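The idea of finding local alignments between sequences of context n-tuples can be sketched with the standard Smith-Waterman recurrence. This is a swapped-in textbook technique, not Lecroq et al.'s algorithm or the thesis's modified version of it; the scoring values, the function name, and the two sample "days" of (place, network) tuples are assumptions made for illustration.

```python
def local_alignment(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment over two sequences of hashable items.

    Returns the best local score and the 1-based (i, j) cell where the best
    alignment ends. Scoring parameters are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_end = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,                    # start a new local alignment
                h[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,    # gap in b
                h[i][j - 1] + gap,    # gap in a
            )
            if h[i][j] > best:
                best, best_end = h[i][j], (i, j)
    return best, best_end

# Hypothetical context traces: each element is a (place, network) 2-tuple.
day1 = [("home", "wifi"), ("commute", "cell"), ("office", "wifi"), ("gym", "cell")]
day2 = [("cafe", "wifi"), ("commute", "cell"), ("office", "wifi"), ("home", "wifi")]
score, end = local_alignment(day1, day2)
```

Here the shared commute-then-office run is the highest-scoring local alignment, which is the kind of repeated subsequence the abstract treats as a trace of routine activity.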
20

Robustní detekce řečové aktivity / Robust Speech Activity Detection

Popková, Anna January 2019
The aim of this work is to design and create a robust speech activity detector that is able to detect speech in different languages, in noisy environments, and with music in the background. I decided to solve this problem by using a neural network as a classification model that assigns one of four possible classes (silence, speech, music, or noise) to the input audio recording. The resulting tool is able to detect speech in at least 12 languages. Speech with a musical background is detected with up to 88 % accuracy, and system success on noisy data ranges from 84 % (5 dB SNR) to 88 % (20 dB SNR). This tool can be used for speech activity detection in various research areas of speech processing. The main contribution is the elimination of music, which, when not eliminated, significantly increases the error rate of systems for speaker identification or speech recognition.
