11

A Unified Statistical Approach to Fast and Robust Multichannel Speech Separation and Dereverberation / 高速かつ頑健な多チャンネル音声分離・残響除去のための統合的・統計的アプローチ

Sekiguchi, Kouhei 23 March 2021 (has links)
Kyoto University / New degree system, doctoral program / Doctor of Informatics / Kou No. 23309 / Johaku No. 745 / 新制||情||127 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Associate Professor Kazuyoshi Yoshii, Professor Tatsuya Kawahara, Professor Ko Nishino, Professor Toshiyuki Tanaka / Eligible under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
12

Design of a Programmable Four-Preset Guitar Pedal

Trombley, Michael January 2017 (has links)
No description available.
13

Applications of Fourier Analysis to Audio Signal Processing: An Investigation of Chord Detection Algorithms

Lenssen, Nathan 01 January 2013 (has links)
The discrete Fourier transform has become an essential tool in the analysis of digital signals. Applications have become widespread since the discovery of the Fast Fourier Transform and the rise of personal computers. The field of digital signal processing is an exciting intersection of mathematics, statistics, and electrical engineering. In this study we aim to gain understanding of the mathematics behind algorithms that can extract chord information from recorded music. We investigate basic music theory, introduce and derive the discrete Fourier transform, and apply Fourier analysis to audio files to extract spectral data.
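As a rough illustration of the kind of pipeline this thesis studies (a hypothetical sketch, not code from the thesis; the frame size, window, and binary triad templates are assumptions), a DFT-based chroma vector can be matched against major and minor chord templates:

```python
import numpy as np

def chroma_from_frame(frame, sr):
    """Fold DFT magnitudes of one windowed audio frame onto 12 pitch classes."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    chroma = np.zeros(12)
    for f, mag in zip(freqs[1:], spectrum[1:]):        # skip the DC bin
        midi = 69 + 12 * np.log2(f / 440.0)            # frequency -> MIDI pitch number
        chroma[int(round(midi)) % 12] += mag
    return chroma / (np.linalg.norm(chroma) + 1e-12)

def detect_chord(frame, sr=44100):
    """Return the major/minor triad whose binary template best matches the chroma."""
    names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    chroma = chroma_from_frame(frame, sr)
    best, best_score = None, -np.inf
    for root in range(12):
        for quality, third in (("maj", 4), ("min", 3)):
            template = np.zeros(12)
            template[[root, (root + third) % 12, (root + 7) % 12]] = 1.0
            score = chroma @ template
            if score > best_score:
                best, best_score = names[root] + quality, score
    return best
```

Template matching on chroma is one common baseline for chord detection; the thesis itself derives the underlying Fourier analysis in detail.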
14

Computationally efficient methods for polyphonic music transcription

Pertusa, Antonio 09 July 2010 (has links)
This work proposes a set of efficient methods for converting a polyphonic musical audio signal (WAV, MP3) into a score (MIDI).
15

Music complexity: a multi-faceted description of audio content

Streich, Sebastian 21 February 2007 (has links)
This thesis proposes a set of algorithms that can be used to compute estimates of music complexity facets from musical audio signals. They focus on aspects of acoustics, rhythm, timbre, and tonality. Music complexity is thereby considered on the coarse level of common agreement among human listeners. The target is to obtain complexity judgments through automatic computation that resemble a naive listener's point of view. The motivation for the presented research lies in the enhancement of human interaction with digital music collections. As discussed in the thesis, there is a variety of tasks to be considered, such as collection visualization, playlist generation, or the automatic recommendation of music. Through the music complexity estimates provided by the described algorithms, we gain access to a level of semantic music description that allows for novel and interesting solutions to these tasks.
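The abstract stays at a high level; purely as an illustrative sketch of one possible facet estimator (not the author's algorithms; the frame size and entropy measure are assumptions), a coarse timbre-complexity proxy could be computed as the average spectral entropy of short frames:

```python
import numpy as np

def timbre_complexity(signal, sr=44100, frame=2048, hop=1024):
    """Illustrative proxy: average spectral entropy across frames.
    Flat, noisy spectra score high; sparse, tonal spectra score low."""
    entropies = []
    for start in range(0, len(signal) - frame, hop):
        spec = np.abs(np.fft.rfft(signal[start:start + frame] * np.hanning(frame))) ** 2
        p = spec / (spec.sum() + 1e-12)                    # normalize to a distribution
        entropies.append(-np.sum(p * np.log2(p + 1e-12)))  # spectral entropy in bits
    return float(np.mean(entropies)) if entropies else 0.0
```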
16

Musical expectation modelling from audio : a causal mid-level approach to predictive representation and learning of spectro-temporal events

Hazan, Amaury 16 July 2010 (has links)
In this thesis we develop a computational model of music expectation, which may be one of the most important aspects of music listening. Many phenomena related to music listening, such as preference, surprise, or emotions, are linked to the anticipatory behaviour of listeners. We concentrate on a statistical account of music expectation, modelling the processes of learning and predicting spectro-temporal regularities in a causal fashion. The principle of statistical modelling of expectation can be applied to several music representations, from symbolic notation to audio signals. We first show that computational learning architectures can be used and evaluated to account for behavioral data concerning auditory perception and learning. We then propose a what/when representation of musical events which makes it possible to sequentially describe and learn the structure of acoustic units in musical audio signals. The proposed representation is applied to describe and anticipate timbre features and musical rhythms. We suggest ways to exploit the properties of the expectation model in music analysis tasks such as structural segmentation. We finally explore the implications of our model for interactive music applications in the context of real-time transcription, concatenative synthesis, and visualization.
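As a toy illustration of the causal statistical-expectation principle described above (not the thesis's actual what/when model; the bigram order, add-one smoothing, and symbol alphabet are assumptions), a predictor can be updated online and report the surprise of each incoming event:

```python
from collections import defaultdict
import math

class BigramExpectation:
    """Online bigram model: predicts the next event symbol from the previous one
    and reports its surprise (-log2 probability) before updating the counts."""
    def __init__(self, alphabet_size):
        self.counts = defaultdict(lambda: [1] * alphabet_size)  # add-one smoothing

    def observe(self, prev_symbol, next_symbol):
        row = self.counts[prev_symbol]
        prob = row[next_symbol] / sum(row)    # causal prediction: uses only the past
        surprise = -math.log2(prob)
        row[next_symbol] += 1                 # then learn from the new event
        return surprise

# Usage: events could be cluster indices of timbre frames or quantized onset intervals.
model = BigramExpectation(alphabet_size=4)
sequence = [0, 1, 2, 0, 1, 2, 0, 1, 3]        # the final symbol breaks the pattern
for prev, nxt in zip(sequence, sequence[1:]):
    print(nxt, round(model.observe(prev, nxt), 2))
```

The last event, which violates the learned regularity, receives the highest surprise, mirroring the anticipatory behaviour the thesis models.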
17

Learning representations of speech from the raw waveform / Apprentissage de représentations de la parole à partir du signal brut

Zeghidour, Neil 13 March 2019 (has links)
While deep neural networks are now used in almost every component of a speech recognition system, from acoustic to language modeling, the inputs to such systems are still fixed, handcrafted spectral features such as mel-filterbanks. This contrasts with computer vision, in which a deep neural network is now trained on raw pixels.
Mel-filterbanks contain valuable and documented prior knowledge from human auditory perception as well as signal processing, and are the input to state-of-the-art speech recognition systems that are now on par with human performance in certain conditions. However, mel-filterbanks, as any fixed representation, are inherently limited by the fact that they are not fine-tuned for the task at hand. We hypothesize that learning the low-level representation of speech with the rest of the model, rather than using fixed features, could push the state of the art even further. We first explore a weakly supervised setting and show that a single neural network can learn to separate phonetic information and speaker identity from mel-filterbanks or the raw waveform, and that these representations are robust across languages. Moreover, learning from the raw waveform provides significantly better speaker embeddings than learning from mel-filterbanks. These encouraging results lead us to develop a learnable alternative to mel-filterbanks that can be used as a drop-in replacement for these features. In the second part of this thesis we introduce Time-Domain filterbanks, a lightweight neural network that takes the waveform as input, can be initialized as an approximation of mel-filterbanks, and is then learned with the rest of the neural architecture. Across extensive and systematic experiments, we show that Time-Domain filterbanks consistently outperform mel-filterbanks and can be integrated into a new state-of-the-art speech recognition system trained directly from the raw audio signal. Since fixed speech features are also used for non-linguistic classification tasks, for which they are even less optimal, we perform dysarthria detection from the waveform with Time-Domain filterbanks and show that it significantly improves over mel-filterbanks or low-level descriptors. Finally, we discuss how our contributions fall within a broader shift towards fully learnable audio understanding systems.
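To make the idea of a learnable waveform frontend concrete, here is a minimal sketch in that spirit (not the exact Time-Domain filterbanks architecture of the thesis; the filter count, kernel length, hop size, and log compression are assumptions): a bank of 1-D convolutions over the waveform, squared, low-pass averaged, and compressed, whose weights can be initialized to approximate mel filters and then trained with the recognizer.

```python
import torch
import torch.nn as nn

class LearnableFilterbank(nn.Module):
    """Sketch of a learnable waveform frontend: band-pass convolution,
    squared modulus, low-pass averaging, log compression.
    Hyperparameters (80 filters, 25 ms kernels, 10 ms hop at 16 kHz) are assumptions."""
    def __init__(self, n_filters=80, kernel=400, hop=160):
        super().__init__()
        self.bandpass = nn.Conv1d(1, n_filters, kernel_size=kernel,
                                  stride=1, padding=kernel // 2, bias=False)
        self.lowpass = nn.AvgPool1d(kernel_size=kernel, stride=hop)

    def forward(self, waveform):                 # waveform: (batch, samples)
        x = waveform.unsqueeze(1)                # -> (batch, 1, samples)
        x = self.bandpass(x) ** 2                # per-band filter energies
        x = self.lowpass(x)                      # decimate to frame rate
        return torch.log1p(x)                    # compress dynamic range

# The band-pass weights can be initialized near mel filters and then trained
# end-to-end by backpropagation with the rest of the acoustic model.
frontend = LearnableFilterbank()
features = frontend(torch.randn(2, 16000))       # (2, 80, n_frames)
```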
18

Robust Audio Scene Analysis for Rescue Robots / レスキューロボットのための頑健な音環境理解

Bando, Yoshiaki 26 March 2018 (has links)
Kyoto University / 0048 / New degree system, doctoral program / Doctor of Informatics / Kou No. 21209 / Johaku No. 662 / 新制||情||114 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Tatsuya Kawahara, Professor Hisashi Kashima, Professor Toshiyuki Tanaka, Lecturer Kazuyoshi Yoshii / Eligible under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
19

Aural servo : towards an alternative approach to sound localization for robot motion control / Asservissement sonore : vers une alternative à la localisation de source pour la commande de robot

Magassouba, Aly 05 December 2016 (has links)
This thesis is concerned with the development of a control framework based on auditory perception. In robot audition, the motion control of a robot using the sense of hearing is generally based on sound source localization approaches. However, sound source localization under realistic conditions is a significant challenge. In indoor environments, perturbations caused by noise, reverberation, or even the structure of the robot may alter the localization process. When considering dynamic scenes where the robot and/or the sound source may move, the complexity of source localization rises further. As a result, robust sound source localization with a binaural setup is not yet achievable in real-world environments. By contrast, we develop in this thesis a sensor-based control approach, aural servo, that does not require localizing the source. The motion of the robot is directly connected to auditory perception: a positioning task is performed through a feedback loop in which the motion of the robot is governed by the dynamics of low-level auditory features. Experimental results in various acoustic conditions and on several robotic platforms confirm the relevance of this approach for real-world environments.
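As a minimal sketch of this sensor-based principle (not the thesis's actual control law; the gain, the choice of the interaural level difference as the cue, and the sign convention are assumptions), a rotation command can be computed directly from a low-level binaural feature, with no source localization step:

```python
import numpy as np

def interaural_level_difference(left, right):
    """ILD in dB between one frame from the left and right microphones."""
    return 10.0 * np.log10((np.mean(left ** 2) + 1e-12) /
                           (np.mean(right ** 2) + 1e-12))

def aural_servo_step(left_frame, right_frame, gain=0.05):
    """One feedback-loop iteration: regulate the ILD towards 0 dB (source centred)
    by commanding a yaw rate proportional to the error.
    Assumed sign convention: positive yaw turns towards the left microphone."""
    error = interaural_level_difference(left_frame, right_frame)
    return gain * error   # turn towards the louder side until the cue is balanced
```

In a real loop this command would be sent to the robot base and new audio frames captured, so the task is achieved by regulating the auditory cue itself rather than by estimating the source position.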
20

Futuristic Teleconferencing / Futuristisk Teleconferencing

Mallavarapu, Haritha January 2012 (has links)
Beyond looking cool and futuristic, the main intention behind this work is to bring proper eye contact and emotional cues back into teleconferencing, something that two-dimensional setups simply cannot provide. In such a system, the remote participant can establish clear visual communication with particular people within his or her own frame of view. Teleconferencing is a communication technology that allows users at two or more different locations to interact by creating a face-to-face meeting environment. Teleconferencing systems carry audio, video, and data streams throughout a session and have been gaining popularity across government sectors. A recent demonstration of such teleconferencing from the University of Southern California, "Attaining visual communication in a One-to-Many 3D Video Teleconferencing System", suggests that developing a 3D teleconferencing system is not only about video: users should also experience 3D audio. A 3D audio system can be described as reliable audio capture achieved by the positioning of speakers. In this thesis we attempt to develop a 3D audio system that uses two microphones and two speakers. The structure is designed based on how the human ear behaves while capturing sound. I studied different usable methods for such a structure and then designed a new system intended to be robust and user-friendly. The idea for this new system comes from Zuccarelli's theory (1983), which states that the human ear not only captures sounds but emits sounds as well; he designed holophonic recording of sounds from the human ear in a scientific manner, but did not reveal the details. Taking this concept, I captured sounds at positions arranged in a spherical pattern and determined which direction a sound comes from based on the pattern of the sound signal; to capture the sounds and find the directions, I used interference and diffraction around the head. An ideal microphone arrangement should therefore be able to steer maximum directivity towards speech, no matter which direction it originates from. Directional microphone technology has been used in hearing instruments since the late 1960s and has been shown to effectively improve speech understanding in background noise. A future implementation of a directional microphone system could also be of interest for industrial and medical applications. / In this thesis I have taken the University of Southern California's 3D video teleconferencing work as a reference for designing a 3D audio teleconferencing system. Video alone is not enough for teleconferencing; together, 3D audio and video convey communication, expressions, and emotions so well that the remote participants virtually seem to be sitting right beside us. I implemented a structure using two microphones and two speakers, realized it in real time in Matlab, and carried out practical experiments. Two directional microphones were fixed 17 cm apart. With one speaker I transmitted a signal at 2.5 kHz, varied its position over a sphere, and observed the frequency spectra and signal patterns at all positions using phase delays. I then used two speakers: one placed right next to a microphone and fixed at 1.25 kHz, the other at 2.5 kHz and moved over the sphere, and I observed the magnitudes, spectra, and patterns at all positions.
Instead of placing one speaker right next to the microphone, I then kept an obstacle between the microphone and the fixed-frequency speaker while the other speaker again moved spherically, and observed the spectra and patterns at all positions; this is called head diffraction. Finally, I measured the variations in signal strength, pattern, and spectrum in all directions and found very large differences between the front and back positions. In this way I implemented a 3D space for audio teleconferencing. From the above results I conclude as follows. I compared my results before and after interference. Before interference I used one speaker (2.5 kHz) and two microphones and tested the signal level in the front and back positions; the signal strength at the front position was stronger than at the back, so at this stage I could not achieve the same signal strength in both positions. To achieve this I introduced interference with two speakers: one placed at a fixed position near the microphones, emitting a constant frequency (1.25 kHz), and the other moving in all directions. Comparing the signal in the front and back positions again, the signal strength was almost the same in both. Finally, I tried the same method with head diffraction; in that case the signal strength again fluctuated between the front and back positions. The implemented interference method is therefore the best method for an audio teleconferencing room: we can communicate from any direction of the room without needing to sit right next to the microphone, since the audio signal strength is almost the same from all directions. If there are obstacles in the teleconferencing room, however, this method will not be successful.
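As a hedged sketch of the two-microphone direction-finding idea behind these experiments (the sample rate, far-field plane-wave assumption, and cross-correlation estimator are assumptions, not the thesis's method), the delay between the 17 cm-spaced microphones can be estimated and converted to an arrival angle:

```python
import numpy as np

def direction_from_delay(left, right, sr=44100, mic_distance=0.17, c=343.0):
    """Estimate the arrival angle in degrees (0 = broadside, +/-90 = endfire)
    from the time delay between two microphones mic_distance metres apart."""
    corr = np.correlate(left, right, mode="full")      # full cross-correlation
    lag = int(np.argmax(corr)) - (len(right) - 1)      # delay in samples
    delay = lag / sr                                   # delay in seconds
    sin_theta = np.clip(delay * c / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```

A pair of microphones cannot distinguish front from back from this delay alone, which is exactly the ambiguity the experiments above attack with the interference and head-diffraction setups.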
