  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Vizualizace parametrů mnohakanálového zvukového systému v internetovém prohlížeči / Parameter Visualization of Multichannel Audio System in Web Browser

Lach, Martin January 2020 (has links)
This work deals with the Audified Audio Processing System, an embedded system built around an ARM processor running the Linux operating system. At present, parameter control (phantom power, gain) is complicated and offers no feedback. This work describes the creation of a client-server web application that allows these parameters to be set easily and shows their effect.
23

Teaching an Agent to Replicate Melodies by Listening : A Reinforcement Learning Approach to Generating Piano Rolls and Parameters of Physically Modeled Instruments from Target Audios / Att lära en agent att replikera melodier från gehör : En förstärkningsinlärningsmetod för att generera pianorullar och parametrar för fysiskt modellerade instrument från referensljud

Eriksson, Wille January 2022 (has links)
Reinforcement learning has seen great improvements in recent years, with new frameworks and algorithms continually being developed, and some efforts have been made to incorporate it into music in various ways. This project explores the prospect of using reinforcement learning to make an agent learn to replicate a piece of music using a model of an instrument. Both synthesizers and physically modeled instruments, in particular the Karplus-Strong algorithm, are considered. Two reward functions are introduced to measure the similarity between two audio signals: one based on frequency content and another based on the waveform envelope. The results suggest that audio can be successfully replicated, both with a synthesizer and with the Karplus-Strong algorithm. Further research could address replicating more complex melodies and composing creatively with physical models of instruments. https://github.com/wille-eriksson/RL-instruments
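The Karplus-Strong algorithm the abstract builds on can be sketched in a few lines. This is the generic textbook form (pitch, duration, and sample rate are illustrative choices), not the thesis's implementation:

```python
import numpy as np

def karplus_strong(freq_hz, duration_s, sample_rate=44100, seed=0):
    """Synthesize a plucked-string tone with the basic Karplus-Strong loop:
    a noise-filled delay line repeatedly smoothed by a two-point average."""
    rng = np.random.default_rng(seed)
    period = int(sample_rate / freq_hz)       # delay-line length sets the pitch
    buf = rng.uniform(-1.0, 1.0, period)      # initial burst of white noise
    n_samples = int(duration_s * sample_rate)
    out = np.empty(n_samples)
    for i in range(n_samples):
        out[i] = buf[i % period]
        # low-pass feedback: average the current sample with its successor
        buf[i % period] = 0.5 * (buf[i % period] + buf[(i + 1) % period])
    return out

tone = karplus_strong(440.0, 0.5)
```

The averaging step acts as a first-order low-pass filter in the feedback loop, so the tone decays and its timbre darkens over time, which is what makes the output string-like.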
24

Development of real time audio equalizer application using MATLAB App Designer

Langelaar, Johannes, Strömme Mattsson, Adam, Natvig, Filip January 2019 (has links)
This paper outlines the design of a high-precision graphic audio equalizer built from digital filters in parallel, along with its implementation in MATLAB App Designer. The equalizer comprises 31 bands spaced at one-third-octave intervals, and its frequency response is controlled by 63 filters. The application can process audio in real time, both from a microphone and from audio files; while processing, it displays a real-time FFT plot of the output sound, with a knob by which the refresh rate can be adjusted. The realized frequency response proved to match the desired one accurately, but the matching is computationally demanding. Still higher accuracy would entail a computational cost beyond the power of ordinary computers and was therefore judged impractical. As a result, the final application provides most laptops with both high precision and proper functionality.
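The 31-band, one-third-octave layout mentioned above can be reproduced with base-2 spacing around the 1 kHz reference. These are nominal textbook centres, shown as an illustrative sketch; the paper's exact band frequencies may differ slightly:

```python
import numpy as np

# 31 one-third-octave band centres, base-2 spacing around the 1 kHz reference.
# Nominal values for illustration; adjacent centres differ by a factor 2**(1/3).
centres = 1000.0 * 2.0 ** (np.arange(-17, 14) / 3.0)
print(len(centres), round(centres[0], 1), round(centres[-1], 1))
```

This spans roughly 20 Hz to 20 kHz, covering the audible range, which is why the 31-band layout is the standard choice for graphic equalizers.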
25

Processamento de áudio em tempo real em dispositivos computacionais de alta disponibilidade e baixo custo / Real time digital audio processing using highly available, low cost devices

Bianchi, André Jucovsky 21 October 2013 (has links)
This dissertation investigates real-time digital audio processing on three platforms with fundamentally distinct computational characteristics that are nonetheless highly accessible in terms of cost and technology: Arduino, GPU boards, and Android devices. Arduino is a device with open hardware and software licenses, based on a microcontroller with low processing power, widely used as an educational and artistic platform for control computations and for interfacing with other devices. GPU is a video-card architecture focused on parallel processing, which has motivated the study of specific programming models for its use as a general-purpose processing device. Android is an operating system for mobile devices based on the Linux kernel; it allows applications to be developed in a high-level language and gives access to the sensors, connectivity, and mobility infrastructure available on the devices. We seek to systematize the limitations and possibilities of each platform through the implementation of real-time digital audio processing techniques and the analysis of computational intensity in each environment.
27

Síťový interface k detektoru klíčových slov / Network Interface for Keyword Spotting System

Skotnica, Martin Unknown Date (has links)
A considerable part of computer-science research is dedicated to speech recognition, as speech-controlled systems become useful in many applications. One of them is keyword spotting, which makes it possible to find words in audio data. Such a detector is developed at the BUT Faculty of Information Technology. The goal of this work is to propose a network interface to this keyword detector based on a client/server architecture. The client connects to the server and sends audio data; the server runs the keyword detector on the received data and sends the spotting result back to the client. Finally, the client visualizes the result and interacts with the user.
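A client/server exchange like the one described needs framing so the server knows where the audio payload ends. A minimal length-prefixed scheme might look like this; the 4-byte header is an assumption for illustration, not the detector's actual wire format:

```python
import socket
import struct

def send_audio(sock, payload: bytes):
    """Send one audio message: a 4-byte big-endian length, then the bytes."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_audio(sock) -> bytes:
    """Receive one length-prefixed audio message."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return _recv_exact(sock, length)

def _recv_exact(sock, n):
    # recv() may return fewer bytes than asked for; loop until we have n.
    data = b""
    while len(data) < n:
        part = sock.recv(n - len(data))
        if not part:
            raise ConnectionError("peer closed connection early")
        data += part
    return data
```

The same framing works in both directions, so the server can return its spotting result (e.g. as encoded text) with the identical helpers.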
28

Categorisation of the Emotional Tone of Music using Neural Networks

Hedén Malm, Jacob, Sinclair, Kyle January 2020 (has links)
Machine categorisation of the emotional content of music is an ongoing research area. Feature description and extraction for a field as vague and subjective as emotion is difficult for human-designed audio processing. Research into machine categorisation of music by genre has expanded as media companies increase their recommendation and automation efforts, but work on categorising music by sentiment remains lacking. We took an informed experimental approach towards a workable solution for a multimedia company, Ichigoichie, which wished to develop a generalizable classifier of musical qualities. We first oriented ourselves in the relevant academic literature, which suggested applying spectrographic pre-processing to the sound samples and then analyzing the resulting images with a convolutional neural network. To verify this method, we prototyped the model in a high-level Python framework that pre-processes 10-second audio files into spectrograms and provides them as training data to a convolutional neural network. The network was assessed on both its categorization accuracy and its generalizability to other data sets. Our results show that the method is justifiable as a technique for machine categorization of music, and even provide evidence that it is technically feasible for commercial applications today.
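The spectrographic pre-processing step described above can be sketched with a plain short-time Fourier transform; the frame and hop sizes here are illustrative defaults, not the project's actual settings:

```python
import numpy as np

def log_spectrogram(signal, frame_len=1024, hop=512):
    """Convert a mono signal into a log-magnitude STFT spectrogram, the kind
    of image-like input a convolutional network can analyze visually."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1))   # one spectrum per frame
    return np.log1p(mag)                        # compress dynamic range for the CNN
```

Each row is one time frame and each column one frequency bin, so the output can be treated exactly like a single-channel image by a standard convolutional architecture.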
29

基於 RGBD 影音串流之肢體表情語言表現評估 / Estimation and Evaluation of Body Language Using RGBD Data

吳怡潔, Wu, Yi Chieh Unknown Date (has links)
In this thesis, we capture body movements, facial expressions, and voice data of subjects in a presentation scenario using the RGBD-capable Kinect sensor. The acquired videos were assessed by a group of reviewers who indicated their preferences and aversions to the presentation style; we denote the two classes of ruling as Period of Like (POL) and Period of Dislike (POD), respectively. We then employ three types of image features provided by the Kinect SDK, namely animation units (AU), skeletal joints, and 3D face vertices, to analyze the consistency of the evaluation results, as well as the ability to classify unseen footage based on training data supplied by 35 evaluators. Finally, we develop a prototype program to help users identify their strengths and weaknesses during a presentation so that they can improve their skills accordingly.
30

MDCT Domain Enhancements For Audio Processing

Suresh, K 08 1900 (has links) (PDF)
The modified discrete cosine transform (MDCT), derived from the DCT-IV, has emerged as the most suitable choice for transform-domain audio coding due to its time-domain alias cancellation property and its de-correlation capability. In this work we focus on MDCT-domain analysis of audio signals for compression and other applications. We derive algorithms for linear filtering in the DCT-IV and DST-IV domains for symmetric and non-symmetric filter impulse responses, and extend these results to the MDCT and MDST domains, which have the special property of time-domain alias cancellation. We also derive filtering algorithms for the DCT-II and DCT-III domains. Comparison with other methods in the literature shows that the new algorithm is efficient in terms of multiply-accumulate (MAC) operations. These results enable MDCT-domain audio processing, such as reverb synthesis, without having to reconstruct the time-domain signal before filtering. In audio coding, the psychoacoustic model plays a crucial role: it estimates the masking thresholds used for adaptive bit allocation, and transparent-quality coding is possible if the quantization noise is kept below the masking threshold in every frame. Existing methods compute the masking threshold from a separate DFT of the signal frame even when quantization is performed in the MDCT domain. We extend the spectral-integration psychoacoustic model, originally proposed for sinusoidal modeling of audio signals, to the MDCT domain. This is possible thanks to a detailed analysis of the relation between the DFT and the MDCT: we interpret the MDCT coefficients as co-sinusoids and then apply the sinusoidal masking model. The validity of the derived masking threshold is verified through listening tests as well as objective measures. Parametric coding techniques are used for low-bit-rate encoding of multi-channel audio such as 5.1-format surround.
In these techniques, the surround channels are synthesized at the receiver from the analysis parameters of a parametric model. We develop algorithms for MDCT-domain analysis and synthesis of reverberation and, integrating these ideas, build a parametric audio coder in the MDCT domain. For parameter estimation, we use a novel analysis-by-synthesis scheme in the MDCT domain, which yields better modeling of the spatial audio. The resulting parametric stereo coder synthesizes acceptable-quality stereo audio from a mono channel plus side information of approximately 11 kbps. Finally, an experimental audio coder incorporating both the new psychoacoustic model and the parametric model is developed in the MDCT domain.
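As a point of reference for the transform at the centre of this work, the MDCT of a single length-2N frame can be written directly from its definition. This O(N^2) form is an illustrative sketch (production coders compute it via a fast DCT-IV, and windowing/overlap machinery is omitted):

```python
import numpy as np

def mdct(frame):
    """Direct MDCT of one length-2N frame, yielding N coefficients:
    X[k] = sum_n x[n] * cos((pi/N) * (n + 1/2 + N/2) * (k + 1/2))."""
    two_n = len(frame)
    n_half = two_n // 2
    n = np.arange(two_n)
    k = np.arange(n_half)[:, None]
    basis = np.cos(np.pi / n_half * (n + 0.5 + n_half / 2.0) * (k + 0.5))
    return basis @ frame
```

Because only N coefficients are kept per 2N-sample frame, a single frame cannot be inverted exactly; perfect reconstruction relies on the time-domain alias cancellation between 50%-overlapped neighbouring frames discussed in the abstract.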
