Global ETD Search

121	ARMAS: Active Reconstruction of Missing Audio Segments Pokharel, Sachin, Ali, Muhammad January 2021 Background: Audio signal reconstruction using machine/deep learning algorithms has been explored much more in recent years, and it has many applications in digital signal processing. Many research works address audio reconstruction with linear interpolation, phase coding, and tone insertion techniques combined with AI models. However, there is no research work on reconstructing audio signals with the fusion of Steganoflage (an adaptive approach to image steganography) and AI models. Thus, in our thesis work, we focus on audio reconstruction combining Steganoflage and AI models. Objectives: This thesis aims to explore the possible enhancement of audio reconstruction using machine/deep learning models fused with the Steganoflage technique. Furthermore, the suitable models implemented with the fusion of Steganoflage are analyzed and compared based on the performance metrics. Methods: We conducted a systematic literature review (SLR) followed by an experiment to answer our research questions. The models implemented in the thesis were selected based on the results of the SLR. In the experiments, we fused the RF (Random Forest), SVR (Support Vector Regression), and LSTM (Long Short-Term Memory) models with Steganoflage for possible enhancement of the reconstruction of lost audio signals. Then, the models were trained to estimate the approximate reconstructed signals. Finally, we observed the performance of the models and compared the reconstructed audio signals with the original signals (ground truth) using four performance metrics: Pearson linear correlation, PSNR, WPSNR, and SSIM. Results: The results from the SLR show that, among machine learning models, RF and SVR were mainly used for signal reconstruction and work well with time-series data. 
For deep learning models, the LSTM recurrent neural network was the first choice, as the literature survey demonstrated that the model is suitable for time-series forecasting. From the experiments, we found that the LSTM model performed better than the RF and SVR models. Moreover, the reconstruction of audio signals from a single short dropped region was better than that for multiple regions. Conclusions: We conclude that Steganoflage, when fused with machine/deep learning models, enhances the reconstruction of lost audio signals. Moreover, we also conclude that the LSTM model is more accurate than the RF and SVR models in reconstructing lost audio signals for a single drop region with both short and long gaps. However, we also observed that audio reconstruction for multiple drops needs improvement for long gaps. Furthermore, improvements can be made by exploring newer AI methods/optimization to enhance the reconstructed audio signals. Audio Reconstruction Audio Inpainting Deep Learning Machine Learning Audio Steganography Computer Sciences Datavetenskap (datalogi)
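122	Bayesian Music Alignment / ベイス推定に基づく音楽アライメント Maezawa, Akira 23 March 2015 Kyoto University / 0048 / New-system course doctorate / Doctor of Informatics / 甲第19106号 / 情博第552号 / 新制||情||98 (University Library) / 32057 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Prof. Tatsuya Kawahara, Prof. Toshiyuki Tanaka, Lecturer Kazuyoshi Yoshii / Qualifies under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM Bayesian Inference Audio-to-score alignment Audio-to-audio alignment Subset music alignment Dereverberation 007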
123	Visual Search Performance in a Dynamic Environment with 3D Auditory Cues McIntire, John Paul 18 April 2007 No description available. Psychology, Experimental visual search spatial audio virtual audio 3D audio auditory cues dynamic environment
124	Content-based audio search: from fingerprinting to semantic audio retrieval Cano Vila, Pedro 27 April 2007 This dissertation is about audio content-based search. Specifically, it is about developing technologies for bridging the semantic gap that currently prevents wide deployment of audio content-based search engines. Audio search engines rely on metadata, mostly human-generated, to manage collections of audio assets. Even though it is time-consuming and error-prone, human labeling is common practice. Audio content-based methods, that is, algorithms that automatically extract descriptions from audio files, are generally not mature enough to provide a user-friendly representation for interacting with audio content. Content-based methods mostly rely on low-level descriptions, while high-level or semantic descriptions are beyond current capabilities. In this thesis we explore technologies that can help close the semantic gap. 
audio classification wordnet ontology audio fingerprinting audio retrieval content based audio search 531/534 68 78
125	Amélioration de codecs audio standardisés avec maintien de l'interopérabilité Lapierre, Jimmy January 2016 Abstract: Digital audio applications have grown exponentially during the last decades, in good part because of the establishment of international standards. However, imposing such norms necessarily introduces rigidity that can impede the improvement of technologies that have already been deployed, potentially leading to a proliferation of new standards. This thesis shows that existing coders can be better exploited by improving their quality or their bitrate, even within the rigid constraints posed by established standards. Three aspects are studied: enhancement at the encoder, at the decoder, and at the bit-stream level. In every case, compatibility with the other elements of the existing coder is maintained. Thus, it is shown that the audio signal can be improved at the decoder without transmitting new information, that an encoder can produce an improved signal without modifying its decoder, and that a bit stream can be better optimized for a new application. In particular, this thesis shows that even a standard like G.711, which has been deployed for decades, has the potential to be significantly improved after the fact. This contribution has even served as the core of a new standard embedded coder that had to maintain that compatibility. It is also shown that the subjective and objective audio quality of the AAC (Advanced Audio Coding) decoder can be improved, without adding any extra information from the encoder, by better exploiting knowledge of the limitations of the coding model. Finally, it is shown that the fixed-rate bit stream of AMR-WB+ (Extended Adaptive Multi-Rate Wideband) can be compressed more efficiently in a variable bit rate scenario, which shows the benefit of adapting a coder to its use case. 
Digital audio Telecommunications Telecommunication standards Digital signal processing Audio coding Transform coding Entropy coding Audio enhancement
126	Blind Detection Techniques For Spread Spectrum Audio Watermarking Krishna Kumar, S 10 1900 In spread spectrum (SS) watermarking of audio signals, the watermark acts as additive noise to the host audio signal, so the most important challenge is to maintain perceptual transparency. Human perception is a very sensitive apparatus, yet it can be exploited to hide some information reliably. SS watermark embedding has been proposed in which psycho-acoustically shaped pseudo-random sequences are embedded directly into the time domain audio signal. However, these watermarking schemes use informed detection, in which the original signal is assumed to be available to the watermark detector. Blind detection of psycho-acoustically shaped SS watermarking is not well addressed in the literature. The problem is still interesting because blind detection is more practical for audio signals, and psycho-acoustically shaped watermark embedding offers the maximum possible watermark energy under the requirement of perceptual transparency. In this thesis we study the blind detection of psycho-acoustically shaped SS watermarks in time domain audio signals. We focus on a class of watermark sequences known as random phase watermarks, where the watermark magnitude spectrum is defined by the perceptual criteria and the randomness of the sequence lies in its phase spectrum. Blind watermark detectors, which do not have access to the original host signal, may seem handicapped, because an approximate watermark has to be re-derived from the watermarked signal. Since the comparison of blind detection with fully informed detection is unfair, a hypothetical detection scheme, denoted as semi-blind detection, is used as a reference benchmark. In semi-blind detection, the host signal as such is not available for detection, but it is assumed that sufficient information is available for deriving the exact watermark that could be embedded in the given signal. 
Some reduction in performance is anticipated in blind detection relative to semi-blind detection. Our experiments, however, revealed that the statistical performance of the blind detector is better than that of the semi-blind detector. We analyze the watermark-to-host correlation (WHC) of random phase watermarks, and the results indicate that WHC is higher when a legitimate watermark is present in the audio signal, which leads to better detection performance. Based on these findings, we attempt to harness this increased correlation in order to further improve the performance. The analysis shows that a uniformly distributed phase difference (between the host signal and the watermark) provides the maximum advantage. This property is verified through experimentation over a variety of audio signals. In the second part, the correlated nature of audio signals is identified as a potential threat to reliable blind watermark detection, and audio pre-whitening methods are suggested as a possible remedy. A direct deterministic whitening (DDW) scheme is derived from the frequency domain analysis of the time domain correlation process. Our experimental studies reveal that Savitzky-Golay whitening (SGW), which is otherwise inferior to the DDW technique, performs better when the audio signal is predominantly low pass. The novelty of this work lies in exploiting the complementary nature of the two whitening techniques and combining them to obtain a hybrid whitening (HbW) scheme. In the hybrid scheme, the DDW and SGW techniques are selectively applied based on the short-time spectral characteristics of the audio signal. The hybrid scheme extends the reliability of watermark detection to a wider range of audio signals. We also discuss enhancements to the HbW technique for robustness to temporal offsets and filtering. The robustness of SS watermark blind detection with hybrid whitening is determined through a set of experiments, and the results are presented. 
It is seen that the watermarking scheme is robust to common signal processing operations such as additive noise, filtering, and lossy compression. Sound Recordings - Security Audio Systems - Watermarking Random Phase Watermarks Spread Spectrum Audio Watermarking Blind Watermark Detection Watermark Embedding Audio Signals - Correlation Audio Pre-whitening Audio Watermarking Blind Audio Watermark Detection Increased Correlation Improved Detection Communications Engineering
127	Digital compensation of distortion in audio systems / Digital kompensering av distorsion i ljudsystem Bengtsson, Fredrik, Berglund, Rikard January 2010 Advances in the computational power of low-cost FPGAs make it possible to implement real-time compensation of loudspeakers and audio systems. The need for expensive commercial audio systems is reduced when the fidelity of much cheaper audio systems can easily be improved by real-time compensation. The topic of this thesis is to investigate and evaluate methods for digital compensation of distortion in audio systems. More specifically, a VHDL module is implemented to alleviate, when necessary, the drastic deterioration of bass fidelity that appears when the input power is too high. Distortion Digital Compensation Signal Processing Digital Filters Audio Systems Amplifiers Modelling Audio Compressor Audio Limiter Electrical engineering Elektroteknik
128	Digital compensation of distortion in audio systems / Digital kompensering av distorsion i ljudsystem Bengtsson, Fredrik, Berglund, Rikard January 2010 Advances in the computational power of low-cost FPGAs make it possible to implement real-time compensation of loudspeakers and audio systems. The need for expensive commercial audio systems is reduced when the fidelity of much cheaper audio systems can easily be improved by real-time compensation. The topic of this thesis is to investigate and evaluate methods for digital compensation of distortion in audio systems. More specifically, a VHDL module is implemented to alleviate, when necessary, the drastic deterioration of bass fidelity that appears when the input power is too high. Distortion Digital Compensation Signal Processing Digital Filters Audio Systems Amplifiers Modelling Audio Compressor Audio Limiter Electrical engineering Elektroteknik
129	Audio Event Detection On Tv Broadcast Ozan, Ezgi Can 01 September 2011 The availability of digital media has grown tremendously with fast-paced, ever-growing storage and communication technologies. As a result, we now face the problem of indexing and browsing huge amounts of multimedia data. This amount of data cannot be indexed or browsed by hand, so automatic indexing and browsing systems have been proposed. Audio Event Detection is a research area that tries to analyse audio data in a semantic and perceptual manner in order to bring a conceptual solution to this problem. In this thesis, a method for detecting several audio events in TV broadcast is proposed. The proposed method includes an audio segmentation stage to detect event boundaries. Broadcast audio is classified into 17 classes. The feature set for each event is obtained by using a feature selection algorithm to select suitable features from a large set of popular descriptors. Support Vector Machines and Gaussian Mixture Models are used as classifiers, and the proposed system achieved an average recall rate of 88% for 17 different audio events. Compared with results in the literature, the proposed method is promising.
130	Produção de vídeo na escola : um estudo sobre processos de aprendizagem audiovisual / School video production : a study on audiovisual learning processes Miranda, Fabianna Maria Whonrath, 1977- 26 August 2018 Advisor: Nuno Cesar Pereira de Abreu / Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Artes / Previous issue date: 2015 / Abstract: This work is the result of reflections and analysis arising from the researcher's experience of teaching video production to first-year high-school students over a period of four years at a private school in the city of Campinas - SP. The objective was to evaluate aspects of the students' perception of their own learning, based on a description of the procedures carried out in the classroom and their results. The course "Film and video production", established at the school under study in 2009 as a regular subject in the first-year high-school Arts curriculum, was designed to encourage in students the development of skills and competencies related to the demands of a society in which the audiovisual has come to occupy a prominent place in interpersonal relations as a communication tool. The choice of the phenomenological method, based on the students' accounts of their experience, therefore aims to deal with some precision with an imprecise and dynamic reality, which will increasingly require varied and continuously evolving learning processes. The discussion about the importance of deconstructing pedagogical strategies that are old and, to an extent, out of date seeks to address how the social context itself can influence new teaching practices and encourage the improvement of learning processes for audiovisual language. 
The stages of this work comprise a discussion of the published literature on video production in schools; the grounding of the phenomenological method; a report of the educational activities; and the transcription and analysis of student interviews. This research discusses the experience of video production from the point of view of those involved in the teaching-learning process, and may thereby broaden perspectives for working with audiovisual teaching in schools through critical reflection on pedagogical practice / Doctorate / Multimeios / Doctor in Multimeios Audio-visual education Communication Audio-visual Education - Cooperative Audio-visual
