  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

A compiler for the LMT music transcription language

Adler, Stuart Philip January 1974
No description available.
62

Digital musical instruments : a design approach based on moving mechanical systems

Sinyor, Elliot. January 2006
No description available.
63

High-level control of singing voice timbre transformations

Thibault, François January 2004
No description available.
64

Effects of voice coding and speech rate on a synthetic speech display in a telephone information system

Herlong, David W. January 1988
Despite the lack of formal guidelines, synthetic speech displays are used in a growing variety of applications. Telephone information systems permitting human-computer interaction from remote locations are an especially popular implementation of computer-generated speech. Currently, human factors research is needed to specify design characteristics providing usable telephone information systems as defined by task performance and user ratings. Previous research used nonintegrated tasks such as transcription of phonetic syllables, words, or sentences to assess task performance or user preference differences. This study used a computer-driven telephone information system as a real-time, human-computer interface to simulate applications where synthetic speech is used to access data. Subjects used a telephone keypad to navigate through an automated, department store database to locate and transcribe specific information messages. Because speech provides a sequential and transient information display, users may have difficulty navigating through auditory databases. One issue investigated in this study was whether use of alternating male and female voices to code different levels in the database hierarchy would improve user search performance. Other issues investigated were basic intelligibility of these male and female voices as influenced by different levels of speech rate. All factors were assessed as functions of search or transcription task performance and user preference. Analysis of transcription accuracy, search efficiency and time, and subjective ratings revealed an overall significant effect of speech rate on all groups of measures but no significant effects for voice type or coding scheme. Results were used to recommend design guidelines for developing speech displays for telephone information systems. / Master of Science
65

The effects of speech rate, message repetition, and information placement on synthesized speech intelligibility

Merva, Monica Ann 12 March 2013
Recent improvements in speech technology have made synthetic speech a viable I/O alternative. However, little research has focused on optimizing the various speech parameters which influence system performance. This study examined the effects of speech rate, message repetition, and the placement of information in a message. Briefly, subjects heard messages generated by a speech synthesizer and were asked to transcribe what they had heard. After entering each transcription, subjects rated the perceived difficulty of the preceding message, and how confident they were of their response. The accuracy of their response, system response time, and response latency were recorded. Transcription accuracy was best for messages spoken at 150 or 180 wpm and for messages repeated either twice or three times. Words at the end of messages were transcribed more accurately than words at the beginning of messages. Response latencies were fastest at 180 wpm with 3 repetitions and rose as the number of repetitions decreased. System response times were shortest when a message was repeated only once. The subjective certainty and difficulty ratings indicated that subjects were aware of errors when incorrectly transcribing a message. These results suggest that a) message rates should lie below 210 wpm, b) a repeat feature should be included in speech interface designs, and c) important information should be contained at the end of messages. / Master of Science
66

Bird song recognition with hidden Markov models

Van der Merwe, Hugo Jacobus 03 1900
Thesis (MScEng (Electrical and Electronic Engineering))--Stellenbosch University, 2008. / Automatic bird song recognition and transcription is a relatively new field. Reliable automatic recognition systems would be of great benefit to further research in ornithology and conservation, as well as commercially in the very large birdwatching subculture. This study investigated the use of Hidden Markov Models and duration modelling for bird call recognition. Through use of more accurate duration modelling, very promising results were achieved with feature vectors consisting of only pitch and volume. An accuracy of 51% was achieved for 47 calls from 39 birds, with the models typically trained from only one or two specimens. The ALS pitch tracking algorithm was adapted to bird song to extract the pitch. Bird song synthesis was employed to subjectively evaluate the features. Compounded Selfloop Duration Modelling was developed as an alternative duration modelling technique. For long durations, this technique can be more computationally efficient than Ferguson stacks. The application of approximate string matching to bird song was also briefly considered.
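As background to the duration modelling mentioned in this abstract, the following is a minimal sketch of the duration distribution implied by a single HMM state with a self-loop, which is geometric. It is not the thesis's Compounded Selfloop Duration Modelling or its Ferguson-stack comparison; the function names and the probability values are illustrative assumptions.

```python
def self_loop_duration_pmf(a, d):
    """Probability that a state with self-loop probability `a`
    is occupied for exactly `d` consecutive frames (geometric law)."""
    if d < 1:
        return 0.0
    # Stay d-1 times (probability a each), then leave once (probability 1-a).
    return (a ** (d - 1)) * (1.0 - a)


def expected_duration(a):
    """Mean number of frames spent in the state before leaving."""
    return 1.0 / (1.0 - a)


if __name__ == "__main__":
    # Hypothetical self-loop probability, chosen only for illustration.
    a = 0.9
    for d in (1, 5, 10, 20):
        print(d, self_loop_duration_pmf(a, d))
    print("mean duration:", expected_duration(a))
```

The geometric shape always peaks at a duration of one frame, which is one common motivation for using more explicit duration models when sounds have a typical length.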
67

An investigation of the XMOS XS1 architecture as a platform for development of audio control standards

Dibley, James January 2014
This thesis investigates the feasibility of using a new microcontroller architecture, the XMOS XS1, in the research and development of control standards for audio distribution networks. This investigation is conducted in the context of an emerging audio distribution network standard, Ethernet Audio/Video Bridging ("Ethernet AVB"), and an emerging audio control standard, AES-64. The thesis describes these emerging standards, the XMOS XS1 architecture (including its associated programming language, XC), and the open-source implementation of an Ethernet AVB streaming audio device based on the XMOS XS1 architecture. It is shown how the XMOS XS1 architecture and its associated features, focusing on the XC language's mechanisms for concurrency, event-driven programming, and integration of C software modules, enable a powerful implementation of the AES-64 control standard. Feasibility is demonstrated by the implementation of an AES-64 protocol stack and its integration into an XMOS XS1-based Ethernet AVB streaming audio device, providing control of Ethernet AVB features and audio hardware, as well as implementations of advanced AES-64 control mechanisms. It is demonstrated that the XMOS XS1 architecture is a compelling platform for the development of audio control standards, and has enabled the implementation of AES-64 connection management and control over standards-compliant Ethernet AVB streaming audio devices where no such implementation previously existed. The research additionally describes a linear design method for applications based on the XMOS XS1 architecture, and provides a baseline implementation reference for the AES-64 control standard where none previously existed.
68

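O design de som de monstros do cinema: uma cartografia dos processos de criação de identidades sonoras na construção de personagens [The sound design of movie monsters: a cartography of the processes of creating sound identities in the construction of characters]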

Ceretta, Fernanda Manzo 26 June 2018
Conselho Nacional de Pesquisa e Desenvolvimento Científico e Tecnológico - CNPq / This research analyzes the sound design of movie monsters, especially their voices. Chewbacca (Star Wars, 1977), Godzilla (1954) and Predator (1987) constitute our corpus. These monsters have voices composed by designers who experimented with different creation processes, using sounds from nature, the body and manipulated objects to create the sound identity of these characters. We investigate the contexts of these creation processes and the particularities of the resulting sounds in order to propose a method for creating the sound of monsters. Our method covers the potential sources of base sounds and other sonic characteristics such as frequencies, timbre and intensity. The research was based on observation of the selected audiovisual materials and on the available documentation about how they were made (which is vast, given the popularity of the selected monsters). The thesis draws mainly on the works of Rick Altman, Michel Chion and William Whittington on cinematographic sound, and on Theo Van Leeuwen's proposal for sound analysis.
69

System approach to robust acoustic echo cancellation through semi-blind source separation based on independent component analysis

Wada, Ted S. 28 June 2012
We live in a dynamic world full of noises and interferences. The conventional acoustic echo cancellation (AEC) framework based on the least mean square (LMS) algorithm by itself lacks the ability to handle many secondary signals that interfere with the adaptive filtering process, e.g., local speech and background noise. In this dissertation, we build a foundation for what we refer to as the system approach to signal enhancement as we focus on the AEC problem. We first propose the residual echo enhancement (REE) technique that utilizes the error recovery nonlinearity (ERN) to "enhance" the filter estimation error prior to the filter adaptation. The single-channel AEC problem can be viewed as a special case of semi-blind source separation (SBSS) where one of the source signals is partially known, i.e., the far-end microphone signal that generates the near-end acoustic echo. SBSS optimized via independent component analysis (ICA) leads to the system combination of the LMS algorithm with the ERN that allows for continuous and stable adaptation even during double talk. Second, we extend the system perspective to the decorrelation problem for AEC, where we show that the REE procedure can be applied effectively in a multi-channel AEC (MCAEC) setting to indirectly assist the recovery of AEC performance lost to inter-channel correlation, known generally as the "non-uniqueness" problem. We develop a novel, computationally efficient technique of frequency-domain resampling (FDR) that effectively alleviates the non-uniqueness problem directly while introducing minimal distortion to signal quality and statistics. We also apply the system approach to the multi-delay filter (MDF) that suffers from the inter-block correlation problem. Finally, we generalize the MCAEC problem in the SBSS framework and discuss many issues related to the implementation of an SBSS system.
We propose a constrained batch-online implementation of SBSS that stabilizes the convergence behavior even in the worst-case scenario of a single far-end talker along with the non-uniqueness condition on the far-end mixing system. The proposed techniques are developed from a pragmatic standpoint, motivated by real-world problems in acoustic and audio signal processing. Generalization of the orthogonality principle to the system level of an AEC problem allows us to relate AEC to source separation that seeks to maximize the independence, hence implicitly the orthogonality, not only between the error signal and the far-end signal but among all signals involved. The system approach, for which the REE paradigm is just one realization, enables the encompassing of many traditional signal enhancement techniques in an analytically consistent yet practically effective manner for solving the enhancement problem in a very noisy and disruptive acoustic mixing environment.
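For orientation, the conventional LMS-style echo canceller that this abstract takes as its starting point can be sketched as follows. This is a minimal illustration using the normalized LMS (NLMS) variant on a noise-free simulated echo; it is not the dissertation's REE, ERN, FDR, or SBSS method, and the echo path, step size, and function names are assumptions made for the example.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=4, mu=0.5, eps=1e-8):
    """Adapt an FIR filter so that its output tracks the echo in `mic`.

    Returns (error_signal, final_weights). The error signal is the
    microphone signal with the estimated echo subtracted.
    """
    w = [0.0] * num_taps      # adaptive filter weights
    buf = [0.0] * num_taps    # recent far-end samples, newest first
    errors = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # echo estimate
        e = d - y                                    # residual after cancellation
        norm = sum(b * b for b in buf) + eps         # input power (normalization)
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return errors, w


def simulate(n=2000, seed=0):
    """Simulate a far-end signal passing through a hypothetical echo path."""
    rng = random.Random(seed)
    echo_path = [0.6, -0.3, 0.1, 0.05]   # assumed room response, illustration only
    far_end = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mic = [
        sum(h * far_end[i - k] for k, h in enumerate(echo_path) if i - k >= 0)
        for i in range(n)
    ]
    errors, w = nlms_echo_canceller(far_end, mic)
    return echo_path, w, errors


if __name__ == "__main__":
    echo_path, w, errors = simulate()
    print("true echo path:   ", echo_path)
    print("estimated weights:", [round(x, 4) for x in w])
```

In this idealized setting there is no local speech or background noise, so the filter converges cleanly; the interfering signals described in the abstract are exactly what this simple update does not account for.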
70

"Spindex" (speech index) enhances menu navigation user experience of touch screen devices in various input gestures: tapping, wheeling, and flicking

Jeon, Myounghoon 11 November 2010
In a large number of electronic devices, users interact with the system by navigating through various menus. Auditory menus can complement or even replace visual menus, so research on auditory menus has recently increased for mobile devices as well as desktop computers. Despite the potential importance of auditory displays on touch screen devices, little research has attempted to enhance the effectiveness of auditory menus for those devices. In the present study, I investigated how advanced auditory cues enhance auditory menu navigation on a touch screen smartphone, especially for new input gestures such as tapping, wheeling, and flicking methods for navigating a one-dimensional menu. Moreover, I examined whether advanced auditory cues improve user experience, not only for visuals-off situations, but also for visuals-on contexts. To this end, I used a novel auditory menu enhancement called a "spindex" (i.e., speech index), in which brief audio cues inform the users of where they are in a long menu. In this study, each item in a menu was preceded by a sound based on the item's initial letter. One hundred and twenty-two undergraduates navigated through an alphabetized list of 150 song titles. The study was a split-plot design with manipulated auditory cue type (text-to-speech (TTS) alone vs. TTS plus spindex), visual mode (on vs. off), and input gesture style (tapping, wheeling, and flicking). Target search time and subjective workload for the TTS + spindex were lower than those of the TTS alone in all input gesture types regardless of visual type. Also, on subjective rating scales, participants rated the TTS + spindex condition higher than the plain TTS on being 'effective' and 'functionally helpful'. The interaction between input methods and output modes (i.e., auditory cue types) and its effects on navigation behaviors was also analyzed based on the two-stage navigation strategy model used in auditory menus.
Results were discussed in analogy with visual search theory and in terms of practical applications of spindex cues.
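The spindex idea described in this abstract, where each menu item is preceded by a brief cue derived from its initial letter, can be illustrated with a small sketch. The function names, the choice to skip leading punctuation, and the sample titles are assumptions for illustration and are not taken from the study's implementation.

```python
def spindex_cue(title):
    """Return the cue label for a menu item: its first letter or digit, uppercased."""
    for ch in title:
        if ch.isalnum():
            return ch.upper()
    return ""


def build_playback(titles):
    """Pair each title with its spindex cue, in alphabetical menu order.

    Each pair stands for "play the short cue, then speak the full title".
    """
    return [(spindex_cue(t), t) for t in sorted(titles, key=str.casefold)]


if __name__ == "__main__":
    # Hypothetical menu entries, chosen only for illustration.
    for cue, title in build_playback(["Yesterday", "all star", "Bad"]):
        print(f"[{cue}] {title}")
```

A player built on this could sound only the cue while the user scrolls quickly and add the full spoken title once scrolling slows, though that behavior is an assumption here.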
