Global ETD Search

1	A Digital Waveform Synthesizer Using Walsh Functions Brown, Owen 11 1900 This thesis describes the design of a digital waveform synthesizer based on the Walsh series representation of a signal. By designing the unit to operate serially, simplicity and economy have been achieved. Although basically meant to be used as a speech synthesizer to be interfaced to a computer, the unit can operate independently as a low-frequency function generator capable of producing essentially any finite waveform having a frequency from zero to 200 Hz. The mathematics behind the Walsh series is developed, and parameters are adjusted to suit speech synthesis by a short investigation of the properties of speech. Evolution of the hardware design, including detailed analysis of the final circuitry, is also given. Sources of error are investigated and compared to error measurements made from basic waveforms generated by the synthesizer. Finally, a discussion of potential uses of the synthesizer is included. / This thesis describes the design of a digital waveform synthesizer based on signal representation by the Walsh series. The evolution of the machine design is given, along with a short error analysis. The instrument was constructed and preliminary measurements indicate output waveforms well within the bounds given by error analysis. / Thesis / Master of Engineering (ME) speech synthesizer Walsh Functions
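The Walsh series representation mentioned in the abstract above can be illustrated with a minimal software sketch. It is not the thesis's hardware design and does not model its serial operation. The function names (`hadamard`, `walsh_coefficients`, `walsh_synthesize`), the 8-sample sine test wave, and the use of Hadamard (natural) ordering rather than sequency ordering are all assumptions made for illustration.

```python
import math


def hadamard(order):
    """Sylvester construction: a 2**order x 2**order matrix of +1/-1 entries.

    Each row is a discrete Walsh function (Hadamard ordering, an assumption;
    the thesis may use sequency ordering).
    """
    rows = [[1]]
    for _ in range(order):
        rows = [r + r for r in rows] + [r + [-x for x in r] for r in rows]
    return rows


def walsh_coefficients(samples, basis):
    """Project the samples onto each Walsh function (forward transform)."""
    n = len(samples)
    return [sum(s * b for s, b in zip(samples, row)) / n for row in basis]


def walsh_synthesize(coeffs, basis):
    """Rebuild the waveform as a weighted sum of Walsh functions."""
    n = len(basis[0])
    return [sum(c * row[t] for c, row in zip(coeffs, basis)) for t in range(n)]


# Hypothetical test signal: one period of a sine wave, 8 samples.
N = 8
basis = hadamard(3)
wave = [math.sin(2 * math.pi * t / N) for t in range(N)]
coeffs = walsh_coefficients(wave, basis)
rebuilt = walsh_synthesize(coeffs, basis)
```

With all N coefficients the reconstruction matches the sampled waveform up to rounding; dropping coefficients gives a coarser, staircase-like approximation.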
2	Use of Vocal Prosody to Express Emotions in Robotic Speech Crumpton, Joe 14 August 2015 Vocal prosody (pitch, timing, loudness, etc.) and its use to convey emotions are essential components of speech communication between humans. The objective of this dissertation research was to determine the efficacy of using varying vocal prosody in robotic speech to convey emotion. Two pilot studies and two experiments were performed to address the shortcomings of previous HRI research in this area. The pilot studies were used to determine a set of vocal prosody modification values for a female voice model using the MARY speech synthesizer to convey the emotions anger, fear, happiness, and sadness. Experiment 1 validated that participants perceived these emotions, along with a neutral vocal prosody, at rates significantly higher than chance. Four of the vocal prosodies (anger, fear, neutral, and sadness) were recognized at rates approaching the recognition rate (60%) of emotions in person-to-person speech. During Experiment 2 the robot led participants through a creativity test while making statements using one of the validated emotional vocal prosodies. The ratings of the robot’s positive qualities and the creativity scores by the participant group that heard nonnegative vocal prosodies (happiness, neutral) did not significantly differ from the ratings and scores of the participant group that heard the negative vocal prosodies (anger, fear, sadness). Therefore, Experiment 2 failed to show that the use of emotional vocal prosody in a robot’s speech influenced the participants’ appraisal of the robot or the participants’ performance on this specific task. At this time robot designers and programmers should not expect that vocal prosody alone will have a significant impact on the acceptability or the quality of human-robot interactions. 
Further research is required to show that multi-modal (vocal prosody along with facial expressions, body language, or linguistic content) expressions of emotions by robots will be effective at improving human-robot interactions. human-robot interaction robot speech synthesizer
3	Hybrid Concatenated-Formant Expressive Speech Synthesizer For Kinesensic Voices Chandra, Nishant 05 May 2007 Traditional and commercial speech synthesizers are incapable of synthesizing speech with proper emotion or prosody. Conveying prosody in artificially synthesized speech is difficult because of extreme variability in human speech. An arbitrary natural language sentence can have different meanings, depending upon the speaker, speaking style, context, and many other factors. Most concatenated speech synthesizers use phonemes, which are phonetic units defined by the International Phonetic Alphabet (IPA). The 50 phonemes in English are standardized and unique units of sound, but not expression. An earlier work proposed the analogy between speech and music: "speech is music, music is speech." The speech data obtained from the master practitioners, who are trained in kinesensic voice, is marked on a five-level intonation scale, which is similar to the music scale. From this speech data, 1324 unique expressive units, called expressemes®, are identified. The expressemes consist of melody and rhythm, which, in digital signal processing, are analogous to pitch, duration and energy of the signal. The expressemes have less acoustic and phonetic variability than phonemes, so they better convey the prosody. The goal is to develop a speech synthesizer which exploits the prosodic content of expressemes in order to synthesize expressive speech, with a small speech database. To create a reasonably small database that captures multiple expressions is a challenge because there may not be a complete set of speech segments available to create an emotion. Methods are suggested whereby acoustic mathematical modeling is used to create missing prosodic speech segments from the base prosody unit. A new concatenated-formant hybrid speech synthesizer architecture is developed for this purpose. 
A pitch-synchronous time-varying frequency-warped wavelet transform based prosody manipulation algorithm is developed for transformation between prosodies. A time-varying frequency-warping transform is developed to smoothly concatenate the temporal and spectral parameters of adjacent expressemes to create intelligible speech. Additionally, issues specific to expressive speech synthesis using expressemes are resolved, for example ergodic hidden Markov model based expresseme segmentation, model creation for F0 and segment duration, and target and join cost calculation. The performance of the hybrid synthesizer is measured against a commercially available synthesizer using objective and perceptual evaluations. Subjects consistently rated the hybrid synthesizer better in five different perceptual tests. 70% of speakers rated the hybrid synthesis as more expressive, and 72% preferred it over the commercial synthesizer. The hybrid synthesizer also achieved a comparable mean opinion score. Emotional speech synthesis Speech morphing Time-varying frequency-warping Expressive speech synthesizer TTS
