Global ETD Search

441	Human-machine collaboration for rapid speech transcription Roy, Brandon C. (Brandon Cain) January 2007 Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2007. / Includes bibliographical references (p. 121-127). / Inexpensive storage and sensor technologies are yielding a new generation of massive multimedia datasets. The exponential growth in storage and processing power makes it possible to collect more data than ever before, yet without appropriate content annotation for search and analysis such corpora are of little use. While advances in data mining and machine learning have helped to automate some types of analysis, the need for human annotation still exists and remains expensive. The Human Speechome Project is a heavily data-driven longitudinal study of language acquisition. More than 100,000 hours of audio and video recordings have been collected over a two-year period to trace one child's language development at home. A critical first step in analyzing this corpus is to obtain high-quality transcripts of all speech heard and produced by the child. Unfortunately, automatic speech transcription has proven to be inadequate for these recordings, and manual transcription with existing tools is extremely labor-intensive and therefore expensive. A new human-machine collaborative system for rapid speech transcription has been developed which leverages both the quality of human transcription and the speed of automatic speech processing. Machine algorithms sift through the massive dataset to find and segment speech. The results of automatic analysis are handed off to humans for transcription using newly designed tools with an optimized user interface. The automatic algorithms are tuned to optimize human performance, and errors are corrected by the human and used to iteratively improve the machine performance.
When compared with other popular transcription tools, the new system is three- to six-fold faster, while preserving transcription quality. When applied to the Speechome audio corpus, over 100 hours of multitrack audio can be transcribed in about 12 hours by a single human transcriber. / by Brandon C. Roy. / S.M.
442	Bayesian models for visual information retrieval Vasconcelos, Nuno Miguel Borges de Pinho Cruz de January 2000 Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2000. / Includes bibliographical references (leaves 192-208). / This thesis presents a unified solution to visual recognition and learning in the context of visual information retrieval. Realizing that the design of an effective recognition architecture requires careful consideration of the interplay between feature selection, feature representation, and similarity function, we start by searching for a performance criterion that can simultaneously guide the design of all three components. A natural solution is to formulate visual recognition as a decision-theoretic problem, where the goal is to minimize the probability of retrieval error. This leads to a Bayesian architecture that is shown to generalize a significant number of previous recognition approaches, solving some of the most challenging problems faced by these: joint modeling of color and texture, objective guidelines for controlling the trade-off between feature transformation and feature representation, and unified support for local and global queries without requiring image segmentation. The new architecture is shown to perform well on color, texture, and generic image databases, providing a good trade-off between retrieval accuracy, invariance, perceptual relevance of similarity judgments, and complexity. Because all that is needed to perform optimal Bayesian decisions is the ability to evaluate beliefs on the different hypotheses under consideration, a Bayesian architecture is not restricted to visual recognition. On the contrary, it establishes a universal recognition language (the language of probabilities) that provides a computational basis for the integration of information from multiple content sources and modalities.
As a result, it becomes possible to build retrieval systems that can simultaneously account for text, audio, video, or any other content modality. Since the ability to learn follows from the ability to integrate information over time, this language is also conducive to the design of learning algorithms. We show that learning is, indeed, an important asset for visual information retrieval by designing both short- and long-term learning mechanisms. Over short time scales (within a retrieval session), learning is shown to assure faster convergence to the desired target images. Over long time scales (between retrieval sessions), it allows the retrieval system to tailor itself to the preferences of particular users. In both cases, all the necessary computations are carried out through Bayesian belief propagation algorithms that, although optimal in a decision-theoretic sense, are extremely simple, intuitive, and easy to implement. / by Nuno Miguel Borges de Pinho Cruz de Vasconcelos. / Ph.D.
443	Digital technology for conviviality : making the most of students' energy and imagination in learning environments Sipitakiat, Arnan, 1974- January 2001 Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2001. / Includes bibliographical references (p. 123-125). / This thesis contributes to the body of research on constructionist philosophy. It expands the conceptual framework to a broader scale by linking constructionism to Ivan Illich's notion of conviviality. It emphasizes the development of convivial learning environments. The learning activities were developed with particular attention to the idea of emergent design. The emphasis on conviviality and emergent design allowed a systematic and theorized framework for identifying and discussing patterns in the developmental process of learning activities, which is an area in the constructionist framework that needs more study. I place special emphasis on learning activities that involve tool construction. I show how the making of tools could strengthen conviviality. I present a concept of dynamic equilibrium that allows different methods of learning and teaching to intertwine. I present a case study based on five weeks of fieldwork conducted at a rural school in northern Thailand. / by Arnan Sipitakiat. / S.M.
444	Narrative guidance of interactivity Galyean, Tinsley Azariah January 1995 Thesis (Ph. D.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995. / Includes bibliographical references. / by Tinsley Azariah Galyean, III. / Ph.D.
445	They have their own thoughts : children's learning of computational ideas from a cultural perspective Hooper, Paula Kay, 1961- January 1998 Thesis (Ph. D.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1998. / Includes bibliographical references (leaves 216-226). / Paula Kay Hooper. / Ph.D.
446	Viewpoints on demand : tailoring the presentation of opinions in video Houbart, Gilberte January 1994 Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1994. / Includes bibliographical references (p. 93-95). / by Gilberte Houbart. / M.S.
447	A video browser that learns by example Wachman, Joshua Seth January 1996 Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1996. / Includes bibliographical references (leaves 72-74). / by Joshua Seth Wachman. / M.S.
448	Introducing liquid haptics in high bandwidth human computer interfaces White, Tom, 1971- January 1998 Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1998. / Includes bibliographical references (leaves 86-91). / Tom White. / M.S.
449	3-D audio using loudspeakers Gardner, William G January 1997 Thesis (Ph. D.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1997. / Includes bibliographical references (p. 145-153). / by William G. Gardner. / Ph.D.
450	Reusing code by reasoning about its purpose Arnold, Kenneth Charles January 2010 Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2010. / Cataloged from PDF version of thesis. / Includes bibliographical references (p. 103-105). / When programmers face unfamiliar or challenging tasks, code written by others could give them inspiration or reusable pieces. But how can they find code appropriate for their goals? This thesis describes a programming interface, called Zones, that connects code with descriptions of purpose, encouraging annotation, sharing, and reuse of code. The backend, called ProcedureSpace, reasons jointly over both the words that people used to describe code fragments and syntactic features derived from static analysis of that code to enable searching for code given purpose descriptions or vice versa. It uses a technique called Bridge Blending to do joint inference across data of many types, including using domain-specific and commonsense background knowledge to help understand different ways of describing goals. Since Zones uses the same interface for searching as for annotating, users can leave searches around as annotations, even if the search fails, which helps the system learn from user interaction. This thesis describes the design, implementation, and evaluation of the Zones and ProcedureSpace system, showing that reasoning jointly over natural language and programming language helps programmers reuse code. / by Kenneth Charles Arnold. / S.M.
