61

Evaluation of the color image and video processing chain and visual quality management for consumer systems

Sarkar, Abhijit. January 2008 (has links)
Thesis (M.S.)--Rochester Institute of Technology, 2008. / Typescript. Includes bibliographical references (leaves 188-201).
62

Reconfigurable hardware for color space conversion

Patil, Sreenivas. January 2008 (has links)
Thesis (M.S.)--Rochester Institute of Technology, 2008. / Typescript. Includes bibliographical references (leaves 29-32).
63

Youth culture and the struggle for social space: the Nigerian video films

Ugor, Paul Ushang. January 2009 (has links)
Thesis (Ph.D.)--University of Alberta, 2009. / Title from pdf file main screen (viewed on July 31, 2009). "A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Department of English and Film Studies." Includes bibliographical references.
64

Modality Bridging and Unified Multimodal Understanding

Akbari, Hassan January 2022 (has links)
Multimodal understanding is a vast realm of research that spans multiple disciplines, so a generic multimodal understanding study requires a clear statement of its goal. The definition of the modalities of interest is important, since each modality demands its own considerations. It is equally important to understand whether these modalities should be complementary to each other or have significant overlap in the information they carry. For example, most modalities among biological signals have little overlap with one another, yet together they can extend the range and accuracy of diagnoses. An extreme example of two modalities with significant overlap is an instructional video and its corresponding detailed textual instructions. In this study, we focus on multimedia, which includes image, video, audio, and text about real-world everyday events, mostly centered on human activities. We narrow our study to the important direction of common space learning, since we want to bridge different modalities using the overlap that a given pair of modalities shares.

Multiple applications require a strong common space to perform desirably. We choose image-text grounding, video-audio autoencoding, video-conditioned text generation, and video-audio-text common space learning for semantic encoding. We examine multiple ideas in each direction and reach important conclusions. In image-text grounding, we learn that different levels of semantic representation help achieve a thorough common space that is representative of both modalities. In video-audio autoencoding, we observe that reconstruction objectives can help produce a representative common space. Moreover, there is an inherent problem when dealing with multiple modalities at the same time: different levels of granularity. For example, the sampling rate and granularity of video are much higher and more complicated than those of audio. Hence, it can be more helpful to find a more semantically abstracted common space that does not carry redundant details, especially considering the temporal aspect of the video and audio modalities. In video-conditioned text generation, we examine the possibility of encoding a video sequence using a Transformer (and later decoding the captions using a Transformer decoder). We further explore the possibility of learning latent states that store real-world concepts without supervision.

Using the observations from these three directions, we propose a unified pipeline based on the Transformer architecture to examine whether it is possible to train a truly unified pipeline on raw multimodal data without supervision in an end-to-end fashion. This pipeline eliminates ad-hoc feature extraction methods and is independent of any previously trained network, making it simpler and easier to use. Furthermore, it utilizes only one architecture, which enables us to move toward even more simplicity. Hence, we take an ambitious step forward and further unify this pipeline by sharing a single backbone among four major modalities: image, video, audio, and text. We show not only that this goal is achievable, but also the inherent benefits of such a pipeline. We propose a new research direction under multimodal understanding: Unified Multimodal Understanding. This study is the first to examine this idea, and it pushes the idea's limits by scaling up to multiple tasks, modalities, and datasets. In a nutshell, we examine different possibilities for bridging between pairs of modalities in different applications, observe several limitations, and propose solutions for them. Using these observations, we provide a unified and strong pipeline for learning a common space that can serve many applications. We show that our approaches perform desirably and significantly outperform the state of the art on different downstream tasks. We set a new baseline with competitive performance for our proposed research direction, Unified Multimodal Understanding.
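To make the shared-backbone idea concrete, the sketch below shows one way a single Transformer encoder could serve video, audio, and text, with thin per-modality tokenizers and a contrastive common-space objective. This is a minimal illustration, not the thesis's pipeline: the patch dimensions, the mean-pooling scheme, and the symmetric InfoNCE loss are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedMultimodalEncoder(nn.Module):
    """One shared Transformer backbone; per-modality tokenizers project
    raw patches/tokens into a common embedding space (a sketch)."""

    def __init__(self, d_model=256, n_heads=4, n_layers=4,
                 video_patch_dim=3 * 4 * 16 * 16,  # assumed 4x16x16 RGB tubelets
                 audio_patch_dim=128,               # assumed spectrogram patches
                 vocab_size=30000):                 # assumed text vocabulary
        super().__init__()
        # Modality-specific tokenizers (the only non-shared parameters here).
        self.video_proj = nn.Linear(video_patch_dim, d_model)
        self.audio_proj = nn.Linear(audio_patch_dim, d_model)
        self.text_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)  # shared by all modalities
        self.head = nn.Linear(d_model, d_model)  # projection into the common space

    def encode(self, tokens):
        h = self.backbone(tokens)          # (B, T, d_model)
        return self.head(h.mean(dim=1))    # mean-pool to one vector per clip

    def forward(self, video_patches, audio_patches, text_ids):
        zv = self.encode(self.video_proj(video_patches))
        za = self.encode(self.audio_proj(audio_patches))
        zt = self.encode(self.text_embed(text_ids))
        return zv, za, zt

def nce_loss(za, zb, temperature=0.07):
    """Symmetric InfoNCE over a batch of aligned pairs: pulls matched
    clips together in the common space, pushes mismatched ones apart."""
    za, zb = F.normalize(za, dim=-1), F.normalize(zb, dim=-1)
    logits = za @ zb.t() / temperature
    labels = torch.arange(za.size(0))
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

model = UnifiedMultimodalEncoder()
zv, za, zt = model(torch.randn(2, 8, 3 * 4 * 16 * 16),   # 2 clips, 8 video patches
                   torch.randn(2, 16, 128),               # 16 audio patches
                   torch.randint(0, 30000, (2, 12)))      # 12 text tokens
loss = nce_loss(zv, zt) + nce_loss(zv, za)
```

Sharing `backbone` across all modalities is what makes the pipeline "unified" in the sense above: only the thin projection layers are modality-specific.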
65

Nowhere Landscape, for Clarinets, Trombones, Percussion, Violins, and Electronics and “The Map and the Territory: Documenting David Dunn’s Sky Drift”

Davis, D. Edward January 2016 (has links)
1. nowhere landscape, for clarinets, trombones, percussion, violins, and electronics

nowhere landscape is an eighty-minute work for nine performers, composed of acoustic and electronic sounds. Its fifteen movements invoke a variety of listening strategies, using slow change, stasis, layering, coincidence, and silence to draw attention to the sonic effects of the environment, inside the concert hall as well as the world outside of it. The work incorporates a unique stage set-up: the audience sits in close proximity to the instruments, facing in one of four different directions, while the musicians play from a number of constantly shifting locations, including in front of, next to, and behind the audience.

Much of nowhere landscape's material is derived from a collection of field recordings made by the composer during a road trip from Springfield, MA to Douglas, WY along US-20, a cross-country route made effectively obsolete by the completion of I-90 in the mid-20th century. In an homage to artist Ed Ruscha's 1963 book Twentysix Gasoline Stations, the composer made twenty-six recordings at gas stations along US-20. Many of the movements of nowhere landscape examine the musical potential of these captured soundscapes: familiar and anonymous, yet filled with poignancy and poetic possibility.

2. "The Map and the Territory: Documenting David Dunn's Sky Drift"

In 1977, David Dunn recruited twenty-six musicians to play his work Sky Drift in the Anza-Borrego Desert in Southern California. This outdoor performance was documented with photos and recorded to tape with four stationary microphones. A year later, Dunn presented the work in New York City as a "performance/documentation," playing back the audio recording and projecting slides. In this paper I examine the consequences of this kind of act: what does it mean for a recording of an outdoor work to be shared at an indoor concert event? Can such a complex and interactive experience be successfully flattened into some kind of re-playable documentation? What can a recording capture, and what must it exclude?

This paper engages with these questions as they relate to David Dunn's Sky Drift and to similar works by Karlheinz Stockhausen and John Luther Adams. These case studies demonstrate different solutions to the difficulty of documenting outdoor performances. Because this music is often heard from a variety of equally valid perspectives, and because any single microphone captures sound from only one of those perspectives, the physical set-up of such pieces complicates what it means to "hear the music" at all. To this end, I discuss issues around the "work itself" and "aura," as well as "transparency" and "liveness" in recorded sound, drawing on thoughts and ideas from Walter Benjamin, Howard Becker, Joshua Glasgow, and others. In addition, the artist Robert Irwin and the composer Barry Truax have written about the conceptual distinctions between "the work" and "not-the-work"; these distinctions are complicated by documentation and recording. Without the context, the being-there, the music is stripped of much of its ability to communicate meaning. / Dissertation
66

Technology adoption and diffusion in the South African online video industry: a technopreneurial analysis

Matlabo, Tiisetso January 2017 (has links)
Thesis (M.M. (Entrepreneurship and New Venture Creation)), University of the Witwatersrand, Faculty of Commerce, Law and Management, Wits Business School, 2016. / Over the past few years the South African market has seen the launch of a number of online video service providers. The leading providers in the industry are Vidi, ON-Tap, MTN front row and ShowMax. The industry has also attracted international competition, with big players like Netflix launching services in the South African market in January 2016. Although the industry has seen the emergence of many new players, it is still in its infancy in South Africa, and it remains to be seen whether it will mature into a long-term profit-making industry. It is important to research the diffusion of this innovation and, more specifically, to look at how technopreneurs in this field, or those considering entering it, can influence the speed and success with which the innovation is diffused. This research focuses on two areas. First, it examines the factors that influence a potential adopter's propensity to adopt a new product. Second, it examines the role played by the technopreneur in ensuring that online video services are adopted successfully. Since the online video services industry is not yet mature, the research was conducted using a mixed-methods approach. The quantitative research was conducted by distributing online survey questionnaires via email as well as social media networks such as Facebook, Twitter and LinkedIn. The qualitative research was conducted through interviews with a predetermined list of respondents. The combination of the two types of research led to a better understanding of the topic. The results of the research highlighted that the South African market poses unique challenges for entrepreneurs entering this industry. South African technopreneurs have an advantage over international players like Netflix because they understand the challenges of internet access, payment, and preferred content. / XL2018
67

A enunciação nos vídeos verticais: o protagonismo do corpo [Enunciation in vertical videos: the protagonism of the body]

Pereira, Henrique da Silva. January 2018 (has links)
Advisor: Ana Silvia Lopes Davi Médola / Committee: José Carlos Marques / Committee: Conrado Moreira Mendes / Abstract: This research investigates the visuality of music videos produced in a vertical aspect ratio, focusing on the enunciative strategies of these audiovisual texts. Aiming to identify the production of meaning, as well as the communication relations made possible by smartphones, the four most popular vertical music videos on YouTube in the first half of 2018 were analyzed using the theoretical-methodological framework of French discursive semiotics. The figure of the human body remains the central point of representation in these videos, reaffirming a discourse that values the individual in the universe of music-video production. From the corpus analyzed, it was observed that the visual construction of these clips can reference both an aesthetic related to the practice of the selfie, valuing a narcissistic, celebrity-driven logic, and the aesthetic and editing characteristic of the traditional music video, specular and fragmentary, aimed specifically at fruition on smartphones. / Master's
68

Deep Learning for Action Understanding in Video

Shou, Zheng January 2019 (has links)
Action understanding is key to automatically analyzing video content and is therefore important for many real-world applications such as autonomous driving, robot-assisted care, etc. Accordingly, action understanding has been one of the fundamental research topics in computer vision. Most conventional methods for action understanding are based on hand-crafted features. Following the recent advances in image classification, object detection, image captioning, etc., deep learning has become a popular approach for action understanding in video. However, several important research challenges remain in developing deep-learning-based methods for understanding actions. This thesis focuses on the development of effective deep learning methods for solving three major challenges.

Action detection at fine granularities in time: Previous work in deep-learning-based action understanding mainly explores backbone networks designed for the video-level action classification task. These networks do not exploit fine-grained temporal characteristics and thus fail to produce temporally precise estimates of action boundaries. In order to understand actions more comprehensively, it is important to detect actions at finer granularities in time. In Part I, we study both segment-level and frame-level action detection. Segment-level action detection is usually formulated as the temporal action localization task, which requires not only recognizing action categories for the whole video but also localizing the start and end time of each action instance. To this end, we propose an effective multi-stage framework called Segment-CNN consisting of three segment-based 3D ConvNets: (1) a proposal network identifies candidate segments that may contain actions; (2) a classification network learns a one-vs-all action classification model to serve as initialization for the localization network; and (3) a localization network fine-tunes the learned classification network to localize each action instance. Frame-level action detection, in turn, is effectively formulated as a per-frame action labeling task. We combine two reverse operations (i.e., convolution and deconvolution) into a joint Convolutional-De-Convolutional (CDC) filter, which simultaneously conducts downsampling in space and upsampling in time to jointly model high-level semantics and temporal dynamics. We design a novel CDC network to predict actions at the frame level, and the frame-level predictions can be further used to detect precise segment boundaries for the temporal action localization task. Our method not only improves the state-of-the-art mean Average Precision (mAP) on THUMOS'14 from 41.3% to 44.4% for the per-frame labeling task, but also improves mAP for the temporal action localization task from 19.0% to 23.3% on THUMOS'14 and from 16.4% to 23.8% on ActivityNet v1.3.

Action detection in constrained scenarios: The usual training process of deep learning models relies on supervision and data, which are not always available in reality. In Part II, we consider the scenarios of incomplete supervision and incomplete data. For incomplete supervision, we focus on the weakly-supervised temporal action localization task and propose AutoLoc, the first framework that can directly predict the temporal boundary of each action instance with only video-level annotations available during training. To enable the training of such a boundary prediction model, we design a novel Outer-Inner-Contrastive (OIC) loss to help discover segment-level supervision, and we prove that the OIC loss is differentiable with respect to the underlying boundary prediction model. Our method significantly improves mAP on THUMOS'14 from 13.7% to 21.2% and mAP on ActivityNet from 7.4% to 27.3%. For the scenario of incomplete data, we formulate a novel task called Online Detection of Action Start (ODAS) in streaming videos, which aims to detect the start of an action on the fly, while a live action is just beginning. ODAS is important in many applications, such as generating early alerts that allow timely security or emergency response. Specifically, we propose three novel methods to address the challenges in training ODAS models: (1) generating hard negative samples based on a Generative Adversarial Network (GAN) to distinguish ambiguous background, (2) explicitly modeling the temporal consistency between data around the action start and data succeeding it, and (3) an adaptive sampling strategy to handle the scarcity of training data.

Action understanding in the compressed domain: Mainstream action understanding methods, including the techniques described above, require first decoding the compressed video into RGB image frames, which can incur significant storage and computation costs. Recently, researchers have started to investigate how to perform action understanding directly in the compressed domain, in order to achieve high efficiency while maintaining state-of-the-art accuracy. The key research challenge is developing effective backbone networks that can directly take compressed-domain data as input. Our baseline takes models developed for action understanding in the decoded domain and adapts them to the same tasks in the compressed domain. In Part III, we address two important issues in developing backbone networks that operate exclusively in the compressed domain. First, compressed videos may be produced by different encoders or encoding parameters, yet it is impractical to train a separate compressed-domain action understanding model for each format. We experimentally analyze the effect of encoder variation and develop a simple yet effective training data preparation method that alleviates sensitivity to it. Second, motion cues are important for action understanding, but the motion vectors in compressed video are often noisy and not discriminative enough for accurate action understanding. We develop a novel and highly efficient framework called DMC-Net that learns to predict discriminative motion cues from the noisy motion vectors and residual errors in compressed video streams. On three action recognition benchmarks, namely HMDB-51, UCF101 and a subset of Kinetics, we demonstrate that DMC-Net significantly shrinks the performance gap between state-of-the-art compressed-video-based methods and methods that use optical flow, while being two orders of magnitude faster than the latter.

By addressing the three major challenges above, we develop more robust models for video action understanding and improve performance along several dimensions: (1) temporal precision, (2) required level of supervision, (3) live video analysis ability, and (4) efficiency in processing compressed video. Our research has contributed significantly to advancing the state of the art of video action understanding and to expanding the foundation for comprehensive semantic understanding of video content.
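As a concrete illustration of the CDC idea of downsampling in space while upsampling in time, here is a minimal PyTorch sketch. One caveat: the thesis's CDC filter performs both operations in a single joint filter, whereas this sketch decomposes it into a spatially strided convolution followed by a temporally strided transposed convolution; kernel sizes, strides, and channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CDCLayer(nn.Module):
    """Sketch of a Convolutional-De-Convolutional (CDC) step: halve the
    spatial resolution while doubling the temporal resolution."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Spatial downsampling: stride 2 in H and W, stride 1 in time.
        self.spatial_down = nn.Conv3d(in_ch, out_ch, kernel_size=(3, 3, 3),
                                      stride=(1, 2, 2), padding=1)
        # Temporal upsampling: transposed conv with stride 2 in time only.
        self.temporal_up = nn.ConvTranspose3d(out_ch, out_ch,
                                              kernel_size=(4, 1, 1),
                                              stride=(2, 1, 1),
                                              padding=(1, 0, 0))

    def forward(self, x):  # x: (B, C, T, H, W)
        x = torch.relu(self.spatial_down(x))   # (B, C', T, H/2, W/2)
        return torch.relu(self.temporal_up(x)) # (B, C', 2T, H/2, W/2)

x = torch.randn(1, 64, 8, 28, 28)  # 8 frames of 28x28 feature maps
y = CDCLayer(64, 64)(x)
print(y.shape)                     # -> torch.Size([1, 64, 16, 14, 14])
```

Stacking such layers lets a network collapse spatial detail into high-level semantics while recovering per-frame temporal resolution, which is exactly what per-frame action labeling needs.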
69

Multimodal News Summarization, Tracking and Annotation Incorporating Tensor Analysis of Memes

Tsai, Chun-Yu January 2017 (has links)
We demonstrate four novel multimodal methods for efficient video summarization and comprehensive cross-cultural news video understanding. First, for quick video browsing, we demonstrate a multimedia event recounting system. Based on nine people-oriented design principles, it summarizes YouTube-like videos into short visual segments (8–12 sec) and textual words (fewer than 10 terms). In the 2013 TRECVID Multimedia Event Recounting competition, this system placed first in recognition time efficiency, while remaining above average in description accuracy. Second, we demonstrate the summarization of large amounts of online international news videos. In order to understand an international event such as the Ebola virus, AirAsia Flight 8501 or the Zika virus comprehensively, we present a novel and efficient constrained tensor factorization algorithm that first represents a video archive of multimedia news stories concerning a news event as a sparse tensor of order 4, whose dimensions correspond to extracted visual memes, verbal tags, time periods, and cultures. The iterative algorithm approximately but accurately extracts coherent quad-clusters, each of which represents a significant summary of an important independent aspect of the news event. We give examples of quad-clusters extracted from tensors with at least 10^8 entries derived from international news coverage. We show the method is fast, can be tuned to give preference to any subset of its four dimensions, and exceeds three existing methods in performance. Third, noting that the co-occurrence of visual memes and tags in our summarization result is sparse, we show how to model cross-cultural visual meme influence based on normalized PageRank, which more accurately captures the rates at which visual memes are reposted in a specified time period in a specified culture. Lastly, we establish correspondences between videos and text descriptions in different cultures through reliable visual cues, detect culture-specific tags for visual memes, and then annotate videos in a cultural setting. Starting with a video that has little or no text in one culture (say, the US), we select candidate annotations from the text of another culture (say, China) to annotate the US video. By analyzing the similarity of images annotated by those candidates, we derive a set of proper tags from the viewpoint of the other culture (China). We illustrate culture-based annotation with examples from segments of international news. We evaluate the generated tags by cross-cultural tag frequency, tag precision, and user studies.
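The core machinery here, factorizing an order-4 tensor into rank-1 "quad-cluster" components, can be illustrated with a plain CP decomposition by alternating least squares in NumPy. To be clear about what is assumed: the thesis uses a constrained factorization on a sparse tensor, whereas this sketch is the standard unconstrained, dense CP-ALS, and all dimensions in the toy example are made up.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: rows indexed by the chosen mode, columns by the
    remaining modes flattened in C order."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(mats):
    """Column-wise Kronecker product of a list of factor matrices."""
    out = mats[0]
    for M in mats[1:]:
        out = np.einsum('ir,jr->ijr', out, M).reshape(-1, out.shape[1])
    return out

def cp_als(T, rank, n_iter=100, seed=0):
    """Plain CP decomposition via alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in T.shape]
    for _ in range(n_iter):
        for mode in range(T.ndim):
            others = [factors[m] for m in range(T.ndim) if m != mode]
            kr = khatri_rao(others)          # ordering matches C-order unfolding
            gram = np.ones((rank, rank))
            for M in others:
                gram *= M.T @ M              # Hadamard product of Gram matrices
            factors[mode] = unfold(T, mode) @ kr @ np.linalg.pinv(gram)
    return factors

# Toy order-4 tensor: (memes, tags, time periods, cultures); dims are made up.
T = np.random.default_rng(1).random((30, 40, 12, 5))
memes, tags, periods, cultures = cp_als(T, rank=4)
# Column r of the four factor matrices together describes one rank-1
# component: roughly, one "quad-cluster" in the abstract's terminology.
```

The constrained variant in the thesis additionally enforces structure (e.g., sparsity or non-negativity) on the factors; the ALS skeleton above is the common starting point for such methods.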
70

Alfred Cortot, interprète de Frédéric Chopin / Alfred Cortot interprets Chopin

Taillandier-Guittard, Inès 06 December 2013 (has links)
Alfred Cortot defined himself as an interpreter of Frédéric Chopin not only as a pianist but also as a theoretician. The exegesis he carried out, notably through his numerous writings, is the correlate of any instrumental performance. Our research aims to show how these two senses of the word "interpretation" complement each other and generate a fruitful dialogue between the concreteness of a sonic object and the abstraction of a thought. This means considering not only the way in which Cortot, as a biographer, invites us to reflect on a particular image of Chopin and his music, but also the process by which he moves from a theoretical formalization to a concretization of his thought in and through performance. Our work is organized along three axes: a review of the historiographical methods employed by Cortot, a study of his hermeneutical approach (particularly in the student edition of Chopin's works), and finally an analysis of several recordings of the Preludes, Op. 28.
