81

Kelli and the misfits

Joy, Ronald Dean 01 January 1988 (has links)
No description available.
82

Deep networks for sign language video caption

Zhou, Mingjie 12 August 2020 (has links)
In the hearing-loss community, sign language is the primary tool for communication, yet there is a communication gap between people with hearing loss and people with normal hearing. Sign language differs from spoken language: it has its own vocabulary and grammar. Recent work concentrates on sign language video captioning, which consists of sign language recognition and sign language translation. Continuous sign language recognition, which can help bridge the communication gap, is a challenging task because of the weakly supervised ordered annotations, where no frame-level label is provided. To overcome this problem, connectionist temporal classification (CTC) is the most widely used method. However, CTC learning can perform badly if the extracted features are not good. For better feature extraction, this thesis presents novel self-attention-based fully-inception (SAFI) networks for vision-based end-to-end continuous sign language recognition. Since the lengths of sign words differ from each other, we introduce a fully inception network with different receptive fields to extract dynamic clip-level features. To further boost performance, the fully inception network with an auxiliary classifier is trained with an aggregation cross entropy (ACE) loss. The encoder of a self-attention network is then used as the global sequential feature extractor to model the clip-level features with CTC. The proposed model is optimized by jointly training with ACE on clip-level feature learning and CTC on global sequential feature learning in an end-to-end fashion. The best baseline method achieves 35.6% WER on the validation set and 34.5% WER on the test set; it employs a better decoding algorithm for generating pseudo labels to perform EM-like optimization that fine-tunes the CNN module. In contrast, our approach focuses on better feature extraction for end-to-end learning. To alleviate overfitting on the limited dataset, we employ temporal elastic deformation to triple the real-world dataset RWTH-PHOENIX-Weather 2014. Experimental results on RWTH-PHOENIX-Weather 2014 demonstrate the effectiveness of our approach, which achieves 31.7% WER on the validation set and 31.2% WER on the test set.

Even though sign language recognition can, to some extent, help bridge the communication gap, its output is still organized in sign language grammar, which differs from spoken language. Unlike sign language recognition, which recognizes sign gestures, sign language translation (SLT) converts sign language into the spoken-language text that hearing people commonly use in daily life. To achieve this goal, this thesis provides an effective sign language translation approach that attains state-of-the-art performance on the largest real-life German sign language translation database, RWTH-PHOENIX-Weather 2014T. In addition, a direct end-to-end sign language translation approach gives promising results without intermediate recognition annotations (a gain from 9.94 to 13.75 BLEU on the validation set and from 9.58 to 14.07 BLEU on the test set). These comparative and promising experimental results show the feasibility of direct end-to-end SLT.
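The clip-level-features-plus-CTC setup described in this abstract can be pictured with the rough PyTorch sketch below. It is not the SAFI implementation: the fully-inception module is replaced by a linear stand-in, the ACE auxiliary loss is omitted, and the vocabulary size, clip count, and other dimensions are illustrative assumptions.

```python
# Minimal sketch (assumptions throughout): clip-level features fed to a
# Transformer encoder and trained with CTC over weakly supervised gloss labels.
import torch
import torch.nn as nn

class ClipLevelSLR(nn.Module):
    def __init__(self, feat_dim=512, vocab_size=1232, nhead=8, num_layers=2):
        super().__init__()
        # Stand-in for the fully-inception clip-level feature extractor.
        self.clip_encoder = nn.Linear(feat_dim, feat_dim)
        enc_layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=nhead)
        self.seq_encoder = nn.TransformerEncoder(enc_layer, num_layers=num_layers)
        self.classifier = nn.Linear(feat_dim, vocab_size + 1)  # +1 for the CTC blank

    def forward(self, clips):                 # clips: (T, B, feat_dim)
        x = self.clip_encoder(clips)
        x = self.seq_encoder(x)               # global sequential features
        return self.classifier(x).log_softmax(dim=-1)

model = ClipLevelSLR()
ctc = nn.CTCLoss(blank=1232, zero_infinity=True)

clips = torch.randn(60, 2, 512)               # 60 clips, batch of 2 (dummy data)
log_probs = model(clips)
targets = torch.randint(0, 1232, (2, 12))     # sentence-level gloss sequences
input_lens = torch.full((2,), 60, dtype=torch.long)
target_lens = torch.full((2,), 12, dtype=torch.long)
loss = ctc(log_probs, targets, input_lens, target_lens)
loss.backward()
```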
83

Modality Bridging and Unified Multimodal Understanding

Akbari, Hassan January 2022 (has links)
Multimodal understanding is a vast realm of research that covers multiple disciplines, so a generic multimodal understanding study requires a correct understanding of its goal. Defining the modalities of interest is important, since each modality requires its own considerations. It is also important to understand whether these modalities should be complementary to each other or have significant overlap in the information they carry. For example, most modalities in biological signals do not overlap significantly, yet they can be used together to improve the range and accuracy of diagnoses. An extreme example of two modalities with significant overlap is an instructional video and its corresponding detailed textual instructions. In this study, we focus on multimedia, which includes image, video, audio, and text about real-world everyday events, mostly centered on human activities. We narrow our study to the important direction of common space learning, since we want to bridge between different modalities using the overlap that a given pair of modalities has. Multiple applications require a strong common space to perform desirably. We choose image-text grounding, video-audio autoencoding, video-conditioned text generation, and video-audio-text common space learning for semantic encoding. We examine multiple ideas in each direction and reach important conclusions. In image-text grounding, we learn that different levels of semantic representation help achieve a thorough common space that is representative of both modalities. In video-audio autoencoding, we observe that reconstruction objectives can help build a representative common space. Moreover, there is an inherent problem when dealing with multiple modalities at the same time: different levels of granularity. For example, the sampling rate and granularity of video are much higher and more complicated than those of audio. Hence, it may be more helpful to find a more semantically abstracted common space that does not carry redundant details, especially considering the temporal aspect of the video and audio modalities. In video-conditioned text generation, we examine the possibility of encoding a video sequence using a Transformer (and later decoding the captions using a Transformer decoder). We further explore the possibility of learning latent states for storing real-world concepts without supervision. Using the observations from these three directions, we propose a unified pipeline based on the Transformer architecture and examine whether it is possible to train a truly unified pipeline on raw multimodal data without supervision in an end-to-end fashion. This pipeline eliminates ad hoc feature extraction methods and is independent of any previously trained network, making it simpler and easier to use. Furthermore, it utilizes only one architecture, which enables us to move toward even more simplicity. Hence, we take an ambitious step forward and further unify this pipeline by sharing a single backbone among four major modalities: image, video, audio, and text. We show that this goal is not only achievable but also brings inherent benefits. We propose a new research direction under multimodal understanding: Unified Multimodal Understanding. This study is the first to examine this idea, and it pushes the limit further by scaling up to multiple tasks, modalities, and datasets. In a nutshell, we examine different possibilities for bridging between pairs of modalities in different applications, observe several limitations, and propose solutions for them. Using these observations, we provide a unified and strong pipeline for learning a common space that can be used for many applications. We show that our approaches perform desirably and significantly outperform the state of the art in different downstream tasks. We set a new baseline with competitive performance for our proposed research direction, Unified Multimodal Understanding.
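The shared-backbone idea in this abstract can be illustrated with the sketch below: one Transformer encoder is reused across modalities, with per-modality projections into a common token space. The module names, dimensions, tokenization choices, and pooling are assumptions for the sketch, not the thesis implementation.

```python
# Illustrative sketch (not the thesis code): a single Transformer backbone
# shared by video, audio, and text inputs via modality-specific projections.
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    def __init__(self, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        # Per-modality projections into the shared token space (assumed shapes).
        self.proj = nn.ModuleDict({
            "video": nn.Linear(3 * 16 * 16 * 4, d_model),   # flattened space-time patches
            "audio": nn.Linear(128, d_model),                # log-mel frames
            "text":  nn.Embedding(30000, d_model),           # word-piece ids
        })
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, d_model)              # common-space projection

    def forward(self, x, modality):
        tokens = self.proj[modality](x)
        encoded = self.backbone(tokens)                      # same weights for every modality
        return self.head(encoded.mean(dim=1))                # one vector per example

model = SharedBackbone()
video = torch.randn(2, 49, 3 * 16 * 16 * 4)
audio = torch.randn(2, 100, 128)
text = torch.randint(0, 30000, (2, 20))
zv, za, zt = model(video, "video"), model(audio, "audio"), model(text, "text")
```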
84

Phoneme-based Video Indexing Using Phonetic Disparity Search

Barth, Carlos Leon 01 January 2010 (has links)
This dissertation presents and evaluates an approach to the video indexing problem by investigating a categorization method that transcribes audio content through Automatic Speech Recognition (ASR) combined with Dynamic Contextualization (DC), Phonetic Disparity Search (PDS), and Metaphone indexation. The suggested approach applies genome pattern matching algorithms with computational summarization to build a database infrastructure that provides an indexed summary of the original audio content. PDS complements the contextual phoneme indexing approach by optimizing topic search performance and accuracy in large video content structures. A prototype was established to translate news broadcast video into text and phonemes automatically by using ASR utterance conversions. Each phonetic utterance extraction was then categorized, converted to Metaphones, and stored in a repository with contextual topical information attached and indexed for later search analysis. Following the original design strategy, a custom parallel interface was built to measure the capabilities of dissimilar phonetic queries and provide an interface for result analysis. The postulated solution provides evidence of superior topic matching when compared to traditional word and phoneme search methods. Experimental results demonstrate that PDS can be 3.7% better than the same phoneme query, while Metaphone search proved to be 154.6% better than the same phoneme search and 68.1% better than the equivalent word search.
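The general idea of phonetic indexation of ASR transcripts can be pictured with the sketch below. It is only an illustration in the spirit of the Metaphone indexing described above: phonetic_key() is a crude stand-in, not the real Metaphone algorithm, and the segment ids, transcripts, and queries are made up for the example.

```python
# Illustrative sketch: index transcribed utterances by phonetic keys so that a
# misspelled or phonetically variant query can still locate the right segment.
from collections import defaultdict

def phonetic_key(word: str) -> str:
    """Very rough phonetic key: keep the first letter, drop later vowels, collapse repeats."""
    word = "".join(c for c in word.lower() if c.isalpha())
    if not word:
        return ""
    rest = "".join(c for c in word[1:] if c not in "aeiou")
    key, prev = word[0], word[0]
    for c in rest:
        if c != prev:
            key += c
        prev = c
    return key.upper()

def build_index(utterances):
    """Map each phonetic key to the set of utterance ids in which it occurs."""
    index = defaultdict(set)
    for uid, text in utterances.items():
        for word in text.split():
            index[phonetic_key(word)].add(uid)
    return index

def search(index, query):
    """Return utterance ids whose phonetic keys cover every query word."""
    hits = [index.get(phonetic_key(w), set()) for w in query.split()]
    return set.intersection(*hits) if hits else set()

utterances = {  # e.g., ASR output keyed by video segment id (dummy data)
    "seg-001": "senator discusses the new budget",
    "seg-002": "weather forecast for the weekend",
}
index = build_index(utterances)
print(search(index, "senater budgit"))   # the misspelled query still resolves to seg-001
```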
85

Learning Video Representation from Self-supervision

Chen, Brian January 2023 (has links)
This thesis investigates the problem of learning video representations for video understanding. Previous works have explored data-driven deep learning approaches, which have been shown to be effective in learning useful video representations. However, obtaining large amounts of labeled data can be costly and time-consuming. We investigate self-supervised approaches for multimodal video data to overcome this challenge. Video data typically contains multiple modalities, such as visual, audio, transcribed speech, and textual captions, which can serve as pseudo-labels for representation learning without manual labeling. By utilizing these modalities, we can train deep representations over large-scale video data consisting of millions of video clips collected from the internet. We demonstrate the scalability benefits of multimodal self-supervision by achieving new state-of-the-art performance in various domains, including video action recognition, text-to-video retrieval, and text-to-video grounding. We also examine the limitations of these approaches, which often rely on assumed associations between the modalities used in self-supervision. For example, the text transcript is often assumed to be about the video content, and two segments of the same video are assumed to share similar semantics. To overcome this problem, we propose new methods for learning video representations with more intelligent sampling strategies that capture samples sharing high-level semantics or consistent concepts. The proposed methods include a clustering component to address false negative pairs in multimodal paired contrastive learning, a novel sampling strategy for finding visually groundable video-text pairs, an investigation of object tracking supervision for temporal association, and a new multimodal task for demonstrating the effectiveness of the proposed model. We aim to develop more robust and generalizable video representations for real-world applications, such as human-to-robot interaction and event extraction from large-scale news sources.
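The multimodal paired contrastive learning mentioned above can be illustrated with a symmetric InfoNCE-style objective between paired video and text embeddings. This is a generic sketch under assumed shapes, not the thesis code, and it omits the clustering component for false negatives.

```python
# Minimal sketch: contrastive alignment of paired video and text embeddings.
import torch
import torch.nn.functional as F

def contrastive_loss(video_emb, text_emb, temperature=0.07):
    """video_emb, text_emb: (B, D) embeddings of paired clips and transcripts."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.t() / temperature          # (B, B) similarity matrix
    labels = torch.arange(v.size(0))          # matched pairs lie on the diagonal
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

video_emb = torch.randn(8, 256, requires_grad=True)   # dummy paired embeddings
text_emb = torch.randn(8, 256, requires_grad=True)
loss = contrastive_loss(video_emb, text_emb)
loss.backward()
```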
86

The effects of highlight videotapes on the self-efficacy and state sport-confidence of female tennis players

Bjorn, Kiersten January 1995 (has links)
No description available.
87

When piracy meets the Internet: the diverse film consumption of China in an unorthodox globalization.

January 2008 (has links)
Wu, Xiao.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2008. Includes bibliographical references (leaves 111-124). Abstracts in English and Chinese.
Contents:
Chapter One: Chinese Film Piracy Consumption and Media Globalization (Introduction: The Rampant Film Piracy in China; Literature Review: Focuses in Chinese Film Piracy, Four Theoretical Positions in Media Globalization)
Chapter Two: Problematics of Chinese Film Piracy Consumption (Two Concepts: Diversity, Filmic Gene Pool; Two Arguments and One Deduction: The Argument for the Expanding Global Capital, The Argument for National Protectionism, The Long Tail, The Theoretical Deduction for the Chinese Case; Research Questions; Methodological Note)
Chapter Three: A Re-Examination of Chinese Film Piracy Market (The Myth of Market Access; State Censorship Overlooked; The First-Release Obsession; An Internet Take-over?)
Chapter Four: In Search of the "Invisible" Audience/Viewers (The "Official" Audience; Chinese Film Audiences Re-Captured; Sketches on the "Invisible" Viewers)
Chapter Five: Structural Analysis for Chinese Film Piracy Consumption (Chinese Piracy Viewers: An Idle Spare of the Nexus?; The Film Piracy Market in China)
Chapter Six: A Brief History of Chinese Piracy Consumption (Video Hall, Mid-1980s to Mid-1990s; Epoch of the Videodisc, Since Mid-1990s; Online Movie Forums and Blogs, 1998-Present; Online Social Networks of Cinephiles, 2004-Present; The Accompanying Print Media, 1999-Present)
Chapter Seven: The Chinese Public Cine-Space (The Publicness of Piracy Viewing; A Public Cine-Space; Cultural Public Sphere: The Concept; The Chinese Internet; The Chinese Online Film Critics; The Chinese Public Cine-Space; A Trajectory of the Online Cine-Space; Mechanism towards Diversity; The Techno-Divide)
Chapter Eight: Conclusion (Contributions: Historical Account of Chinese Film Piracy Consumption, Inclusive Model for Diversity of Cultural Market; Weaknesses and Future Suggestions; Final Remark)
88

Automatic Removal of Complex Shadows From Indoor Videos

Mohapatra, Deepankar 08 1900 (has links)
Shadows in indoor scenarios are usually characterized by multiple light sources that produce complex shadow patterns for a single object. Without shadow removal, the foreground object tends to be erroneously segmented. The inconsistent hue and intensity of shadows make automatic removal a challenging task. In this thesis, a dynamic thresholding and transfer learning-based method for removing shadows is proposed. The method suppresses light shadows with a dynamically computed threshold and removes dark shadows using an online learning strategy that is built upon a base classifier trained with manually annotated examples and refined with automatically identified examples in new videos. Experimental results demonstrate that, despite variation in lighting conditions across videos, the proposed method is able to adapt to the videos and remove shadows effectively. The sensitivity of shadow detection changes slightly with the confidence level used to select examples for classifier retraining, and a higher confidence level usually yields better performance with fewer retraining iterations.
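One way to picture the dynamic-threshold suppression of light shadows is the sketch below. The percentile rule, function names, and dummy data are assumptions for illustration, not the thesis implementation, and the transfer-learning classifier for dark shadows is not shown.

```python
# Illustrative sketch: mark foreground pixels as light shadow when they look
# like a dimmed copy of the background, using a per-frame (dynamic) threshold.
import numpy as np

def suppress_light_shadows(frame_gray, background_gray, fg_mask, percentile=30):
    """Remove likely light-shadow pixels from a background-subtraction mask."""
    ratio = frame_gray.astype(np.float32) / (background_gray.astype(np.float32) + 1e-6)
    fg_ratios = ratio[fg_mask]
    if fg_ratios.size == 0:
        return fg_mask
    # Dynamic threshold: chosen per frame from the ratio distribution of foreground pixels.
    low = np.percentile(fg_ratios, percentile)
    shadow = fg_mask & (ratio >= low) & (ratio < 1.0)   # dimmed but not too dark
    return fg_mask & ~shadow

frame = np.random.randint(0, 256, (120, 160)).astype(np.uint8)       # dummy frame
background = np.random.randint(0, 256, (120, 160)).astype(np.uint8)  # background model
mask = np.random.rand(120, 160) > 0.7                                 # raw foreground mask
clean_mask = suppress_light_shadows(frame, background, mask)
```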
89

A novel MPEG-1 partial encryption scheme for the purposes of streaming video

But, Jason January 2004 (has links)
Abstract not available
90

Exhibiting integrity : archival diplomatics to study moving images

Miller, April G. 11 1900 (has links)
This thesis examines the concepts of reliability, authenticity and documentary form as defined by archival diplomatics and their relation to moving image records, for the purpose of exploring the possibility of using them to develop a method for preserving the moving image's intellectual integrity over time. To achieve this purpose, the study establishes a correspondence between the terminology and the theories used to express these concepts in the two fields through an examination of archival diplomatics and moving image glossaries, dictionaries and literature. Notwithstanding the different understandings of the concepts examined, the thesis finds that when moving images can be regarded as records - that is, as contextual mediated visual and aural representations compiled for the purpose of entering into communication - it is possible to use archival diplomatics methodology to analyze them successfully. On the strength of this finding, the thesis proceeds to establish a correspondence between the diplomatic elements of documentary form and the components of an ideal moving image record, demonstrating parallels and explaining and reconciling differences, in order to build a template for the analysis of all kinds of moving image records. This diplomatic instrument is to be used for identifying the formal elements of a moving image that allow for the maintenance, verification and preservation of its reliability and authenticity over the long term. The necessity of such an instrument derives from the fact that the use of digital technologies for making, exhibiting and storing moving images will make it increasingly difficult to prove their integrity and preserve them. The thesis concludes with a discussion of the effects of the pervasive use of digital technologies in the field of moving images, and a demonstration of the substantial threat they present to the continuing reliability and authenticity of moving images. This discussion shows the advantages of a close cooperative effort by archivists and moving image theorists in developing interdisciplinary methods, rooted in archival diplomatics and fully respectful of the nature of the moving image record, for addressing such threats.
