1 |
A novel secure autonomous generalized document model using object oriented technique
Selim, Hossam Abdelatif Mohamed, January 2002
No description available.
|
2 |
[en] A CAPTURE AND ACCESS SERVICE FOR ACTIVE SPACES / [pt] UM SERVIÇO DE CAPTURA E ACESSO PARA ESPAÇOS ATIVOS
FELIPE ALBUQUERQUE PORTELLA, 12 November 2009
[pt] Uma das áreas de grande destaque dentro da Computação Ubíqua é a de
aplicações multimídia para Captura & Acesso (C&A). Essas aplicações permitem
a captura de uma experiência ao vivo, normalmente em ambientes instrumentados,
para seu acesso no futuro. Dessa forma transfere-se para os computadores a
responsabilidade de gravar o evento, permitindo que as pessoas tenham seu foco
de atenção na compreensão e interpretação da experiência em si, sem se preocupar
com a tarefa de registrar a informação. A literatura apresenta muitas ferramentas
que permitem a geração automática de documentos multimídia como resultado da
captura de um evento, e esses mesmos documentos são usados como base para
navegação e busca sobre o conteúdo armazenado. Tipicamente, essas ferramentas
de C&A geram documentos que oferecem uma navegação com base apenas na
linha de tempo (timeline) do evento registrado. Esta dissertação propõe uma infraestrutura
genérica de C&A, baseada em serviços reutilizáveis e intercambiáveis,
que explora os recursos oferecidos pela linguagem NCL para investigar novos
paradigmas na engenharia de documentos produzidos por aplicações de C&A,
através da estruturação dos documentos em modelos conceitual, navegacional e de
apresentação. Utilizamos a linguagem NCL tanto para registrar o sincronismo
entre as diferentes mídias gravadas, quanto para gerar diferentes formas de
navegação e apresentação do conteúdo gravado. Os modelos de navegação e
apresentação são gerados com base em metadados fornecidos pelo usuário ou
extraídos automaticamente do conteúdo gravado. / [en] One of the most prominent areas of Ubiquitous Computing is multimedia
applications for Capture & Access (C&A). Such applications allow the
capture of a live experience, usually in smart rooms, for future access. In this way,
the responsibility for recording the event is transferred to the computing
infrastructure, allowing users to focus their attention on the comprehension and
interpretation of the experience itself, without worrying about recording the
information. The literature presents many software systems that automatically
generate hypermedia documents as the result of an event capture,
using the same documents as the basis for navigation and search of the archived
content. Typically these C&A applications generate documents that offer only
timeline navigation of the captured event. This dissertation proposes a general
C&A infrastructure, based on reusable and interchangeable services, which
exploits the features offered by the NCL language (the standard language of
Brazilian Digital TV) to investigate new paradigms in C&A document
engineering. This is accomplished by structuring the generated documents in
conceptual, navigation and presentation models. The NCL language is used to
represent the synchronization between the different recorded media as well as to
generate different ways to navigate and present the recorded content. These
navigation and presentation models are generated from metadata provided by the
user or automatically extracted from the recorded content.
|
3 |
Using multimedia to teach French language and culture
Lemoine, Florence Marie, 16 April 2013
In order for the study of French to survive in American higher education, it will be necessary to adopt a pedagogy that motivates learners as well as teaches them both language and culture. I argue that the judicious use of visual materials (film, video and graphic novels) is ideal for this undertaking. I further assert, based upon numerous sources from fields such as Second Language Acquisition, cognitive psychology, anthropology and sociolinguistics, that language and culture are inseparable, and that visual materials provide the necessary context to facilitate the teaching of both. Visual materials present both problems and opportunities. I discuss such difficulties as cognitive overload (i.e., students’ being overwhelmed by too much information in too short a period of time) and suggest practical solutions. I also present criteria for the selection of films, such as appropriateness, learning goals and appeal to US university students. I also show how authentic media such as video can be adapted for all proficiency levels (e.g., assigning beginners simple word-recognition tasks). In considering graphic novels, I suggest a familiar comic strip, Tintin, which is appropriate for beginning to advanced students, and which is likely to appeal to all students, given its American film adaptation. In the appendices, I include applications of the points presented in this report. In the conclusion, I argue that, regardless of the length of formal instruction in French, this pedagogy can support practical skills (for example, dealing with people from other cultures) and lifelong learning (for example, staying involved with French culture through the aforementioned media).
|
4 |
Proširivi sistem za pronalaženje multimedijalnih dokumenata / Extensible Multimedia Information Retrieval System
Milosavljević Branko, 22 July 2003
Oblast pronalaženja informacija kao jedan od osnovnih problema razmatra pronalaženje dokumenata u kolekciji koji su relevantni sa stanovišta korisnika. Ova disertacija se bavi problemima pronalaženja strukturiranih multimedijalnih dokumenata. Strukturirani multimedijalni dokumenti mogu, kao svoje elemente, sadržati objekte različitih tipova medija (tekst, slika, zvuk, ili video). Tema disertacije je formalna specifikacija modela sistema koji omogućava pronalaženje multimedijalnih dokumenata obezbeđujući pri tom proširivost sistema podrškom za različite tipove medija (što uključuje upotrebu različitih postojećih rešenja iz ove oblasti) i proširivost sistema različitim modelima pronalaženja dokumenata. XML jezik se koristi kao jezik za reprezentaciju dokumenata i kao jezik za komunikaciju sistema sa klijentima. Sistem je verifikovan na realnom primeru digitalne biblioteke doktorskih i magistarskih teza pomoću razvijenog prototipa. Prikazana prototipska implementacija koja ispunjava ciljeve u pogledu funkcionalnosti postavljene pred sistem predstavlja potvrdu praktične vrednosti predloženog modela. / The field of information retrieval deals with the retrieval of documents judged as relevant by users. This dissertation focuses on problems in the retrieval of structured multimedia documents. Structured multimedia documents comprise objects of different media types (such as text, images, audio or video clips) as their elements. The subject of the dissertation is a formal specification of a multimedia information retrieval system that is extensible both with support for different media types (including the use of existing solutions in this field) and with different document retrieval models. XML is used as a language for expressing document content and as a language for communication between the system and its clients. The system is verified by a case study on a networked digital library of theses and dissertations. The presented prototype implementation provides proof of the proposed model’s practical value.
|
5 |
[en] ACTIVE PRESENTATION: A SYSTEM FOR DISTRIBUTED MULTIMEDIA PRESENTATIONS IN UBIQUITOUS COMPUTING ENVIRONMENTS / [pt] ACTIVE PRESENTATION: UM SISTEMA PARA APRESENTAÇÕES DISTRIBUÍDAS EM AMBIENTES DE COMPUTAÇÃO UBÍQUA
MARK STROETZEL GLASBERG, 21 June 2006
[pt] A diminuição do custo e a diversificação dos dispositivos computacionais vêm criando ambientes de computação cada vez mais completos e sofisticados, mas que ainda carecem de softwares de integração que simplifiquem sua utilização e que melhor explorarem seu potencial. Para responder a esta questão, sistemas de computação ubíqua vêm sendo desenvolvidos para criar mecanismos de controle remoto e unificado de aplicações e dispositivos. Além disso, grupos de pesquisa na área de multimídia abordam outras questões como a sincronização entre dispositivos, gerência de recursos e definição de padrões de documentos. Este trabalho visa unificar os esforços de ambas as áreas com o objetivo de realizar apresentações multimídia através do paradigma de computação ubíqua. Neste trabalho, apresentamos a infraestrutura de execução de apresentações ActivePresentation desenvolvida e baseada em diferentes protótipos. Além disso, propomos um formato de documento multimídia chamado NCLua para orquestrar tais apresentações. / [en] The diminishing costs and the diversity of computational devices are creating increasingly complex and sophisticated computational environments. Nonetheless, such environments still lack integration software to simplify their use and to better explore their potential. In response to this issue, ubiquitous computing systems are being developed to create remote control mechanisms of applications and devices. At the same time, research groups in the area of multimedia deal with other issues, such as synchronization between devices, resource management and document standard definitions. The purpose of this work is to join the efforts of both areas in order to build presentations using the ubiquitous computing paradigm. In this work, we introduce the ActivePresentation infrastructure, based and developed through the study of different prototypes. In addition, we propose a multimedia document format called NCLua to orchestrate such presentations.
|
6 |
Contribution à la modélisation des métadonnées associées aux documents multimédias et à leur enrichissement par l’usage / Contribution to the modeling of metadata associated to multimedia documents and to their enrichment through the usage
Manzat, Ana-Maria, 05 February 2013
De nos jours, ce ne sont pas que les collections multimédias qui deviennent de plus en plus volumineuses, mais aussi les métadonnées qui les décrivent. L’extraction des métadonnées est très coûteuse en consommation de ressources. Cela pose le problème de la gestion efficace de ces grands volumes de données, en minimisant cette consommation. Le fait que les utilisateurs sont en constante interaction avec les documents multimédias et les métadonnées complique encore plus cette gestion. Dans cette thèse, nous étudions le problème de la gestion de métadonnées en intégrant les interactions des utilisateurs à deux niveaux: dans le processus de création de métadonnées et dans leur enrichissement. La grande variété de standards et normes de métadonnées existants ne sont pas interopérables. Les solutions proposées à ce problème d’interopérabilité se sont focalisées sur la création d’ontologies qui décrivent les contenus multimédias du point de vue sémantique, sans forcément prendre en compte les standards de métadonnées et d’autres informations de plus bas niveau sur les documents. Pour résoudre ce problème nous proposons un format de métadonnées qui intègre les standards et normes les plus utilisés et qui est flexible et extensible en structure et en vocabulaire. Dans le cadre d’un système de gestion des contenus multimédias, le processus d’indexation est celui qui consomme le plus de ressources, à travers les algorithmes d’indexation qui extraient les métadonnées. Dans les systèmes classiques, cette indexation est accomplie avec un ensemble d’algorithmes d’indexation figé dans le temps, sans se soucier de la consommation des ressources ni de l’évolution des besoins de l’utilisateur. 
Pour prendre en compte les besoins que l’utilisateur spécifie dans sa requête, afin de n’extraire que les métadonnées nécessaires et ainsi limiter d’un côté le volume de métadonnées à gérer et de l’autre la consommation des ressources, nous proposons de répartir le processus d’indexation en deux phases: une fois à l’acquisition des contenus (indexation implicite), et une deuxième fois, si besoin, au moment de l’exécution de la requête de l’utilisateur (indexation explicite) en ayant recours à une liste d’algorithmes d’indexation déterminée principalement en fonction de la requête de l’utilisateur. L’utilisateur est de plus en plus pris en compte dans les systèmes multimédias à travers ses interactions avec le système et le document. Nous proposons d’aller plus loin dans la prise en compte de l’utilisateur, en considérant ses interactions avec les différentes parties du document mais aussi avec les métadonnées qui décrivent le document. Cela a été réalisé à travers l’extension du format de métadonnées proposée, par l’ajout d’une température à chaque élément du format, qui varie dans le temps, étant calculée en fonction de la façon dont l’utilisateur interagit avec le document, mais aussi avec les métadonnées dans une période de temps. Nous avons validé nos propositions dans deux domaines différents: la vidéo surveillance et le commerce électronique. Le projet LINDO nous a permis la validation du format des métadonnées et de la sélection des algorithmes d’indexation dans le cadre de l’indexation explicite, dans le cadre de la vidéo surveillance. Dans le domaine du commerce électronique, nous avons exploité les interactions des utilisateurs réels avec un site de vente en ligne pour calculer la température des métadonnées associées aux pages du site pendant une période de deux mois. Nous avons utilisé cette température pour réaliser le reclassement des résultats obtenus pour une requête de l’utilisateur. 
Nous avons réalisé un test utilisateur sur une vingtaine de personnes. [...] / Nowadays, not only are multimedia collections becoming larger, but so is the metadata describing them. Metadata extraction is the most resource-consuming process in the management of multimedia collections. This raises the problem of managing these large data volumes efficiently while minimizing resource consumption. Users’ constant interactions with multimedia documents and metadata complicate this management process. In this thesis, we address this problem of metadata management by integrating users’ interactions at two levels: in the process of metadata creation and in their enrichment. The existing metadata standards are heterogeneous and not interoperable. The solutions proposed for this interoperability problem have focused on creating ontologies that describe multimedia content from a semantic point of view, without necessarily taking into account metadata standards and other low-level information. To solve this problem, we propose a metadata format that integrates the most widely used metadata standards and is flexible and extensible in structure and vocabulary. In a multimedia management system, the indexing process is the most resource-consuming one, through the indexing algorithms that extract metadata. In conventional systems, indexing is accomplished with a fixed set of indexing algorithms, without considering resource consumption or users’ changing needs. 
To take into account the user’s needs, as specified in the query, in order to extract only the necessary metadata and thus both limit the metadata volume and reduce resource consumption, we propose to split the indexing process into two phases: a first time at content acquisition (i.e., implicit indexation) and a second time, if necessary, at query execution (i.e., explicit indexation), employing a list of indexing algorithms determined mainly according to the user’s query. Users are increasingly taken into account in multimedia systems through their interactions with the system and the documents. We propose to go further in this direction by taking into account users’ interactions with different parts of the document, and also with the document’s metadata. This was achieved through an extension of the proposed metadata format that associates a temperature with each metadata element. This temperature is calculated according to the users’ interactions with the document and with the metadata over a time period. We have validated our proposals in two different domains: video surveillance and e-commerce. The LINDO project allowed us to validate the metadata format and the selection of indexing algorithms in the context of explicit indexation, for a video surveillance use case. For e-commerce, we used an online shopping site and the interactions of its real users over a two-month period to calculate the temperature of the metadata associated with the web pages describing the site’s products. We used this temperature to rerank the results obtained for a user’s query. We conducted a user study with twenty people, which shows that, for some user queries, reranking the results helps users find the desired information faster. 
This thesis has addressed the problem of taking the user into account in multimedia document management by: (1) proposing a metadata model that integrates the most widely used metadata standards; (2) splitting multimedia indexing into two steps (implicit and explicit indexation); (3) enriching the metadata according to the users’ interactions with the system, the multimedia documents and the metadata.
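The two-phase indexing described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the class name `TwoPhaseIndex`, the field names and the string-returning indexers are hypothetical placeholders, and the only idea taken from the abstract is that a fixed set of indexers runs at acquisition time (implicit indexation) while further indexers run at query time only for the metadata the query still needs (explicit indexation).

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical indexer type: takes a document id, returns extracted metadata.
Indexer = Callable[[str], str]


def make_indexers() -> Dict[str, Indexer]:
    """Toy indexers standing in for real (costly) extraction algorithms."""
    return {
        "duration": lambda doc: f"duration({doc})",
        "faces": lambda doc: f"faces({doc})",
        "colors": lambda doc: f"colors({doc})",
    }


class TwoPhaseIndex:
    def __init__(self, indexers: Dict[str, Indexer], implicit_fields: Iterable[str]):
        self.indexers = indexers
        self.implicit_fields = set(implicit_fields)
        self.metadata: Dict[str, Dict[str, str]] = {}

    def acquire(self, doc: str) -> None:
        """Implicit indexation: run only the fixed, cheap indexers at acquisition."""
        self.metadata[doc] = {
            field: self.indexers[field](doc) for field in sorted(self.implicit_fields)
        }

    def query(self, doc: str, needed_fields: Iterable[str]) -> List[str]:
        """Explicit indexation: run indexers only for fields the query needs
        and that are not stored yet. Returns the fields extracted on demand."""
        known = self.metadata[doc]
        missing = [f for f in sorted(needed_fields) if f not in known]
        for field in missing:
            known[field] = self.indexers[field](doc)
        return missing
```

In this sketch a second query for the same field triggers no further extraction, which is the resource saving the abstract aims at.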
|
7 |
[en] SUPPORTING MULTIMEDIA APPLICATIONS IN STEREOSCOPIC AND DEPTH-BASED 3D VIDEO SYSTEMS / [pt] SUPORTE A APLICAÇÕES MULTIMÍDIA EM SISTEMAS DE VÍDEO 3D ESTEREOSCÓPICOS E BASEADOS EM PROFUNDIDADE
ROBERTO GERSON DE ALBUQUERQUE AZEVEDO, 07 June 2016
[pt] Tecnologias de vídeos bidimensionais (2D) têm evoluído rapidamente nos últimos anos. Apesar disso, elas não permitem uma visão realista e imersiva do mundo, pois não oferecem importantes dicas de profundidade para o sistema visual humano. Tecnologias de vídeo tridimensionais (3D) têm como objetivo preencher essa lacuna, provendo representações que permitem a reprodução de informações de profundidade em displays 3D. Embora a representação baseada em vídeos estereoscópicos ainda seja a mais utilizada até o momento, novas representações de vídeo 3D têm emergido, tais como MVV (Multi-view video), 2D plus Z (2D plus depth), MVD (Multi-view plus depth) e LDV (Layered-depth video). A integração de aplicações multimídia com mídias 3D tem o potencial de permitir novos conteúdos interativos, novas experiências com o usuário e novos modelos de negócio. Nesta tese, duas abordagens para a integração de aplicações multimídia em cadeias de transmissão de vídeo 3D fim-a-fim são propostas. Primeiro, uma abordagem que é compatível com cadeias de transmissão de vídeo 3D baseado em vídeos estereoscópicos é discutida. A proposta consiste em extensões para linguagens multimídia 2D e um processo de conversão de aplicações multimídia 2D para sua versão estereoscópica. Essa proposta não requer nenhuma alteração no exibidor de linguagens multimídia 2D para a apresentação de mídias estereoscópicas. Em uma segunda abordagem, extensões adicionais a linguagens multimídia também são propostas visando a integração de aplicações multimídia em cadeias de vídeo 3D baseado em profundidade (2D plus Z ou LDV). Além disso, uma arquitetura para a composição gráfica dessas aplicações, baseada no conceito de LDV e que permite a integração de objetos de mídia baseado em profundidade em exibidores de aplicações multimídias é apresentada. 
Como um exemplo de aplicação prática das propostas desta tese, ambas são implementadas e integradas em um sistema de vídeo 3D fim-a-fim baseado no Sistema Brasileiro de TV Digital. / [en] Two-dimensional video technologies have evolved quickly in the last few years. Even so, they do not achieve a realistic and immersive view of the world since they do not offer important depth cues to the human vision system. Three-dimensional video (3DV) technologies try to fill this gap through video representations that enable 3D displays to provide those additional depth cues. Although CSV (Conventional Stereoscopic Video) has been the most widely used 3DV representation, other 3DV representations have emerged in recent years. Examples of those representations include MVV (Multi-view video), 2D plus Z (2D plus depth), MVD (Multi-view plus depth), and LDV (Layered-depth Video). Although end-to-end 3DV delivery chains based on those 3DV formats have been studied, the integration of interactive multimedia applications into those 3DV delivery chains has not yet been explored enough. The integration of multimedia applications with 3D media using those new representations has the potential of allowing new rich content, user experiences and business models. In this thesis, two approaches for the integration of multimedia applications into 3DV end-to-end delivery chains are proposed. First, a backward-compatible approach for integrating CSV-based media into 2D-only multimedia languages is discussed. In this proposal, it is possible to add depth information to 2D-only media objects. The proposal consists of extensions to multimedia languages and a process for converting the original multimedia application into its stereoscopic version. It does not require any change to the language player and is ready to run in current CSV-based 3DV delivery chains and on digital receivers’ hardware. 
Second, extensions to multimedia languages based on layered-depth media are proposed and a software architecture for the graphics composition of multimedia applications using those extensions is presented. As an example, both proposals are implemented and integrated into an end-to-end 3DV delivery chain based on the Brazilian Digital TV System.
|
8 |
Operadores de interação multimídia para criação automática de documentos: Interactors / Media-oriented operators for authoring multimedia documents: interactors
Oliveros, Didier Augusto Vega, 11 April 2011
Neste trabalho foi investigado o problema de autoria automatizada de informação multimídia sob a perspectiva da computação ubíqua de modo geral, e da interação do usuário com aplicações de captura e acesso (C&A) de modo particular. O objetivo do projeto foi a definição de operadores sobre interação do usuário em ambientes e em aplicações para permitir a geração automática de documentos multimídia interativos, um dos temas de pesquisa da área de engenharia de documentos. A abordagem da proposta foi a generalização dos operadores Inkteractors, definidos sobre a interação do usuário com aplicações baseadas em tinta eletrônica, considerando a interação do usuário na voz, mensagens de texto, vídeo e lousa. Como resultado foram definidos os novos Interactors: operadores de interação sobre informação capturada em aplicações que envolvem interação do usuário com as mídias. Os Interactors foram validados no contexto de engenharia de documentos ao serem utilizados para a geração automática de documentos multimídia interativos, associados a aplicações de C&A para oferecer novas possibilidades de indexar, visualizar e acessar os documentos multimídia. / This study investigated the problem of automated authoring of multimedia information from the perspective of ubiquitous computing in general, and of user interaction with capture and access (C&A) applications in particular. The project goal was to define operators on user interaction in environments and applications to enable the automatic generation of interactive multimedia documents, one of the research themes in document engineering. The proposed approach generalizes the Inkteractors operators, defined on user interaction with electronic ink-based applications, to user interaction with voice, text messages, video and whiteboard. As a result, we defined the new Interactors: interaction operators over information captured in applications that involve user interaction with the media. 
The Interactors were validated in the context of document engineering by being used for the automatic generation of interactive multimedia documents, associated with C&A applications, to offer new possibilities for indexing, viewing and accessing multimedia documents.
|
9 |
Models and operators for extension of active multimedia documents via annotations / Modelos e operadores para extensão de documentos multimídia ativos via anotações
Martins, Diogo Santana, 18 November 2013
Multimedia production is an elaborate activity composed of multiple information management and transformation tasks that support an underlying creative goal. Examples of these activities are structuring, organization, modification and versioning of media elements, all of which depend on the maintenance of supporting documentation and metadata. In professional productions, which can count on proper human and material resources, such documentation is maintained by the production crew and is key to securing high quality in the final content. In less resourceful configurations, such as amateur-oriented productions, at least reasonable quality standards are desirable in most cases; however, the perceived difficulty in managing and transforming content can inhibit amateurs from producing content of acceptable quality. This problem has been tackled on many fronts, for instance via annotation methods, smart browsing methods and authoring techniques, just to name a few. In this dissertation, the primary objective is to take advantage of user-created annotations in order to aid amateur-oriented multimedia authoring. In order to support this objective, the contributions are built around an authoring approach based on structured multimedia documents. First, a custom language for Web-based multimedia documents is defined, based on SMIL (Synchronized Multimedia Integration Language). This language brings several contributions, such as the formalization of an extended graph-based temporal layout model, live editing of document elements and extended reuse features. Second, a model for document annotation and an algebra for document transformations are defined, both of which allow composition and extraction of multimedia document fragments based on annotations. Third, the previous contributions are integrated into a Web-based authoring tool, which allows manipulating a document while it is active. 
Such manipulations encompass several interaction techniques for enriching, editing, publishing and extending multimedia documents. The contributions have been instantiated with multimedia sessions obtained from synchronous collaboration tools, in scenarios of video-based lectures, meetings and video-based qualitative research. Such instantiations demonstrate the applicability and utility of the contributions / Produção multimídia é uma atividade complexa composta por múltiplas atividades de gerência e transformação de informação, as quais suportam um objetivo de criar conteúdo. Exemplos dessas atividades são estruturação, organização, modificação e versionamento de elementos de mídia, os quais dependem da manutenção de documentos auxiliares e metadados. Em produções profissionais, as quais podem contar com recursos humanos e materiais adequados, tal documentação é mantida pela equipe de produção, sendo instrumental para garantir a uma alta qualidade no produto final. Em configurações com menos recursos, como produções amadoras, ao menos padrões razoáveis de qualidade são desejados na maioria dos casos, contudo a dificuldade em gerenciar e transformar conteúdo pode inibir amadores a produzir conteúdo com qualidade aceitável. Esse problema tem sido atacado em várias frentes, por exemplo via métodos de anotação, métodos de navegação e técnicas de autoria, apenas para nomear algumas. Nesta tese, o objetivo principal é tirar proveito de anotações criadas pelo usuário com o intuito de apoiar autoria multimídia por amadores. De modo a subsidiar esse objetivo, as contribuições são construídas em torno uma abordagem de autoria baseada em documentos multimídia estruturados. Primeiramente, uma linguagem customizada para documentos multimídia baseados na Web é definida, baseada na linguagem SMIL (Synchronized Multimedia Integration Language). 
Esta linguagem traz diversas contribuições, como a formalização de um modelo estendido para formatação temporal baseado em grafos, edição ao vivo de elementos de um documento e funcionalidades de reúso. Em segundo, um modelo para anotação de documentos e uma álgebra para transformação de documentos são definidos, ambos permitindo composição e extração de fragmentos de documentos multimídia com base em anotações. Em terceiro, as contribuições anteriores são integradas em uma ferramenta de autoria baseada na Web, a qual permite manipular um documento enquanto o mesmo está ativo. Tais manipulações envolvem diferentes técnicas de interação com o objetivo de enriquecer, editar, publicar e estender documentos multimídia interativos. As contribuições são instanciadas com sessões multimídia obtidas de ferramentas de colaboração síncrona, em cenários de aulas baseadas em vídeos, reuniões e pesquisa qualitativa baseada em vídeos. Tais instanciações demonstram a aplicabilidade e utilidade das contribuições
|
10 |
Identification non-supervisée de personnes dans les flux télévisés / Unsupervised person recognition in TV broadcast
Poignant, Johann, 18 October 2013
Ce travail de thèse a pour objectif de proposer plusieurs méthodes d'identification non-supervisées des personnes présentes dans les flux télévisés à l'aide des noms écrits à l'écran. Comme l'utilisation de modèles biométriques pour reconnaître les personnes présentes dans de larges collections de vidéos est une solution peu viable sans connaissance a priori des personnes à identifier, plusieurs méthodes de l'état de l'art proposent d'employer d'autres sources d'informations pour obtenir le nom des personnes présentes. Ces méthodes utilisent principalement les noms prononcés comme source de noms. Cependant, on ne peut avoir qu'une faible confiance dans cette source en raison des erreurs de transcription ou de détection des noms et aussi à cause de la difficulté de savoir à qui fait référence un nom prononcé. Les noms écrits à l'écran dans les émissions de télévision ont été peu utilisés en raison de la difficulté à extraire ces noms dans des vidéos de mauvaise qualité. Toutefois, ces dernières années ont vu l'amélioration de la qualité des vidéos et de l'incrustation des textes à l'écran. Nous avons donc ré-évalué, dans cette thèse, l'utilisation de cette source de noms. Nous avons d'abord développé LOOV (pour Lig Overlaid OCR in Vidéo), un outil d'extraction des textes sur-imprimés à l'image dans les vidéos. Nous obtenons avec cet outil un taux d'erreur en caractères très faible. Ce qui nous permet d'avoir une confiance importante dans cette source de noms. Nous avons ensuite comparé les noms écrits et les noms prononcés dans leurs capacités à fournir le nom des personnes présentes dans les émissions de télévisions. Il en est ressorti que deux fois plus de personnes sont nommables par les noms écrits que par les noms prononcés extraits automatiquement. Un autre point important à noter est que l'association entre un nom et une personne est intrinsèquement plus simple pour les noms écrits que pour les noms prononcés. 
Cette très bonne source de noms nous a donc permis de développer plusieurs méthodes de nommage non-supervisé des personnes présentes dans les émissions de télévision. Nous avons commencé par des méthodes de nommage tardives où les noms sont propagés sur des clusters de locuteurs. Ces méthodes remettent plus ou moins en cause les choix faits lors du processus de regroupement des tours de parole en clusters de locuteurs. Nous avons ensuite proposé deux méthodes (le nommage intégré et le nommage précoce) qui intègrent de plus en plus l'information issue des noms écrits pendant le processus de regroupement. Pour identifier les personnes visibles, nous avons adapté la méthode de nommage précoce pour des clusters de visages. Enfin, nous avons aussi montré que cette méthode fonctionne aussi pour nommer des clusters multi-modaux voix-visage. Avec cette dernière méthode, qui nomme au cours d'un unique processus les tours de paroles et les visages, nous obtenons des résultats comparables aux meilleurs systèmes ayant concouru durant la première campagne d'évaluation REPERE / In this thesis we propose several methods for unsupervised person identification in TV broadcasts using the names written on the screen. As the use of biometric models to recognize people in large video collections is not a viable option without a priori knowledge of the people present in these videos, several state-of-the-art methods propose using other sources of information to obtain the names of those present. These methods mainly use pronounced names as the source of names. However, we cannot have much confidence in this source, owing to transcription and name-detection errors and to the difficulty of knowing to whom a pronounced name refers. The names written on the screen in TV broadcasts have seldom been used in the past because of the difficulty of extracting them from low-quality videos. However, recent years have seen improvements in video quality and in overlaid-text integration. 
We therefore re-evaluated, in this thesis, the use of this source of names. We first developed LOOV (for LIG Overlaid OCR in Video), a tool that extracts overlaid text from video. With this tool we obtained a very low character error rate, which allows us to place considerable confidence in this source of names. We then compared written names and pronounced names in their ability to provide the names of the people present in TV broadcasts. We found that twice as many people can be named using written names as using automatically extracted pronounced names. Another important point is that the association between a name and a person is inherently easier for written names than for pronounced names. With this excellent source of names we were able to develop several unsupervised methods for naming people in TV broadcasts. We started with late naming methods, where names are propagated onto speaker clusters. These methods call into question, to varying degrees, the choices made during the diarization process. We then proposed two methods (integrated naming and early naming) that incorporate progressively more information from written names during the diarization process. To identify the people appearing on screen, we adapted the early naming method to face clusters. Finally, we also showed that this method works for naming multi-modal speaker-face clusters. With this last method, which names speech turns and faces in a single process, we obtain scores comparable to those of the best systems that competed in the first REPERE evaluation campaign.
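As a rough illustration of late naming, the following Python sketch propagates written names onto speaker clusters that have already been produced by diarization. The abstract does not state the propagation rule, so the rule used here (give each cluster the written name whose on-screen time overlaps its speech turns the most) is an assumption, and `late_naming`, `overlap` and the data layout are hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def overlap(a: Segment, b: Segment) -> float:
    """Duration during which two time segments co-occur (0.0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def late_naming(
    clusters: Dict[str, List[Segment]],
    written_names: List[Tuple[str, Segment]],
) -> Dict[str, Optional[str]]:
    """Assign to each speaker cluster the written name that co-occurs with it
    the longest; clusters with no co-occurring name stay anonymous (None).
    Assumed rule for illustration only, not the method from the thesis."""
    result: Dict[str, Optional[str]] = {}
    for cluster_id, turns in clusters.items():
        score: Dict[str, float] = defaultdict(float)
        for name, shown in written_names:
            for turn in turns:
                score[name] += overlap(turn, shown)
        best = max(score, key=score.get, default=None)
        result[cluster_id] = best if best is not None and score[best] > 0 else None
    return result
```

Because the clusters are fixed before naming, an error made during diarization cannot be corrected in this sketch, which is the limitation the integrated and early naming methods in the abstract are meant to address.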
|