  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

La plate-forme RAMSES pour un triple écran interactif : application à la génération automatique de télévision interactive / The RAMSES platform for triple display : application to automatic generation of interactive television

Royer, Julien 16 December 2009 (has links)
With the digital revolution, the use of video has evolved considerably over recent decades, moving from cinema to television and then to the web, from fictional narrative to documentary, and from editorial production to user-generated content. Media are vehicles for exchanging information, knowledge, personal "reports", emotions... Automatic enrichment of multimedia documents has been a research topic since the advent of these media. In this context, we first propose a model of the various concepts and actors involved in automatically analyzing multimedia documents in order to dynamically deploy interactive services related to the media content. We define the concepts of analyzer, interactive service, and multimedia document description, as well as the functions needed to make them interact. The resulting analysis model stands out from the literature by proposing a modular, open, and extensible architecture. We then present the implementation of these concepts in a demonstration prototype, which highlights the contributions made in the model descriptions. An implementation and recommendations are detailed for each model. To illustrate the implementation of the proposed solutions on the platform, such as the MPEG-7 standard for description, MPEG-4 BIFS for interactive scenes, and OSGi for the overall architecture, we present several examples of interactive services integrated into the platform. These demonstrate the platform's capacity to adapt to the needs of one or more interactive services. / The concept developed in this thesis is an architecture model that performs automatic multimedia analysis and inserts relevant interactive content according to the multimedia content.
Until now, studies have mainly tried to provide tools and frameworks that generate a full description of the multimedia. This amounts to trying to describe the world, since the system must have enormous descriptive capabilities; in practice it is not possible to represent the world as a tree of concepts and relationships, owing to time and computational limitations. Given the number of multimedia analyzers developed around the world, this thesis therefore proposes a platform able to host, combine, and share existing multimedia analyzers. Furthermore, we consider only the user's requirements, selecting from the platform only the elements needed to analyze the media. To adapt the platform easily to service requirements, we propose a modular architecture based on plug-in multimedia analyzers that generate a contextual description of the media, together with an interactive scene generator that dynamically creates the related interactive scenes. We chose the MPEG-7 standard to implement the multimedia description and the MPEG-4 BIFS standard to embed interactive scenes in the multimedia. We also present experimental results for different kinds of interactive services using real-time video information extraction. The main implemented example of an interactive service is an interactive mobile TV application for parliamentary sessions, which provides additional information to users by automatically inserting interactive content (complementary information, the subject of the current session, and so on) into the original TV program. In addition, we demonstrate the platform's capacity to adapt to multiple application domains through a set of simple interactive services (goodies, games, and so on).
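The modular plug-in architecture described above can be sketched in outline. The following is a hypothetical illustration, not the RAMSES implementation: the `Analyzer` base class, the analyzer names, and the service-driven selection are all assumptions made for the example.

```python
# Hypothetical sketch of a plug-in analyzer platform: analyzers register
# themselves, and only those required by a service are run on the media.

class Analyzer:
    """Base class for plug-in media analyzers."""
    name = "base"

    def analyze(self, media: dict) -> dict:
        raise NotImplementedError

class FaceAnalyzer(Analyzer):
    name = "face"

    def analyze(self, media):
        # Stand-in for a real detector: pretend every frame has one face.
        return {"faces": media.get("frames", 0) * [1]}

class SpeechAnalyzer(Analyzer):
    name = "speech"

    def analyze(self, media):
        return {"transcript": media.get("audio", "")}

class Platform:
    def __init__(self):
        self._registry = {}

    def register(self, analyzer: Analyzer):
        self._registry[analyzer.name] = analyzer

    def describe(self, media: dict, required: list) -> dict:
        # Run only the analyzers a given interactive service asks for.
        return {name: self._registry[name].analyze(media) for name in required}

platform = Platform()
platform.register(FaceAnalyzer())
platform.register(SpeechAnalyzer())

# An interactive service declares which descriptions it needs.
desc = platform.describe({"frames": 3, "audio": "hello"}, required=["speech"])
print(desc)
```

The point of the registry is that new analyzers can be added, combined, or shared without touching the platform code, which is the openness property the abstract claims.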
12

La structuration sémantique des contenus des documents audiovisuels selon les points de vue de la production / Semantic structuring of audiovisual document content according to production viewpoints

Bui Thi, Minh Phung 26 June 2003 (has links) (PDF)
In the context of significant progress in information technology and in video-related standards, this work proposes flexible, semantically rich tools for describing audiovisual content at the shot level, the PSDS (Production Shot Description Scheme), guided by a semiotic approach and by the viewpoints of production professionals. We observe that audiovisual content has a threefold semantic dimension: technical semantics, the semantics of the narrative world, and semiotics, each level having its own ontological description. Semiotics complements the technical and thematic semantics of video content by explaining why the dynamic structures of the filmic text can produce these semantic interpretations. Drawing both on automatic media analysis techniques and on existing models, terminologies, theories, and discourses from the world of cinema, this approach can offer a natural, simple translation between the different semantic levels. The object-oriented, multi-viewpoint description of content involves identifying, in the ontological tree, the significant units of the domain and their characteristics. These units constitute the fundamental information for grasping the meaning of the narrative. They are represented by concepts forming a semantic network in which users can navigate in search of information, each concept being a node of the semantic network of the target domain. Expressed in the MPEG-7 and XML Schema formalisms, the knowledge embedded in the video shot description schemes can then serve as a basis for building interactive image-editing environments, in which video becomes an information stream whose data can be tagged, annotated, analyzed, and edited.
The metadata analyzed in this work, covering information from the three stages of production (pre-production, production, and post-production), are intended to let applications manage and manipulate video objects, as well as the representations of their semantics, so that they can be reused in a range of access services such as content indexing, search, filtering, analysis, and comprehension of film images.
13

A Neuro-Fuzzy Approach for Multiple Human Objects Segmentation

Huang, Li-Ming 03 September 2003 (has links)
We propose a novel approach for segmenting human objects, including face and body, in image sequences. In modern video coding standards such as MPEG-4 and MPEG-7, human objects are usually the main focus of multimedia applications. We combine temporal and spatial information and employ a neuro-fuzzy mechanism to extract human objects. A fuzzy self-clustering technique divides the video frame into a set of segments. The existence of a face within a candidate face region is confirmed by searching for possible constellations of eye-mouth triangles and verifying each eye-mouth combination against a predefined template. Rough foreground and background regions are then formed by combining multiple criteria. Finally, human objects in the base frame and the remaining frames of the video stream are precisely located by a fuzzy neural network trained with an SVD-based hybrid learning algorithm. In experiments, we compare our system with two other approaches; the results show that our system detects face locations and extracts human objects more accurately.
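The combination of temporal and spatial cues can be illustrated with a toy example. This is not the thesis's neuro-fuzzy method: it is a minimal sketch in which a frame-difference mask (temporal) is intersected with segment masks (spatial), and the frames, segments, and thresholds are all invented for the illustration.

```python
# Toy illustration: keep only spatial segments that overlap temporal change.

def temporal_mask(prev, curr, thresh=10):
    # Pixels whose intensity changed by more than `thresh` between frames.
    return [[abs(c - p) > thresh for p, c in zip(rp, rc)]
            for rp, rc in zip(prev, curr)]

def moving_segments(segments, mask, min_overlap=0.5):
    # A segment is "moving" if at least `min_overlap` of its pixels changed.
    result = []
    for seg_id, pixels in segments.items():
        changed = sum(1 for (r, c) in pixels if mask[r][c])
        if changed / len(pixels) >= min_overlap:
            result.append(seg_id)
    return result

prev = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
curr = [[0, 0, 0], [0, 50, 60], [0, 55, 70]]
segments = {"background": [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)],
            "person": [(1, 1), (1, 2), (2, 1), (2, 2)]}

mask = temporal_mask(prev, curr)
print(moving_segments(segments, mask))  # only the "person" segment moved
```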
14

Automatic Image Annotation By Ensemble Of Visual Descriptors

Akbas, Emre 01 August 2006 (has links) (PDF)
Automatic image annotation is the process of automatically producing words to describe the content of a given image. It provides a natural means of semantic indexing for content-based image retrieval. In this thesis, two novel automatic image annotation systems targeting different types of annotated data are proposed. The first system, called Supervised Ensemble of Visual Descriptors (SEVD), is trained on a set of annotated images with predefined class labels; the system then automatically annotates an unknown sample depending on the classification results. The second system, called Unsupervised Ensemble of Visual Descriptors (UEVD), assumes no class labels, so the annotation of an unknown sample is accomplished by unsupervised learning based on the visual similarity of images. The automatic annotation systems available in the literature mostly use a single set of features to train a single learning architecture. The proposed annotation systems, by contrast, utilize a novel model of image representation in which an image is represented by a variety of feature sets, spanning almost the complete visual information of color, shape, and texture characteristics. In both systems, a separate learning entity is trained for each feature set, and these entities are gathered under an ensemble learning approach. Empirical results show that both SEVD and UEVD outperform some of the state-of-the-art automatic image annotation systems in equivalent experimental setups.
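The ensemble idea, one learner per feature set with outputs aggregated, can be sketched as follows. This is a simplified illustration with nearest-centroid learners and majority voting; the actual SEVD/UEVD learners are not specified here, and the descriptor names and feature values are invented.

```python
# Sketch: one nearest-centroid classifier per descriptor space (color,
# texture), combined by majority vote, in the spirit of an ensemble of
# visual descriptors.

def train_centroids(samples):
    # samples: list of (feature_vector, label); returns per-label centroids.
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        sums[label] = [a + v for a, v in zip(acc, vec)]
        counts[label] = counts.get(label, 0) + 1
    return {lbl: [s / counts[lbl] for s in v] for lbl, v in sums.items()}

def predict(centroids, vec):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lbl: dist(centroids[lbl], vec))

# One training set per descriptor space (values are made up).
color_train = [([1.0, 0.0], "sky"), ([0.0, 1.0], "grass")]
texture_train = [([0.2], "sky"), ([0.9], "grass")]

models = {"color": train_centroids(color_train),
          "texture": train_centroids(texture_train)}

# Classify a new image described in both spaces, then vote.
image = {"color": [0.9, 0.1], "texture": [0.3]}
votes = [predict(models[d], image[d]) for d in models]
label = max(set(votes), key=votes.count)
print(label)
```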
15

Toward The Frontiers Of Stacked Generalization Architecture For Learning

Mertayak, Cuneyt 01 September 2007 (has links) (PDF)
In pattern recognition, the "bias-variance" trade-off is a challenging issue that scientists have been working on over the last decades to obtain better generalization performance. Among many learning methods, two-layered homogeneous stacked generalization has been reported to be successful in the literature, in different problem domains such as object recognition and image annotation. The aim of this work is twofold. First, the problems of stacked generalization are attacked with a proposed novel architecture; then, a set of success criteria for stacked generalization is studied. A serious drawback of the stacked generalization architecture is its sensitivity to the curse of dimensionality. To solve this problem, a new architecture named "unanimous decision" is designed. The performance of this architecture is shown to be comparable to that of the two-layered homogeneous stacked generalization architecture for small numbers of classes, while it performs better than stacked generalization for larger numbers of classes. Additionally, a new success criterion for the two-layered homogeneous stacked generalization architecture is proposed based on the individual properties of the descriptors used, and it is verified on synthetic datasets.
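Two-layered stacked generalization can be sketched schematically: level-0 learners emit class-membership scores, and a level-1 combiner is applied on top of those scores. The learners, weights, and data below are toy stand-ins invented for the sketch, not the architecture evaluated in the thesis.

```python
# Minimal two-layer stacked generalization: level-0 classifiers produce
# class-membership scores; a level-1 combiner weights and sums them.

def level0_a(x):
    # Toy scorer keyed on the first feature.
    return {"cat": x[0], "dog": 1 - x[0]}

def level0_b(x):
    # Toy scorer keyed on the second feature.
    return {"cat": x[1], "dog": 1 - x[1]}

def stack_predict(x, weights):
    # Level-1: weighted sum of level-0 scores, per class.
    scores = {"cat": 0.0, "dog": 0.0}
    for learner, w in zip((level0_a, level0_b), weights):
        for label, s in learner(x).items():
            scores[label] += w * s
    return max(scores, key=scores.get)

# The weights would normally be learned on held-out level-1 training data;
# here they are fixed by hand for the illustration.
print(stack_predict([0.9, 0.6], weights=[0.5, 0.5]))  # "cat"
```

Note how the level-1 input dimension grows with the number of classes times the number of level-0 learners, which is the curse-of-dimensionality sensitivity the abstract mentions.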
16

Ontology Based Semantic Retrieval Of Video Contents Using Metadata

Akpinar, Samet 01 September 2007 (has links) (PDF)
The aim of this thesis is the development of an infrastructure for semantic retrieval of multimedia content. Motivated by the need for semantic search and retrieval of multimedia content, operating directly on MPEG-7 based annotations is a reasonable way to meet this need, as MPEG-7 is a common standard providing a broad multimedia content description schema. However, the MPEG-7 formalism is deficient in semantics and reasoning support. From this perspective, MPEG-7 descriptions need to be represented in an additional formalism to fill the gap in semantics and reasoning; the semantic web and multimedia technologies intersect at this point of multimedia semantics. In this thesis, the OWL Web Ontology Language, which is based on description logic, is utilized to model a connection between ontology semantics and video metadata. Modeling the domain of the videos using ontologies and the MPEG-7 descriptions, and reasoning about the videos with the help of the logical formalism of these ontologies, are the main objectives of the thesis.
17

Hanolistic: A Hierarchical Automatic Image Annotation System Using Holistic Approach

Oztimur, Ozge 01 January 2008 (has links) (PDF)
Automatic image annotation is the process of assigning keywords to digital images based on their content. In one sense, it is a mapping from visual content information to semantic context information. In this thesis, we propose a novel approach to the automatic image annotation problem, where annotation is formulated as a multivariate mapping from a set of independent descriptor spaces, representing a whole image, to a set of words, representing class labels. For this purpose, a hierarchical annotation architecture, named HANOLISTIC (Hierarchical Image Annotation System Using Holistic Approach), is defined with two layers. At the first layer, called the level-0 annotator, each annotator is fed by a distinct descriptor extracted from the whole image. This lets each annotator represent the image by a different visual property of a descriptor, and since the whole image is used, the problematic segmentation process is avoided. Each annotator is trained by a supervised learning paradigm in which each word is represented by a class label. Note that this approach differs slightly from classical training approaches, where each data item has a unique label: in the proposed system, since each image has one or more annotating words, we assume that an image belongs to more than one class. The outputs of the level-0 annotators indicate the membership values with which the words in the vocabulary belong to an image. These membership values from each annotator are then aggregated at the second layer by various rules to obtain the meta-layer annotator. The rules employed in this study involve summation and/or weighted summation of the outputs of the level-0 annotators. Finally, a set of words from the vocabulary is selected based on the ranking of the meta-layer output. The hierarchical annotation system proposed in this thesis outperforms state-of-the-art annotation systems based on segmental and holistic approaches.
The proposed system is examined in depth and compared to the other systems in the literature using several performance criteria.
18

Ontology-based Spatio-temporal Video Management System

Simsek, Atakan 01 September 2009 (has links) (PDF)
In this thesis, a system called the Ontology-Based Spatio-Temporal Video Management System (OntoVMS) is developed to supply a framework for semantic data modeling and querying in video files. OntoVMS supports semantic data modeling, which can be divided into concept modeling, spatio-temporal relation modeling, and trajectory data modeling. The system uses the Rhizomik MPEG-7 Ontology as its core ontology; moreover, its ontological expressiveness is extended by automatically attaching domain ontologies. OntoVMS supports querying of all spatial relations, such as directional relations (north, south, ...), mixed directional relations (northeast, southwest, ...), distance relations (near, far), positional relations (above, below, ...) and topological relations (inside, touch, ...); temporal relations such as starts, equal, and precedes; and trajectories of objects of interest. To enhance the querying capability, compound queries are added to the framework so that the user can combine simple queries using "(", ")", "AND" and "OR" operators. Finally, the use of the system is demonstrated with a semi-automatic face annotation tool.
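Directional and topological relations of the kind OntoVMS queries can be computed from object bounding boxes. The sketch below uses invented box coordinates and deliberately simplified definitions (center comparison for direction, containment for "inside"); OntoVMS's actual relation definitions may differ.

```python
# Simplified spatial relations between axis-aligned boxes (x1, y1, x2, y2),
# with y increasing downward as in image coordinates.

def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def directional(a, b):
    # Direction of box `a` relative to box `b`, from the box centers.
    (ax, ay), (bx, by) = center(a), center(b)
    ns = "north" if ay < by else "south" if ay > by else ""
    ew = "east" if ax > bx else "west" if ax < bx else ""
    return ns + ew or "same"

def inside(a, b):
    # True when box `a` lies entirely within box `b`.
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]

player = (40, 10, 60, 30)
field = (0, 0, 100, 100)
ball = (10, 70, 20, 80)

print(directional(player, ball))  # player is northeast of the ball
print(inside(player, field))      # True
```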
19

Natural Language Query Processing In Ontology Based Multimedia Databases

Alaca Aygul, Filiz 01 May 2010 (has links) (PDF)
In this thesis, a natural language query interface is developed for semantic and spatio-temporal querying of MPEG-7 based domain ontologies. The underlying ontology is created by attaching domain ontologies to the core Rhizomik MPEG-7 ontology. The user can pose concept queries, complex concept queries (objects connected with an "AND" or "OR" connector), spatial queries (left, right, ...), temporal queries (before, after, at least 10 minutes before, 5 minutes after, ...), object trajectory queries, and directional trajectory queries (east, west, southeast, ..., left, right, upwards, ...); the system also handles negation in the user input. When the user enters a natural language (NL) input, it is parsed with the link parser. According to the query type, the objects, attributes, spatial relation, temporal relation, trajectory relation, time filter, and time information are extracted from the parser output using predefined rules. After this information extraction, SPARQL queries are generated and executed against the ontology using an RDF API. The results are retrieved and used to calculate the spatial, temporal, and trajectory relations between objects; the results satisfying the required relations are displayed in tabular format, and the user can navigate through the multimedia content.
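The rule-based extraction step can be illustrated with a toy regular expression standing in for the link-parser output and rules. The pattern, vocabulary, and query below are invented for the illustration and are far simpler than the thesis's grammar.

```python
import re

# Toy rule: "<object1> [is] to the <left|right> of <object2>".
SPATIAL_RULE = re.compile(
    r"(?:the )?(?P<obj1>\w+) (?:is )?to the (?P<rel>left|right) of "
    r"(?:the )?(?P<obj2>\w+)")

def extract_spatial_query(text):
    # Return the structured query elements, or None if no rule matches.
    m = SPATIAL_RULE.search(text.lower())
    if not m:
        return None
    return {"type": "spatial", "object1": m.group("obj1"),
            "relation": m.group("rel"), "object2": m.group("obj2")}

q = extract_spatial_query("Show videos where the car is to the left of the tree")
print(q)
```

In the thesis, structured elements like these would then drive the generation of SPARQL queries against the ontology.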
20

Semi-automatic Semantic Video Annotation Tool

Aydinlilar, Merve 01 December 2011 (has links) (PDF)
Semantic annotation of video content is necessary for the indexing and retrieval tasks of video management systems. Currently, it is not possible to extract all high-level semantic information from video data automatically. Video annotation tools assist users in generating annotations to represent video data; the generated annotations can also be used for testing and evaluating content-based retrieval systems. In this study, a semi-automatic semantic video annotation tool is presented. Generated annotations are in the MPEG-7 metadata format to ensure interoperability. With the help of image processing and pattern recognition solutions, the annotation process is partly automated and the annotation time is reduced. Annotations can be made for spatio-temporal decompositions of the video data, and extraction of low-level visual descriptions is included to obtain complete descriptions.
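A flavor of such MPEG-7 output can be given with a simplified XML sketch built with Python's standard library. The element names below follow MPEG-7's general style (VideoSegment, MediaTime, FreeTextAnnotation), but the fragment is heavily abridged and omits namespaces, so it should not be taken as schema-valid MPEG-7 or as this tool's actual output.

```python
import xml.etree.ElementTree as ET

# Build a simplified, MPEG-7-flavored annotation for one video segment.
# Abridged sketch only: not schema-valid MPEG-7.
desc = ET.Element("Description")
video = ET.SubElement(desc, "Video")
segment = ET.SubElement(video, "VideoSegment", id="seg1")

# Temporal decomposition: where the segment sits in the media timeline.
time = ET.SubElement(segment, "MediaTime")
ET.SubElement(time, "MediaTimePoint").text = "T00:00:05"
ET.SubElement(time, "MediaDuration").text = "PT10S"

# High-level semantic annotation attached to the segment.
annot = ET.SubElement(segment, "TextAnnotation")
ET.SubElement(annot, "FreeTextAnnotation").text = "two people shaking hands"

xml_text = ET.tostring(desc, encoding="unicode")
print(xml_text)
```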
