61

Motion Capture of Deformable Surfaces in Multi-View Studios

Cagniart, Cedric 16 July 2012 (has links) (PDF)
In this thesis we address the problem of digitizing the motion of three-dimensional shapes that move and deform in time. These shapes are observed from several points of view with cameras that record the scene's evolution as videos. Using available reconstruction methods, these videos can be converted into a sequence of three-dimensional snapshots that capture the appearance and shape of the objects in the scene. The focus of this thesis is to complement appearance and shape with information on the motion and deformation of objects. In other words, we want to measure the trajectory of every point on the observed surfaces. This is a challenging problem because the captured videos are only sequences of images, and the reconstructed shapes are built independently from each other. While the human brain excels at recreating the illusion of motion from these snapshots, using them to automatically measure motion is still largely an open problem. The majority of prior work on the subject has focused on tracking the performance of one human actor, and used the strong prior knowledge on the articulated nature of human motion to handle the ambiguity and noise inherent to visual data. In contrast, the presented developments consist of generic methods that make it possible to digitize scenes involving several humans and deformable objects of arbitrary nature. To perform surface tracking as generically as possible, we formulate the problem as the geometric registration of surfaces and deform a reference mesh to fit a sequence of independently reconstructed meshes. We introduce a set of algorithms and numerical tools that integrate into a pipeline whose output is an animated mesh. Our first contribution consists of a generic mesh deformation model and numerical optimization framework that divides the tracked surface into a collection of patches, organizes these patches in a deformation graph and emulates elastic behavior with respect to the reference pose. 
As a second contribution, we present a probabilistic formulation of deformable surface registration that embeds the inference in an Expectation-Maximization framework and explicitly accounts for the noise in the acquisition. As a third contribution, we look at how prior knowledge can be used when tracking articulated objects, and compare different deformation models with skeleton-based tracking. The studies reported by this thesis are supported by extensive experiments on various 4D datasets. They show that in spite of weaker assumptions on the nature of the tracked objects, the presented ideas make it possible to process complex scenes involving several arbitrary objects, while robustly handling missing data and relatively large reconstruction artifacts.
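The probabilistic registration described above softly assigns observed points to the reference surface instead of committing to hard correspondences. A minimal numpy sketch of one such E-step under an isotropic-Gaussian noise model (function and variable names are illustrative, not the thesis's exact formulation):

```python
import numpy as np

def em_soft_assign(ref_pts, obs_pts, sigma=0.1):
    """One E-step of a Gaussian-mixture-style surface registration:
    each observed point is softly assigned to reference points,
    yielding weighted correspondences that are robust to noise."""
    # Squared distances between every reference and observed point.
    d2 = ((ref_pts[:, None, :] - obs_pts[None, :, :]) ** 2).sum(-1)
    # Responsibilities: how strongly each reference point explains
    # each observation under the isotropic Gaussian noise model.
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(axis=0, keepdims=True) + 1e-12
    # Soft target for each reference point: responsibility-weighted
    # average of the observations (used by the subsequent M-step).
    targets = (w @ obs_pts) / (w.sum(axis=1, keepdims=True) + 1e-12)
    return targets

ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
obs = ref + 0.01  # a slightly noisy observation of the same shape
out = em_soft_assign(ref, obs)
```

With well-separated points, each soft target ends up close to the matching observation, which is the behavior an alternating EM registration relies on.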
62

Lipreading across multiple views

Lucey, Patrick Joseph January 2007 (has links)
Visual information from a speaker's mouth region is known to improve automatic speech recognition (ASR) robustness, especially in the presence of acoustic noise. Currently, the vast majority of audio-visual ASR (AVASR) studies assume frontal images of the speaker's face, which is a rather restrictive human-computer interaction (HCI) scenario. The lack of research into AVASR across multiple views has been dictated by the lack of large corpora that contain varying pose/viewpoint speech data. Recently, research has concentrated on recognising human behaviours within "meeting" or "lecture" type scenarios via "smart-rooms". This has resulted in the collection of audio-visual speech data which allows visual speech to be recognised from both frontal and non-frontal views. Using this data, the main focus of this thesis was to investigate and develop various methods within the confines of a lipreading system which can recognise visual speech across multiple views. This research constitutes the first published work within the field which looks at this particular aspect of AVASR. The task of recognising visual speech from non-frontal views (i.e. profile) is in principle very similar to that of frontal views, requiring the lipreading system to initially locate and track the mouth region and subsequently extract visual features. However, this task is far more complicated than the frontal case, because the facial features required to locate and track the mouth lie in a much more limited spatial plane. Nevertheless, accurate mouth region tracking can be achieved by employing techniques similar to frontal facial feature localisation. Once the mouth region has been extracted, the same visual feature extraction process as for the frontal view can take place. A novel contribution of this thesis is to quantify the degradation in lipreading performance between the frontal and profile views. 
In addition to this, novel patch-based analysis of the various views is conducted, and as a result a novel multi-stream patch-based representation is formulated. Having a lipreading system which can recognise visual speech from both frontal and profile views is a novel contribution to the field of AVASR. However, given both the frontal and profile viewpoints, this raises the question: is there any benefit in having the additional viewpoint? Another major contribution of this thesis is an exploration of a novel multi-view lipreading system. This system shows that there does exist complementary information in the additional viewpoint (possibly that of lip protrusion), with superior performance achieved in the multi-view system compared to the frontal-only system. Even though having a multi-view lipreading system which can recognise visual speech from both frontal and profile views is very beneficial, it can hardly be considered realistic, as each particular viewpoint is dedicated to a single pose (i.e. frontal or profile). In an effort to make the lipreading system more realistic, a unified system based on a single camera was developed which enables a lipreading system to recognise visual speech from both frontal and profile poses. This is called pose-invariant lipreading. Pose-invariant lipreading can be performed on either stationary or continuous tasks. Methods which effectively normalise the various poses into a single pose were investigated for the stationary scenario, and in another contribution of this thesis, an algorithm based on regularised linear regression was employed to project all the visual speech features into a uniform pose. This particular method is shown to be beneficial when the lipreading system is biased towards the dominant pose (i.e. frontal). The final contribution of this thesis is the formulation of a continuous pose-invariant lipreading system which contains a pose-estimator at the start of the visual front-end. 
This system highlights the complexity of developing such a system, as introducing more flexibility within the lipreading system invariably means the introduction of more error. All the work contained in this thesis presents novel and innovative contributions to the field of AVASR, and will hopefully aid in the future deployment of an AVASR system in realistic scenarios.
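The pose-normalisation idea based on regularised linear regression can be illustrated with a ridge regression that maps profile-view feature vectors toward the frontal feature space. A generic numpy sketch, not the thesis's exact algorithm or feature set:

```python
import numpy as np

def fit_pose_map(X_profile, X_frontal, lam=1.0):
    """Ridge (regularised linear) regression: learn a matrix W that
    projects profile-view visual speech features into the frontal
    feature space, so one classifier can serve both poses."""
    d = X_profile.shape[1]
    # Closed-form ridge solution: (X'X + lam*I) W = X'Y.
    A = X_profile.T @ X_profile + lam * np.eye(d)
    return np.linalg.solve(A, X_profile.T @ X_frontal)

# Synthetic check: recover a known linear map from paired features.
rng = np.random.default_rng(0)
X_p = rng.normal(size=(100, 5))          # "profile" features
W_true = rng.normal(size=(5, 5))         # hypothetical true mapping
X_f = X_p @ W_true                       # matching "frontal" features
W = fit_pose_map(X_p, X_f, lam=1e-6)
```

With ample paired training data and a small regulariser, the learned map recovers the underlying linear relation; the regulariser matters when paired data is scarce or noisy.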
63

Light field remote vision / Algorithmes de traitement et de visualisation pour la vision plénoptique à grande distance

Nieto, Grégoire 03 October 2017 (has links)
Light fields have gathered much interest during the past few years. Captured from a plenoptic camera or a camera array, they sample the plenoptic function, which provides rich information about the radiance of any ray passing through the observed scene. They offer a plethora of computer vision and graphics applications: 3D reconstruction, segmentation, novel view synthesis, inpainting or matting, for instance. Reconstructing the light field consists in recovering the missing rays given the captured samples. In this work we cope with the problem of reconstructing the light field in order to synthesize an image, as if it were taken by a camera closer to the scene than the input plenoptic device or set of cameras. Our approach is to formulate the light field reconstruction challenge as an image-based rendering (IBR) problem. Most IBR algorithms first estimate the geometry of the scene, known as a geometric proxy, to make correspondences between the input views and the target view. A new image is generated by the joint use of both the input images and the geometric proxy, often by projecting the input images onto the target point of view and blending them in intensity. A naive color blending of the input images does not guarantee the coherence of the synthesized image. We therefore propose a direct multi-scale rendering approach based on Laplacian pyramids to blend the source images at all frequencies, thus preventing rendering artifacts. However, the imperfection of the geometric proxy is also a main cause of rendering artifacts, which appear as high-frequency noise in the synthesized image. We introduce a novel variational rendering method with gradient constraints on the target image that yields a better-conditioned linear system to solve, removing the high-frequency noise due to the geometric proxy. Some scenes are very challenging to reconstruct because of the presence of non-Lambertian materials; moreover, even a perfect geometric proxy is not sufficient when reflections, transparencies and specularities break the rules of parallax. We propose an original method based on the local approximation of the sparse light field in plenoptic space to generate a new viewpoint without the need for any explicit geometric proxy reconstruction. We evaluate our method both quantitatively and qualitatively on non-trivial scenes that contain non-Lambertian surfaces. Lastly we discuss the optimal placement of constrained cameras for IBR, and the use of our algorithms to recover objects hidden behind a camouflage. The proposed algorithms are illustrated by results on both structured (camera arrays) and unstructured plenoptic datasets.
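The multi-scale blending idea can be sketched with a toy one-dimensional Laplacian pyramid: decompose each source into band-pass levels, merge the levels, then collapse. This is a didactic simplification (1-D signals, 2-tap filters), not the thesis's actual rendering pipeline:

```python
import numpy as np

def down(x):
    """Coarsen: 2-tap average then decimate by two."""
    return 0.5 * (x[0::2] + x[1::2])

def up(x, n):
    """Nearest-neighbour upsample back to length n."""
    return np.repeat(x, 2)[:n]

def laplacian_pyramid(x, levels):
    pyr = []
    for _ in range(levels):
        c = down(x)
        pyr.append(x - up(c, len(x)))  # band-pass detail at this scale
        x = c
    pyr.append(x)                      # coarsest residual
    return pyr

def collapse(pyr):
    x = pyr[-1]
    for detail in reversed(pyr[:-1]):
        x = up(x, len(detail)) + detail
    return x

def blend(a, b, levels=2):
    """Blend two aligned signals separately in every frequency band,
    the property that prevents seams when fusing projected views."""
    pa, pb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    return collapse([(la + lb) / 2 for la, lb in zip(pa, pb)])

sig = np.arange(8, dtype=float)
rec = collapse(laplacian_pyramid(sig, 2))  # exact reconstruction
```

Because each detail level stores exactly what the coarsening discarded, collapsing the untouched pyramid reproduces the input, and blending operates independently per frequency band.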
64

Motion Capture of Deformable Surfaces in Multi-View Studios / Acquisition de surfaces déformables à partir d'un système multicaméra calibré

Cagniart, Cédric 16 July 2012 (has links)
In this thesis we address the problem of digitizing the motion of three-dimensional shapes that move and deform in time. These shapes are observed from several points of view with cameras that record the scene's evolution as videos. Using available reconstruction methods, these videos can be converted into a sequence of three-dimensional snapshots that capture the appearance and shape of the objects in the scene. The focus of this thesis is to complement appearance and shape with information on the motion and deformation of objects. In other words, we want to measure the trajectory of every point on the observed surfaces. This is a challenging problem because the captured videos are only sequences of images, and the reconstructed shapes are built independently from each other. 
While the human brain excels at recreating the illusion of motion from these snapshots, using them to automatically measure motion is still largely an open problem. The majority of prior work on the subject has focused on tracking the performance of one human actor, and used the strong prior knowledge on the articulated nature of human motion to handle the ambiguity and noise inherent to visual data. In contrast, the presented developments consist of generic methods that make it possible to digitize scenes involving several humans and deformable objects of arbitrary nature. To perform surface tracking as generically as possible, we formulate the problem as the geometric registration of surfaces and deform a reference mesh to fit a sequence of independently reconstructed meshes. We introduce a set of algorithms and numerical tools that integrate into a pipeline whose output is an animated mesh. Our first contribution consists of a generic mesh deformation model and numerical optimization framework that divides the tracked surface into a collection of patches, organizes these patches in a deformation graph and emulates elastic behavior with respect to the reference pose. As a second contribution, we present a probabilistic formulation of deformable surface registration that embeds the inference in an Expectation-Maximization framework and explicitly accounts for the noise in the acquisition. As a third contribution, we look at how prior knowledge can be used when tracking articulated objects, and compare different deformation models with skeleton-based tracking. The studies reported by this thesis are supported by extensive experiments on various 4D datasets. They show that in spite of weaker assumptions on the nature of the tracked objects, the presented ideas make it possible to process complex scenes involving several arbitrary objects, while robustly handling missing data and relatively large reconstruction artifacts.
65

Infrared image-based modeling and rendering

Wretstam, Oskar January 2017 (has links)
Image-based modeling using visual images has undergone major development during the early part of the 21st century. In this thesis a system for automated uncalibrated scene reconstruction using infrared images is implemented and tested. Uncalibrated refers to the fact that camera parameters, such as focal length and focus, are unknown and only the images themselves are used as input to the system. An automated reconstruction system could serve to simplify thermal inspection, where temperature differences in an image can indicate for example poor insulation or high friction, or act as a demonstration tool. Thermal images will in general have lower resolution, less contrast and less high-frequency content than visual images. These characteristics of infrared images further complicate feature extraction and matching, key steps in the reconstruction process. To remedy this, preprocessing methods are suggested and tested as well. Infrared modeling also imposes additional demands on the reconstruction, as it is important to maintain the thermal accuracy of the images in the resulting model. Three main results are obtained from this thesis. Firstly, it is possible to obtain camera calibration and pose as well as a sparse point cloud reconstruction from an infrared image sequence using the suggested implementation. Secondly, the correlation of thermal measurements across the images used to reconstruct three-dimensional coordinates is presented and analyzed. Lastly, from the preprocessing evaluation it is concluded that the tested methods are not suitable: they increase computational cost while the improvements in the model are not proportional.
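One generic preprocessing step in the spirit of those evaluated here is global histogram equalisation, which stretches the narrow dynamic range typical of thermal images before feature extraction. A minimal numpy sketch for illustration only; the thesis evaluates its own selection of methods, which may differ, and indeed finds its tested preprocessing not worthwhile:

```python
import numpy as np

def equalize(img):
    """Global histogram equalisation for an 8-bit image: remap
    intensities via the normalised cumulative histogram so the
    narrow thermal range spreads over the full 0-255 scale."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # to [0, 1]
    lut = np.round(255 * cdf).astype(np.uint8)         # lookup table
    return lut[img]

# A low-contrast "thermal" patch: values clustered in 100..103.
low_contrast = np.array([[100, 101], [102, 103]], dtype=np.uint8)
out = equalize(low_contrast)
```

After equalisation the four clustered intensities are spread across the 8-bit range, which tends to expose more gradient structure for feature detectors at the cost of an extra pass over the image.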
66

Multi-View Video Transmission over the Internet

Abdullah Jan, Mirza, Ahsan, Mahmododfateh January 2010 (has links)
3D television using multi-view rendering is receiving increasing interest. In this technology a number of video sequences are transmitted simultaneously to provide a larger view of the scene or a stereoscopic viewing experience. With two views, stereoscopic rendition is possible. Nowadays 3D displays are available that are capable of displaying several views simultaneously, and the user is able to see different views by moving their head. The thesis work aims at implementing a demonstration system with a number of simultaneous views. The system includes two cameras, computers at both the transmitting and receiving end, and a multi-view display. Besides setting up the hardware, the main task is to implement software so that the transmission can be done over an IP network. This thesis report includes an overview of, and experiences with, similar published systems; the implementation of real-time video capture, compression, encoding, and transmission over the internet with the help of socket programming; and finally the multi-view display in 3D format. The report also describes the design considerations in more detail regarding video coding and network protocols.
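Transmitting encoded frames over a TCP socket requires framing, since TCP delivers an undifferentiated byte stream. A minimal sketch of length-prefixed framing with a per-view tag, using only the standard `struct` module; the header layout and names are illustrative, not the demonstration system's actual protocol:

```python
import struct

def pack_frame(payload: bytes, view_id: int) -> bytes:
    """Prefix an encoded frame with a 4-byte big-endian length and a
    1-byte view tag, so the receiver can split the TCP byte stream
    back into frames and route each to the right view decoder."""
    return struct.pack("!IB", len(payload), view_id) + payload

def unpack_frames(stream: bytes):
    """Parse a received byte stream back into (view_id, payload) pairs."""
    frames, off = [], 0
    while off + 5 <= len(stream):
        size, view = struct.unpack_from("!IB", stream, off)
        off += 5
        frames.append((view, stream[off:off + size]))
        off += size
    return frames

# Two frames from two camera views concatenated, as they would
# arrive over a socket, then recovered intact on the other side.
buf = pack_frame(b"frame0", 0) + pack_frame(b"frame1", 1)
frames = unpack_frames(buf)
```

In a real system the payloads would be compressed video access units and `unpack_frames` would run incrementally over `socket.recv` chunks, but the framing logic is the same.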
67

Design and Analysis of Techniques for Multiple-Instance Learning in the Presence of Balanced and Skewed Class Distributions

Wang, Xiaoguang January 2015 (has links)
With the continuous expansion of data availability in many large-scale, complex, and networked systems, such as surveillance, security, the Internet, and finance, it becomes critical to advance the fundamental understanding of knowledge discovery and analysis from raw data to support decision-making processes. Existing knowledge discovery and data analysis techniques have shown great success in many real-world applications, such as applying Automatic Target Recognition (ATR) methods to detect targets of interest in imagery, drug activity prediction, computer vision recognition, and so on. Among these techniques, Multiple-Instance (MI) learning is different from standard classification since it uses a set of bags containing many instances as input. The instances in each bag are not labeled; instead the bags themselves are labeled. Much progress has been made in this area, yet some problems remain uncovered. In this thesis, we focus on two topics of MI learning: (1) investigating the relationship between MI learning and other multiple-pattern learning methods, including multi-view learning, data fusion methods and multi-kernel SVM; (2) dealing with the class imbalance problem of MI learning. For the first topic, three different learning frameworks are presented for general MI learning. The first uses multiple-view approaches to deal with the MI problem, the second is a data fusion framework, and the third framework, an extension of the first, uses multiple-kernel SVM. Experimental results show that the presented approaches work well on the MI problem. The second topic is concerned with the imbalanced MI problem. Here we investigate the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews. 
For this problem, we propose three solution frameworks: a data re-sampling framework, a cost-sensitive boosting framework and an adaptive instance-weighted boosting SVM (named IB_SVM) for MI learning. Experimental results, on both benchmark datasets and application datasets, show that the proposed frameworks are effective solutions for the imbalanced problem of MI learning.
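The data re-sampling idea can be illustrated by its simplest variant: randomly oversampling minority-class bags until the class counts match. A hypothetical sketch of bag-level oversampling, not the thesis's exact procedure:

```python
import random

def oversample_bags(bags, labels, seed=0):
    """Bag-level random oversampling for MI learning: duplicate
    randomly chosen minority-class bags (whole bags, never individual
    instances) until both classes have the same number of bags."""
    rng = random.Random(seed)
    pos = [b for b, y in zip(bags, labels) if y == 1]
    neg = [b for b, y in zip(bags, labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    new_bags = bags + extra
    new_labels = labels + [1 if minority is pos else 0] * len(extra)
    return new_bags, new_labels

# Three negative bags, one positive bag: class 1 is underrepresented.
bags = [[[0.1], [0.2]], [[0.3]], [[0.9], [1.1]], [[1.0]]]
labels = [0, 0, 0, 1]
nb, nl = oversample_bags(bags, labels)
```

Resampling whole bags keeps each bag's internal instance structure intact, which is why MI resampling operates at the bag level rather than on individual instances.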
68

Une approche multi-vue pour la modélisation système de propriétés fonctionnelles et non-fonctionnelles / Modeling functional and non-functional properties of systems based on a multi-view approach

Gómez Cárdenas, Carlos Ernesto 20 December 2013 (has links)
At the system level, experts specify functional and non-functional properties by employing their own theoretical models, tools and environments. Each expert attempts to use the most adequate formalism to verify the defined system properties in a specific domain. 
Nevertheless, each of these expert views rests on a common base and directly or indirectly impacts the models described by the other experts. Therefore, it is essential to keep semantic coherence among the different points of view, and also to be able to reconcile and aggregate all the points of view before undertaking the different phases of the analysis. This thesis proposes a model called PRISMSYS, based on a model-driven multi-view approach in which, for each domain, each expert describes the concepts of the domain and the relationships these concepts maintain with a backbone model. PRISMSYS keeps semantic coherence among the different views through the manipulation of events and logical clocks. PRISMSYS is defined as a UML profile, relying as much as possible on SysML and MARTE. The semantic model, which preserves view coherence, is specified with CCSL, a declarative formal language for the specification of causal and temporal relationships between the events of different views. The environment proposed by PRISMSYS allows co-simulation of the model and its analysis. The approach is illustrated taking an electronic system as a case study, where the main analysis domain is power consumption.
69

Relational Representation Learning Incorporating Textual Communication for Social Networks

Yi-Yu Lai (10157291) 01 March 2021 (has links)
Representation learning (RL) for social networks facilitates real-world tasks such as visualization, link prediction, and friend recommendation. Many methods have been proposed in this area to learn continuous low-dimensional embeddings of nodes, edges, or relations in social and information networks. However, most previous network RL methods neglect social signals, such as textual communication between users (nodes). Unlike more typical binary features on edges, such as post likes and retweet actions, social signals are more varied and contain ambiguous information. This makes them more challenging to incorporate into RL methods, but the ability to quantify social signals should allow RL methods to better capture the implicit relationships among real people in social networks. Second, most previous work in network RL has focused on learning from homogeneous networks (i.e., a single type of node, edge, role, and direction), and thus most existing RL methods cannot capture the heterogeneous nature of relationships in social networks. Based on these identified gaps, this thesis aims to study the feasibility of incorporating heterogeneous information, e.g., texts, attributes, multiple relations, and edge types (directions), to learn more accurate, fine-grained network representations.

In this dissertation, we discuss a preliminary study and outline three major works that aim to incorporate textual interactions to improve relational representation learning. The preliminary study learns a joint representation that captures the textual similarity in content between interacting nodes. The promising results motivate us to pursue broader research on using social signals for representation learning. The first major component aims to learn explicit node and relation embeddings in social networks. Traditional knowledge graph (KG) completion models learn latent representations of entities and relations by interpreting relations as translations operating on the embeddings of the entities. However, existing approaches do not consider textual communication between users, which contains valuable information that provides meaning and context for social relationships. We propose a novel approach that incorporates textual interactions between each pair of users to improve the representation learning of both users and relationships. The second major component focuses on analyzing how users interact with each other via natural-language content. Although the data is interconnected and dependent, previous research has primarily modeled social network behavior separately from textual content. In this work, we model the data holistically, taking into account the connections between the social behavior of users and the content generated when they interact, by learning a joint embedding over user characteristics and user language. In the third major component, we consider the task of learning edge representations in social networks. Edge representations are especially beneficial when we need to describe or explain the relationships, activities, and interactions among users. However, previous work in this area lacks well-defined edge representations and ignores the relational signals over multiple views of social networks, which typically contain multi-view contexts (due to multiple edge types) that need to be considered when learning the representation. We propose a new methodology that captures asymmetry in multiple views by learning well-defined edge representations, and that incorporates textual communication to identify multiple sources of social signals that moderate the impact of the different views between users.
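The translation interpretation mentioned above (relations as vector offsets between entity embeddings, in the spirit of TransE) can be sketched as follows; the names and toy data here are illustrative assumptions, not the thesis's actual model:

```python
import numpy as np

def transe_score(h, r, t):
    """TransE-style plausibility: a relation acts as a translation,
    so h + r should land near t; lower L1 distance = more plausible."""
    return np.linalg.norm(h + r - t, ord=1)

rng = np.random.default_rng(0)
dim = 8
user_a = rng.normal(size=dim)    # toy "head" user embedding
follows = rng.normal(size=dim)   # toy "follows" relation embedding
# a plausible tail entity sits near user_a + follows
user_b = user_a + follows + rng.normal(scale=0.01, size=dim)
stranger = rng.normal(size=dim)  # unrelated user

# the held triple scores lower (better) than the random one
print(transe_score(user_a, follows, user_b)
      < transe_score(user_a, follows, stranger))
```

The textual-interaction extension described in the thesis would additionally condition these embeddings on the communication between each pair of users.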
70

Integral Video Coding

Yang, Fan January 2014 (has links)
In recent years, 3D camera products and prototypes based on the Integral imaging (II) technique have gradually emerged and gained broad attention. II is a method that spatially samples the natural light (light field) of a scene, usually using a microlens array or a camera array, and records the light field with a high-resolution 2D image sensor. The large amount of data generated by II, together with the redundancy it contains, creates the need for an efficient compression scheme. The compression of 3D integral images has been widely researched in recent years; nevertheless, few approaches have been proposed for the compression of integral videos (IVs). The objective of this thesis is to investigate efficient coding methods for integral videos. The integral video frames used are captured by the first consumer light-field camera, the Lytro. One coding method is to encode the video data directly with an H.265/HEVC encoder. In the other coding schemes, the integral video is first converted into an array of sub-videos with different view perspectives. The sub-videos are then encoded either independently or following a specific reference-picture pattern using an MV-HEVC encoder. In this way, the redundancy between the multi-view videos is exploited rather than that between the original elemental images. Moreover, by varying the pattern of the sub-video input array and the number of inter-layer reference pictures, the coding performance can be further improved. Considering the intrinsic properties of the input video sequences, a QP-per-layer scheme is also proposed in this thesis. Though further study is required regarding time and complexity constraints for real-time applications, as well as the dramatic increase in the number of views, the methods proposed in this thesis prove to provide efficient compression for integral videos.
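The conversion from an integral frame to an array of per-view sub-videos can be sketched as below: a toy NumPy illustration under the assumption of a regular microlens grid, not the thesis's actual extraction pipeline. Each sub-aperture view collects the pixel at the same offset under every microlens:

```python
import numpy as np

def to_sub_views(frame, u, v):
    """Rearrange one integral-imaging frame into a (v x u) grid of
    sub-aperture views: pixel (dy, dx) of every elemental image
    (microlens pitch u x v pixels) goes to view (dy, dx)."""
    h, w = frame.shape[:2]
    assert h % v == 0 and w % u == 0, "frame must tile evenly by the lens pitch"
    # split into (lens_row, dy, lens_col, dx), then group by (dy, dx)
    views = frame.reshape(h // v, v, w // u, u).transpose(1, 3, 0, 2)
    return views  # shape: (v, u, h // v, w // u)

frame = np.arange(24).reshape(4, 6)   # toy 4x6 "sensor" frame
views = to_sub_views(frame, u=3, v=2)  # 2x3 grid of 2x2 views
```

Stacking each view over time yields the sub-video array that is then fed to the multi-view encoder, so inter-view redundancy can be exploited by the reference-picture structure.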
