11

Creating an Autonomously Interactive Virtual Character

Zhang, Liang 12 July 2023 (has links)
This thesis describes a set of techniques and methods to create an autonomously interactive character as a virtual opponent in a Virtual Reality (VR) application for anticipation training in sports karate. Convincing karate movements are created by blending suitable motion capture data in real time. Attack techniques with a precise attacking position are achieved using hybrid Inverse Kinematics. Stepping movements are created with a motion graph approach, enabling the virtual character to adapt its position relative to the user. An offline preprocessing method is proposed to cope with the large amount of motion capture data. The realized character can autonomously and interactively respond to the trainee's movements and perform a suitable hand attack. A user study among karate athletes shows general acceptance of the implemented system as a useful tool for anticipation training.

Contents: List of Figures; List of Tables; 1 Introduction; 2 Human Motion Synthesis and Animation; 3 Combining Inverse Kinematics with Motion Blending; 4 Automatic Motion Base Creation; 5 Create an Autonomous Interactive Character; 6 Conclusion and Discussion; Appendices; Bibliography
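The real-time motion blending described above can be illustrated with a short sketch. The following Python toy (an illustration written for this listing, not the thesis's implementation; the names `blend_pose` and `blend_transition` are invented) cross-fades between two motion clips by linearly interpolating joint angles:

```python
def blend_pose(pose_a, pose_b, w):
    """Linearly interpolate two poses (lists of joint angles), w in [0, 1]."""
    return [(1.0 - w) * a + w * b for a, b in zip(pose_a, pose_b)]

def blend_transition(clip_a, clip_b, n_blend):
    """Cross-fade the last n_blend frames of clip_a into the first n_blend
    frames of clip_b, then continue with the rest of clip_b."""
    out = clip_a[:-n_blend]
    for i in range(n_blend):
        w = (i + 1) / (n_blend + 1)  # blend weight ramps from ~0 to ~1
        out.append(blend_pose(clip_a[len(clip_a) - n_blend + i], clip_b[i], w))
    out.extend(clip_b[n_blend:])
    return out
```

In a real system the weight would typically follow an ease-in/ease-out curve, and joint rotations would be interpolated as quaternions rather than raw angles.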
12

Statistical Models for Human Motion Synthesis

Wang, Qi 09 July 2018 (has links)
This thesis focuses on the synthesis of motion capture data with statistical models, with a strong focus on neural networks. Motion synthesis is a task of interest for important application fields such as entertainment, human-computer interaction, and robotics; it may be used to drive a virtual character in virtual reality applications, animated films, or computer games.
From the machine learning point of view, designing synthesis models consists in learning generative models. Our starting point lies in two main problems one encounters when dealing with motion capture data synthesis: ensuring realism of postures and motion, and handling the large variability in the synthesized motion. The variability in the data comes first from core individual features: we do not all move the same way, but according to our personality, gender, age, and morphology. Moreover, there are shorter-term factors of variation such as our emotion, whether we are interacting with somebody else, or whether we are tired. Data-driven models have been studied for generating human motion for many years. Models are learned from labelled datasets in which motion capture data are recorded while actors perform various activities such as walking, dancing, or running. Traditional statistical models such as Hidden Markov Models and Gaussian Processes have been investigated for motion synthesis, demonstrating strengths but also weaknesses. Our work follows this line of research and concerns the design of generative models for sequences that can take into account contextual information representing the factors of variation. The first part of the thesis presents preliminary work in which we extend previous approaches relying on Hidden Markov Models and Gaussian Processes to tackle the two main problems of realism and variability. We first describe an extension of contextual Hidden Markov Models that handles variability in the data by conditioning the model parameters on additional contextual information, such as the emotion with which a motion was performed. We then propose a variant of a traditional method for a specific motion synthesis task called Inverse Kinematics, in which we exploit Gaussian Processes to enforce the realism of each posture of a generated motion.
These preliminary results show the potential of statistical models for designing human motion synthesis systems. Yet none of these technologies offers the flexibility brought by neural networks and the recent deep learning revolution. The second part of the thesis describes the work we carried out with neural networks and deep architectures. It builds on recurrent neural networks for handling sequences, and on adversarial learning, which was recently introduced in the deep learning community for designing accurate generative models of complex data. We propose a simple system as a base synthesis architecture, combining adversarial learning with sequence autoencoders, which allows randomly generating realistic motion capture sequences. Starting from this architecture, we design a few conditional neural models that yield synthesis systems controllable to some extent, either by providing high-level information that the generated sequence should match (e.g. the emotion), or by providing a sequence in the style of which a new sequence should be generated.
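The idea of conditioning a generative model's parameters on context, as in the contextual Hidden Markov Models above, can be sketched in a few lines. This stdlib-only Python toy (a drastic simplification written for this listing, not the thesis's model; the scalar "context offset" stands in for full conditioned emission distributions) samples a 1-D observation sequence from a Gaussian HMM whose emission means are shifted by a context code such as an emotion label:

```python
import random

def sample_contextual_hmm(means, trans, context_offset, T, noise=0.1, seed=0):
    """Sample T observations from a Gaussian HMM whose per-state emission
    means are all shifted by a context value (e.g. an emotion code)."""
    rng = random.Random(seed)
    state = 0
    seq = []
    for _ in range(T):
        # emit: state mean, shifted by context, plus Gaussian noise
        seq.append(means[state] + context_offset + rng.gauss(0.0, noise))
        # transition: sample the next state from the current state's row
        r, cum = rng.random(), 0.0
        for nxt, p in enumerate(trans[state]):
            cum += p
            if r < cum:
                state = nxt
                break
    return seq
```

In the thesis's setting the context modulates full emission distributions over pose vectors rather than a single scalar offset.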
13

Design and Analysis of Cam-Link Mechanisms

Chen, Hsin-pao 16 July 2009 (has links)
Basic planar cam mechanisms and link mechanisms are widely used in industrial automatic machines. Determining a design method and procedure for the cam-link mechanism requires effective kinematic synthesis, motion curve generation, and optimization methods that clearly establish the kinematic structure of the mechanism and its kinematic performance. This dissertation begins by adopting the genetic algorithm as the solver for the multi-objective optimization problem of the cam-link mechanism. By considering the influence of the parameters during evolution and by properly defining the conditions on the parameters and variables, good solutions of the multi-objective optimization problem can be found. Comparing curves for the motion synthesis of the cam-link mechanism, the existing motion functions used in cam mechanisms and their kinematic characteristics are reviewed, and a rational B-spline motion function is proposed. Using the genetic algorithm to approximate motion curves composed of trigonometric functions demonstrates the flexibility of the rational B-spline. Furthermore, single-objective minimization problems for different kinematic characteristics are also solved using rational B-splines with the genetic algorithm, yielding better results. To synthesize different structures of cam-link mechanisms, the kinematics of two planar link mechanisms and four planar cam mechanisms are first derived; the genetic algorithm is then used to find the minimal cam dimensions subject to limits on the motion curves, the pressure angles, and the radii of curvature. The kinematic synthesis problem of function-generation slider-crank mechanisms whose slider starts at the toggle position is then discussed.
The analysis shows that when traditional motion functions with acceleration continuity are used to synthesize the slider motion, the angular acceleration of the crank cannot be continuous. To achieve acceleration continuity of the crank motion, a motion curve whose fourth derivative of displacement with respect to time is set to zero at the boundaries fulfills the continuity requirement. A structural synthesis design procedure then selects proper mechanism structures by following the input-output relations of the link and cam mechanisms together with the design constraints. To apply the cam-link mechanism in a real application, a machine containing a slider-crank mechanism used as a toggle mechanism is introduced. Design constraints on space and motion limits identify the possible mechanism structure and define its dimensions, after which the kinematics and kinetostatics of the machine are analyzed. Using the genetic algorithm to solve the multi-objective optimization problem, the parameters of the rational B-spline are adjusted for optimal kinematics and minimal kinetostatic loads, reducing the contact stress and improving the fatigue life of the cam follower. Because the slider-crank mechanism exhibits discontinuous acceleration at the toggle position, experimental tests were performed to verify the theoretical results, and the measurements agree closely with theory. The results show that when the slider motion curves begin at the toggle position with boundary motion constraints setting the fourth and higher derivatives of displacement with respect to time to zero, the angular accelerations of the cranks are continuous.
In summary, this dissertation provides guidance on the kinematic characteristics needed to design cam-link mechanisms that contain a slider-crank mechanism as the toggle mechanism, along with design methods for the synthesis, analysis, and experimental testing of simple function-generation cam-link mechanisms.
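The role of the genetic algorithm as a problem solver can be illustrated with a minimal real-coded GA. This Python sketch (a generic toy under stated assumptions, not the dissertation's multi-objective formulation; `ga_minimize` and its operators are invented for illustration) uses tournament selection, blend crossover, and Gaussian mutation to minimize a scalar objective:

```python
import random

def ga_minimize(f, lo, hi, pop_size=30, gens=80, seed=1):
    """Minimize f over [lo, hi] with a toy real-coded genetic algorithm."""
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    best = min(pop, key=f)                     # elitism: remember the best ever
    for _ in range(gens):
        # tournament selection: the fitter of two random individuals survives
        parents = [min(rng.sample(pop, 2), key=f) for _ in range(pop_size)]
        pop = []
        for i in range(0, pop_size, 2):
            a, b = parents[i], parents[i + 1]
            w = rng.random()                   # blend (arithmetic) crossover
            for child in (w * a + (1 - w) * b, (1 - w) * a + w * b):
                pop.append(child + rng.gauss(0.0, 0.05))  # Gaussian mutation
        best = min(pop + [best], key=f)
    return best
```

A multi-objective version, such as the one used in the dissertation, would additionally maintain a Pareto front of non-dominated designs rather than a single scalar fitness.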
14

Surface motion capture animation

Boukhayma, Adnane 06 December 2017 (has links)
As a new alternative to standard motion capture, 4D surface capture is motivated by the increasing demand from media production for highly realistic 3D content. Such data provides complete, real information about the shape, appearance and kinematics of the dynamic object of interest. In this work we address some of the tasks related to the acquisition and exploitation of 4D data, as obtained through multi-view videos, with an emphasis on human shapes in motion. Some of these problems have already received a great deal of interest from the graphics and vision communities, but a number of challenges remain open. In particular, we address example-based animation synthesis, appearance modelling and semantic motion transfer.

We first propose a method to generate animations using video-based mesh sequences of elementary movements of a shape. New motions that satisfy high-level user-specified constraints are built by recombining and interpolating the frames in the observed mesh sequences. Our method brings local improvement to the synthesis process through optimized interpolated transitions, and global improvement with an optimal organizing structure that we call the essential graph. We then address the problem of building efficient appearance representations of shapes observed from multiple viewpoints and in several movements.
We propose a per-subject representation that identifies the underlying manifold structure of the appearance information relative to a shape. The resulting Eigen representation encodes shape appearance variabilities due to viewpoint and illumination, with Eigen textures, and due to local inaccuracies in the geometric model, with Eigen warps. In addition to providing compact representations, such decompositions also allow for appearance interpolation and completion. We finally address the problem of transferring motion between captured 4D models. Given 4D training sets for two subjects for which a sparse set of semantically corresponding key poses is known, our method is able to transfer a newly captured motion from one subject to the other. The method contributes a new transfer model based on non-linear pose and displacement interpolation that builds on Gaussian process regression.
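Gaussian process regression of the kind this transfer model builds on can be shown in a compact, stdlib-only sketch (an illustrative 1-D toy with a squared-exponential kernel; `gp_predict` and the tiny dense solver are written for this example, not taken from the thesis):

```python
import math

def k_rbf(a, b, ls=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / ls) ** 2)

def solve(A, y):
    """Gauss-Jordan elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gp_predict(xs, ys, x_star, ls=1.0, noise=1e-6):
    """Posterior mean of a zero-mean GP at x_star given observations (xs, ys)."""
    K = [[k_rbf(xi, xj, ls) + (noise if i == j else 0.0)
          for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    alpha = solve(K, ys)  # alpha = (K + noise * I)^-1 y
    return sum(a * k_rbf(xi, x_star, ls) for a, xi in zip(alpha, xs))
```

In the thesis's setting the inputs and outputs would be pose and displacement vectors rather than scalars, but the posterior-mean machinery is the same.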
15

Head motion synthesis : evaluation and a template motion approach

Braude, David Adam January 2016 (has links)
The use of conversational agents has increased across the world. From providing automated support for companies to acting as virtual psychologists, they have moved from an academic curiosity to an application with real-world relevance. While many researchers have focused on the content of the dialogue and on synthetic speech to give the agents a voice, animating these characters has more recently become a topic of interest. An additional use for character animation technology is in the film and video game industry, where animating characters without paying for expensive labour would save tremendous costs. When animating characters there are many aspects to consider, for example the way they walk. However, to truly assist with communication, automated animation needs to duplicate the body language used when speaking. Conversational agents in particular are often animated only from the upper body up, so head motion is one of the keys to a believable agent. While certain linguistic features are obvious, such as nodding to indicate agreement, research has shown that head motion also aids understanding of speech. Additionally, head motion often contains emotional cues, prosodic information, and other paralinguistic information. In this thesis we present our research into synthesising head motion using only recorded speech as input. During this research we collected a large dataset of head motion synchronised with speech, examined evaluation methodology, and developed a synthesis system. Our dataset is one of the larger ones available. From it we present statistics about head motion in general, including differences between read speech and storytelling speech, and differences between speakers. From these we draw conclusions about which type of source data is most interesting for head motion research, and about whether speaker-dependent models are needed for synthesis.
In our examination of head motion evaluation methodology we introduce Forced Canonical Correlation Analysis (FCCA). FCCA distinguishes head-motion-shaped noise from motion capture better than the standard objective evaluation methods used in the literature. We have shown that for subjective testing it is best practice to use a variation of MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) testing, adapted for head motion. Through experimentation we have developed guidelines for the implementation of the test and the constraints on its length. Finally, we present a new system for head motion synthesis. We make use of simple templates of motion, automatically extracted from source data, that are warped to suit the speech features. Our system uses clustering to pick the small motion units, and a combined HMM- and GMM-based approach for determining the values of warping parameters at synthesis time. This results in highly natural-looking motion that outperforms other state-of-the-art systems. Our system requires minimal human intervention and produces believable motion. The key innovations were the new methods for segmenting head motion and a process similar to language modelling for synthesising head motion.
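The template-warping idea can be conveyed with the simplest possible case: resampling a motion template to the duration dictated by the speech features. In this Python toy (`warp_template` is an invented name; the thesis's warping also adjusts parameters beyond duration), a 1-D template is linearly resampled:

```python
def warp_template(template, n_out):
    """Linearly resample a 1-D motion template to n_out samples (time warp)."""
    n_in = len(template)
    if n_out == 1:
        return [template[0]]
    out = []
    for i in range(n_out):
        t = i * (n_in - 1) / (n_out - 1)  # fractional source index
        j = min(int(t), n_in - 2)
        frac = t - j
        out.append((1 - frac) * template[j] + frac * template[j + 1])
    return out
```

A synthesis-time model (here an HMM/GMM combination) would choose which template to use and how far to warp it for each stretch of speech.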
16

Radial Basis Functions Applied to Integral Interpolation, Piecewise Surface Reconstruction and Animation Control

Langton, Michael Keith January 2009 (has links)
This thesis describes theory and algorithms for use with Radial Basis Functions (RBFs), emphasising techniques motivated by three particular application areas. In Part I, we apply RBFs to the problem of interpolating integral data. While the potential of using RBFs for this purpose has been established in an abstract theoretical context, their use has lacked an easy-to-check sufficient condition for finding appropriate parent basic functions, and explicit methods for deriving integral basic functions from them. We present both components here, as well as explicit formulations for line segments in two dimensions and balls in three and five dimensions. We also apply these results to real-world track data. In Part II, we apply Hermite and pointwise RBFs to the problem of surface reconstruction. RBFs are used for this purpose by representing the surface implicitly as the zero level set of a function in 3D space. We develop a multilevel piecewise technique based on scattered spherical subdomains, which requires algorithms for constructing sphere coverings with desirable properties and for blending smoothly between levels. The surface reconstruction method we develop scales very well to large datasets and is very amenable to parallelisation, while retaining global-approximation-like features such as hole filling. Our serial implementation can build an implicit surface representation which interpolates at over 42 million points in around 45 minutes. In Part III, we apply RBFs to the problem of animation control in the area of motion synthesis: controlling an animated character whose motion is entirely the result of simulated physics. While the simulation is quite well understood, controlling the character by means of forces produced by virtual actuators or muscles remains a very difficult challenge.
Here, we investigate the possibility of speeding up the optimisation process underlying most animation control methods by approximating the physics simulator with RBFs.
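The basic mechanics of RBF interpolation, which underlie all three parts, can be sketched in plain Python (a 1-D toy with the multiquadric basic function; the helper names are invented, and real uses such as those in the thesis work in higher dimensions and typically add polynomial terms):

```python
import math

def phi(r, eps=1.0):
    """Multiquadric radial basic function."""
    return math.sqrt(1.0 + (eps * r) ** 2)

def solve_dense(A, y):
    """Gauss-Jordan elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def rbf_fit(centers, values, eps=1.0):
    """Solve the interpolation system A w = values, with A_ij = phi(|c_i - c_j|)."""
    A = [[phi(abs(ci - cj), eps) for cj in centers] for ci in centers]
    return solve_dense(A, values)

def rbf_eval(centers, weights, x, eps=1.0):
    """Evaluate the interpolant s(x) = sum_i w_i * phi(|x - c_i|)."""
    return sum(w * phi(abs(x - c), eps) for w, c in zip(weights, centers))
```

By construction the fitted interpolant reproduces the data at the centers; the multilevel piecewise scheme in Part II exists precisely because this dense solve does not scale to millions of points.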
17

Motion synthesis for high degree-of-freedom robots in complex and changing environments

Yang, Yiming January 2018 (has links)
The use of robotics has recently seen significant growth in various domains such as unmanned ground/underwater/aerial vehicles, smart manufacturing, and humanoid robots. However, one of the most important capabilities required for long-term autonomy, namely the ability to operate robustly and safely in real-world environments rather than in industrial and laboratory setups, is largely missing. Designing robots that can operate reliably and efficiently in cluttered and changing environments is non-trivial, especially for high degree-of-freedom (DoF) systems, i.e. robots with multiple actuators. On one hand, the dexterity offered by kinematic redundancy allows the robot to perform dexterous manipulation tasks in complex environments; on the other hand, such a complex system also makes control and planning very challenging. To address these two interrelated problems, we exploit robot motion synthesis from three perspectives that feed into each other: end-pose planning, motion planning and motion adaptation. We propose several novel ideas in each of the three phases, with which we can efficiently synthesise dexterous manipulation motion for fixed-base robotic arms, mobile manipulators, and humanoid robots in cluttered and potentially changing environments. Collision-free inverse kinematics (IK), or end-pose planning, a key prerequisite for other modules such as motion planning, is an important and yet unsolved problem in robotics. Such information is often assumed to be given, or manually provided in practice, which significantly limits high-level autonomy. In our research, using novel data pre-processing and encoding techniques, we are able to efficiently search for collision-free end-poses in challenging scenarios in the presence of uneven terrain. Once the end-poses are found, the motion planning module can proceed.
Although motion planning is often considered well studied, we find that existing algorithms are still unreliable for robust and safe operation in real-world applications, especially when the environment is cluttered and changing. We propose a novel resolution-complete motion planning algorithm, the Hierarchical Dynamic Roadmap, that is able to generate collision-free motion trajectories for redundant robotic arms in extremely complicated environments where other methods fail. While planning for fixed-base robotic arms is relatively less challenging, we also investigate efficient motion planning algorithms for high-DoF (30-40) humanoid robots, where an extra balance constraint needs to be taken into account. The results show that our method is able to efficiently generate collision-free whole-body trajectories for different humanoid robots in complex environments where other methods would require much longer planning times. Both the end-pose and motion planning algorithms compute solutions in static environments, and assume the environment stays static during execution. While humans and most animals are incredibly good at handling environmental changes, state-of-the-art robotics technology is far from achieving such an ability. To address this issue, we propose a novel state-space representation, the Distance Mesh space, in which the robot is able to remap the pre-planned motion in real time and adapt to environmental changes during execution. By combining the proposed end-pose planning, motion planning and motion adaptation techniques, we obtain a robotic framework that significantly improves the level of autonomy.
The proposed methods have been validated on various state-of-the-art robot platforms, such as the UR5 (6-DoF fixed-base robotic arm), KUKA LWR (7-DoF fixed-base robotic arm), Baxter (14-DoF fixed-base bi-manual manipulator), Husky with dual UR5 (15-DoF mobile bi-manual manipulator), PR2 (20-DoF mobile bi-manual manipulator), and NASA Valkyrie (38-DoF humanoid), among many others, showing that our methods are genuinely applicable to high-dimensional motion planning in practical problems.
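The flavour of roadmap-style planning can be conveyed with a minimal example: graph search over a discretised, collision-checked state space. In this Python toy (a 2-D occupancy grid stands in for the far higher-dimensional configuration spaces discussed above; `grid_path` is an invented name), breadth-first search returns a shortest collision-free cell path:

```python
from collections import deque

def grid_path(grid, start, goal):
    """BFS for a shortest collision-free path on an occupancy grid.
    grid[r][c] == 1 marks an obstacle; returns a list of (r, c) cells or None."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}          # visited set doubling as a back-pointer map
    q = deque([start])
    while q:
        cell = q.popleft()
        if cell == goal:          # reconstruct the path by walking back-pointers
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                q.append((nr, nc))
    return None                   # goal unreachable
```

A roadmap planner for a real arm replaces grid cells with sampled joint configurations and grid adjacency with collision-checked local connections, but the search layer is the same idea.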
18

A study of transfer learning on data-driven motion synthesis frameworks

Chen, Nuo January 2022 (has links)
Various research has shown the potential and robustness of deep learning-based approaches for synthesising novel motions of 3D characters in virtual environments such as video games and films. The models are trained with motion data that is bound to the respective character skeleton (rig). This limits the scalability and applicability of the models, since they can only learn motions from one particular rig (domain) and produce motions in that domain only. Transfer learning techniques can be used to overcome this issue and allow the models to better adapt to other domains with limited data. This work presents a study of three transfer learning techniques for the proposed Objective-driven motion generation model (OMG), a model for procedurally generating animations conditioned on positional and rotational objectives. Three transfer learning approaches for achieving rig-agnostic encoding (RAE) are proposed and evaluated: Feature encoding (FE), Feature clustering (FC) and Feature selection (FS), to improve the learning of the model on new domains with limited data. All three approaches demonstrate significant improvement in both the performance and the visual quality of the generated animations when compared to the vanilla performance. The empirical results indicate that the FE and FC approaches yield better transfer quality than the FS approach. It is inconclusive which of the two performs better, but the FE approach is more computationally efficient, which makes it the more favourable choice for real-time applications.
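Of the three approaches, feature selection is the easiest to illustrate. In this heavily simplified Python sketch (written for this summary and not the OMG implementation; both function names are invented), a rig-agnostic encoding is approximated by restricting each pose to the joints that both rigs share:

```python
def shared_joint_set(rig_a_joints, rig_b_joints):
    """Joints present in both skeletons, in a deterministic order."""
    return sorted(set(rig_a_joints) & set(rig_b_joints))

def rig_agnostic_features(pose, shared_joints):
    """Encode a pose (joint name -> angle) as a fixed-order feature vector
    over the shared joints, discarding rig-specific ones."""
    return [pose[j] for j in shared_joints]
```

The FE and FC approaches instead learn the shared representation (an encoder or cluster assignment) rather than hand-picking features, which is where their better transfer quality comes from.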
19

Exploring Normalizing Flow Modifications for Improved Model Expressivity

Juschak, Marcel January 2023 (has links)
Normalizing flows represent a class of generative models that exhibit a number of attractive properties, but do not always achieve state-of-the-art performance in the perceived naturalness of generated samples. To improve the quality of generated samples, this thesis examines methods to enhance the expressivity of discrete-time normalizing flow models and thus their ability to capture different aspects of the data. In the first part of the thesis, we propose an invertible neural network architecture as an alternative to popular architectures like Glow that require an individual neural network per flow step. Although our proposal greatly reduces the number of parameters, such an architecture has not been tried before, as it is believed not to be powerful enough. For this reason, we define two optional extensions that could greatly increase the expressivity of the architecture. We use augmentation to add Gaussian noise variables to the input to achieve arbitrary hidden-layer widths that are no longer dictated by the dimensionality of the data. Moreover, we implement Piecewise Affine Activation Functions, which generalize Leaky ReLU activations and allow for more powerful transformations in every individual step. The resulting three models are evaluated on two simple synthetic datasets: the two moons dataset and one generated from a mixture of eight Gaussians. Our findings indicate that the proposed architectures cannot adequately model these simple datasets and thus do not represent alternatives to current state-of-the-art models. The Piecewise Affine Activation Function significantly improved the expressivity of the invertible neural network, but could not reach its full potential due to inappropriate assumptions about the function's input distribution. Further research is needed to ensure that the input to this function is always standard normal distributed.
We conducted further experiments with augmentation using the Glow model and observed minor improvements on the synthetic datasets when only a few flow steps (two, three, or four) were used. In a more realistic scenario, however, the model would comprise many more flow steps. Lastly, we generalized the transformation in the coupling layers of modern flow architectures from an elementwise affine transformation to a matrix-based affine transformation and studied the effect this had on MoGlow, a flow-based model of motion. We could show that McMoGlow, our modified version of MoGlow, consistently achieved a better training likelihood than the original MoGlow on human locomotion data. However, a subjective user study found no statistically significant difference in the perceived naturalness of the generated samples. As a possible reason for this, we hypothesize that the improvements are subtle and more visible in samples that exhibit slower movements, or in edge cases that may have been underrepresented in the user study.
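The generalization described above, from an elementwise affine coupling transform y2 = s(x1) * x2 + t(x1) to a matrix-based one y2 = L(x1) @ x2 + t(x1), can be sketched in a few lines. This is an illustrative assumption-laden sketch, not the thesis's McMoGlow implementation: the conditioner networks are replaced by fixed random linear maps, and `L` is constrained to be lower-triangular with a positive diagonal so the layer stays invertible with a tractable log-determinant.

```python
import numpy as np

class MatrixAffineCoupling:
    """Sketch of a coupling layer with a matrix-based affine transform.

    Transforms the second partition as y2 = L(x1) @ x2 + t(x1),
    generalizing the usual elementwise y2 = s(x1) * x2 + t(x1).
    L is lower-triangular with a positive diagonal, so it is always
    invertible and log|det L| is just the sum of its log-diagonal."""

    def __init__(self, d1, d2, seed=0):
        rng = np.random.default_rng(seed)
        self.d1, self.d2 = d1, d2
        # Stand-in conditioner weights; a real model would use neural nets.
        self.W_L = 0.1 * rng.standard_normal((d2 * d2, d1))
        self.W_t = 0.1 * rng.standard_normal((d2, d1))

    def _params(self, x1):
        raw = (self.W_L @ x1).reshape(self.d2, self.d2)
        L = np.tril(raw, k=-1)              # strictly lower-triangular part
        diag = np.exp(np.diag(raw))         # exponentiate for a positive diagonal
        L += np.diag(diag)
        t = self.W_t @ x1
        return L, t, np.sum(np.log(diag))   # log|det L| of the triangular map

    def forward(self, x):
        x1, x2 = x[:self.d1], x[self.d1:]
        L, t, logdet = self._params(x1)
        return np.concatenate([x1, L @ x2 + t]), logdet

    def inverse(self, y):
        # x1 passes through unchanged, so the same parameters are recoverable.
        y1, y2 = y[:self.d1], y[self.d1:]
        L, t, _ = self._params(y1)
        return np.concatenate([y1, np.linalg.solve(L, y2 - t)])
```

Zeroing the off-diagonal entries of `L` recovers the elementwise affine case, while nonzero entries let each transformed dimension depend linearly on the others within a single coupling step.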
