Global ETD Search

171	Odhad pózy kamery z přímek pomocí přímé lineární transformace / Camera Pose Estimation from Lines using Direct Linear Transformation Přibyl, Bronislav Unknown Date This dissertation addresses camera pose estimation from correspondences between 3D and 2D lines, i.e., the Perspective-n-Line (PnL) problem. It focuses on cases with a large number of lines, which can be solved efficiently by methods that use a linear formulation of PnL. Until now, only methods working with correspondences between 3D points and 2D lines were known. Based on this observation, two new methods built on the Direct Linear Transformation (DLT) algorithm were proposed: DLT-Plücker-Lines, which works with correspondences between 3D and 2D lines, and DLT-Combined-Lines, which works with both correspondences between 3D points and 2D lines and correspondences between 3D lines and 2D lines. In the latter case, the redundant 3D information is used to reduce the minimum number of required line correspondences to 5 and to improve the accuracy of the method. The proposed methods were thoroughly tested under various conditions, on both simulated and real data, and compared with the best existing PnL methods. DLT-Combined-Lines achieves results better than or comparable to the best existing methods while also being considerably fast. The dissertation also introduces a unified framework for describing camera pose estimation methods based on the DLT algorithm, and both proposed methods are defined within this framework.
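The linear formulation that the abstract builds on can be illustrated with the classical setting it calls previously known: a 3D point lying on a 3D line must project onto that line's image l, so l^T P X = 0 gives one linear equation in the 12 entries of the projection matrix P, and stacking such equations lets P be recovered up to scale from the null space via SVD. The following is a minimal numpy sketch of that 3D-point/2D-line DLT under these assumptions, not the dissertation's DLT-Plücker-Lines or DLT-Combined-Lines methods; the function names are hypothetical and no data normalization is applied.

```python
import numpy as np

def dlt_points_on_lines(points_3d, lines_2d):
    """Linear (DLT) estimate of a 3x4 projection matrix P.

    Each 3D point X lies on a 3D line whose image is the 2D line l = (a, b, c).
    Its projection P @ X_h must lie on l, so l^T P X_h = 0, which is one
    linear equation in the 12 entries of P.
    """
    rows = []
    for X, l in zip(points_3d, lines_2d):
        X_h = np.append(X, 1.0)          # homogeneous 3D point
        rows.append(np.kron(l, X_h))     # coefficients of P flattened row-major
    A = np.vstack(rows)                  # needs at least 11 independent equations
    _, _, vt = np.linalg.svd(A)
    P = vt[-1].reshape(3, 4)             # right singular vector of the smallest singular value
    return P / np.linalg.norm(P)         # P is only defined up to scale


def line_through(p, q):
    """Homogeneous 2D line through two homogeneous image points."""
    return np.cross(p, q)


# Synthetic check: 8 lines, each represented by two 3D points on it.
rng = np.random.default_rng(0)
P_true = rng.normal(size=(3, 4))
pts, lines = [], []
for _ in range(8):
    A3, B3 = rng.normal(size=3), rng.normal(size=3)
    l = line_through(P_true @ np.append(A3, 1.0), P_true @ np.append(B3, 1.0))
    pts += [A3, B3]
    lines += [l, l]

P_est = dlt_points_on_lines(pts, lines)
P_ref = P_true / np.linalg.norm(P_true)
err = min(np.linalg.norm(P_est - P_ref), np.linalg.norm(P_est + P_ref))
print(f"error up to sign: {err:.2e}")
```

Each line contributes two equations (one per point), so at least six lines are needed for the 11 degrees of freedom of P; with noisy data, normalizing the coordinates before building A is advisable.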
172	Extraction de comportements reproductibles en avatar virtuel [Extraction of Reproducible Behaviors in a Virtual Avatar] Dare, Kodjine 10 1900 Given an image depicting a person, we (human beings) can visualize the different parts of the person in three dimensions despite the two-dimensional nature of the image. This perceptual skill is mastered through years of observing other humans. While this estimation is easy for human beings, it can be challenging for machines. 3D human pose estimation uses a 3D skeleton to represent the posture of the human body. In this thesis, we describe an approach that estimates poses from video in order to reproduce the observed movements with a virtual avatar. We pursue two main objectives.
First, we extract initial 2D body-part coordinates using a method that predicts joint locations through part affinity fields (PAF). We then estimate 3D body-part coordinates with a full 3D human mesh reconstruction approach, supplemented by the previously estimated 2D coordinates. Second, we explore reconstructing a virtual avatar from the extracted 3D coordinates so that human movements can be transferred to the animated avatar, which would make it possible to extract the behavioral dynamics of a human. Our approach consists of several successive stages and, thanks to this 2D supplement, gives better estimation and extraction results than similar solutions. Finally, we transfer the extracted positions frame by frame to the skeleton of a virtual avatar in order to reproduce the movements extracted from the video. Behavior extraction Virtual avatar Emotion simulation Video 3D pose estimation Behavior reproduction
173	Estimation de pose 2D par réseau convolutif [2D Pose Estimation with a Convolutional Network] Huppé, Samuel 04 1900 Magic: The Gathering is a stochastic, imperfect-information collectible card game invented by Richard Garfield in 1993. The goal of this project is to propose a machine learning pipeline capable of detecting and localizing Magic cards within an image typical of the game's tournaments. This is a 2D pose problem with four degrees of freedom, namely translation along x and y, rotation, and scale, in a context where cards can overlap one another. We tackle this problem with deep learning, using a combination of two separate neural networks that together identify the cards and regress these parameters. Our final pipeline can handle real-world images and gives the poses of the cards within an image with a high degree of precision.
Through the course of this project, we developed a method for generating realistic synthetic data to train both models so that they can handle real-world images. The results show that our pose subnetwork predicts position within half a pixel, rotation within one degree, and scale within 2 percent. Finally, we showed that our synthetic dataset is realistic enough for our networks to generalize to real images. Computer Vision Pose Estimation Machine Learning Deep Learning Autoencoders Convolutional Neural Networks Synthetic Datasets
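The four-degree-of-freedom card pose (translation in x and y, rotation, scale) is a 2D similarity transform, and the reported accuracies are errors in pixels, degrees and percent. A minimal numpy sketch of that parameterization follows; the function names, the card-centred local frame and the angle wrapping are assumptions for illustration, not the thesis's code.

```python
import numpy as np

def similarity_matrix(tx, ty, theta_deg, scale):
    """3x3 homogeneous matrix for a 2D pose with 4 DoF:
    translation (tx, ty), rotation theta (degrees) and isotropic scale."""
    t = np.deg2rad(theta_deg)
    c, s = scale * np.cos(t), scale * np.sin(t)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

def card_corners(pose, width, height):
    """Image-space corners of a card with the given (tx, ty, theta_deg, scale) pose.
    The card's local frame is assumed to be centred on the card."""
    w, h = width / 2.0, height / 2.0
    local = np.array([[-w, -h, 1], [w, -h, 1], [w, h, 1], [-w, h, 1]], float).T
    return (similarity_matrix(*pose) @ local)[:2].T

def pose_errors(pred, true):
    """Position error (pixels), rotation error (degrees, wrapped to [0, 180])
    and relative scale error between two (tx, ty, theta_deg, scale) poses."""
    dpos = np.hypot(pred[0] - true[0], pred[1] - true[1])
    drot = abs((pred[2] - true[2] + 180.0) % 360.0 - 180.0)
    dscale = abs(pred[3] / true[3] - 1.0)
    return dpos, drot, dscale
```

Wrapping the angle difference matters: a prediction of 359.5° against a ground truth of 0.5° is a 1° error, not 359°.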
174	Skeleton Tracking for Sports Using LiDAR Depth Camera / Skelettspårning för sport med LiDAR-djupkamera Efstratiou, Panagiotis January 2021 Skeletal tracking can be accomplished by deploying human pose estimation strategies. Deep learning has been shown to be the leading approach in this area, and, combined with a light detection and ranging (LiDAR) depth camera, it appears feasible to develop a markerless motion analysis software system. The project uses a trained convolutional neural network to track humans performing sport activities and to provide feedback after biomechanical analysis. Implementations of four filtering methods suited to the nature of the movement are presented: a Kalman filter, a fixed-interval smoother, a Butterworth filter, and a moving average filter. The software appears practicable in the field for evaluating videos at 30 Hz, as demonstrated on indoor cycling and hammer throwing. A non-static camera performs quite well when measuring a still, upright person, with mean absolute errors of 8.32% and 6.46% when the left and right knee angles, respectively, are used as the reference. An error-free system would benefit not only the sports domain but also the health industry as a whole.
Depth camera Human pose estimation Kinematics Motion Capture Skeletal tracking Sport and Fitness Sciences Signal Processing Computer Sciences
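Two of the four filters named in the abstract are simple enough to sketch for a single joint-angle time series (for example a knee angle sampled at 30 Hz): a centred moving average and a scalar Kalman filter with a random-walk process model. The function names and the noise parameters q and r below are hypothetical tuning choices, not values from the thesis; a Butterworth filter is typically applied with scipy.signal.butter followed by scipy.signal.filtfilt, and the fixed-interval smoother is not shown.

```python
import numpy as np

def moving_average(x, window):
    """Centred moving average (for odd windows); near the edges the average is
    taken over the samples that are available."""
    x = np.asarray(x, float)
    kernel = np.ones(window)
    num = np.convolve(x, kernel, mode="same")
    den = np.convolve(np.ones_like(x), kernel, mode="same")
    return num / den

def kalman_1d(z, q=1e-3, r=1.0, x0=None, p0=1.0):
    """Scalar Kalman filter with a random-walk (constant-value) process model.
    q: process noise variance, r: measurement noise variance."""
    z = np.asarray(z, float)
    x = z[0] if x0 is None else x0
    p = p0
    out = np.empty_like(z)
    for k, zk in enumerate(z):
        p = p + q                 # predict: state unchanged, uncertainty grows
        gain = p / (p + r)        # update: weight of the new measurement
        x = x + gain * (zk - x)
        p = (1.0 - gain) * p
        out[k] = x
    return out
```

The ratio q/r controls the trade-off: a smaller ratio smooths more but lags behind fast movements such as a hammer throw, which is one reason the choice of filter depends on the nature of the movement.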
175	[pt] REDES DE GRAFOS SEMÂNTICOS COM ATENÇÃO E DECOMPOSIÇÃO DE TENSORES PARA VISÃO COMPUTACIONAL E COMPUTAÇÃO GRÁFICA / [en] SEMANTIC GRAPH ATTENTION NETWORKS AND TENSOR DECOMPOSITIONS FOR COMPUTER VISION AND COMPUTER GRAPHICS LUIZ JOSE SCHIRMER SILVA 02 July 2021 This thesis proposes new architectures for deep neural networks that use attention mechanisms and multilinear algebra methods to increase their performance. We also explore graph convolutions and their particularities. We focus on problems related to real-time pose estimation.
Pose estimation is a challenging problem in computer vision with many real-world applications in areas including augmented reality, virtual reality, computer animation, and 3D scene reconstruction. Usually, the problem involves estimating the 2D or 3D human pose, i.e., the anatomical keypoints or body parts of people in images or videos, as well as their positioning and structure. Several works achieve high accuracy using architectures based on conventional convolutional neural networks; however, errors caused by occlusion and motion blur are not uncommon, and these models are computationally too demanding for real-time applications. We explore different architectures to reduce the processing time of these networks and, as a result, propose two novel neural network models for 2D and 3D pose estimation. We also introduce a new architecture for graph attention networks called Semantic Graph Attention. POSE ESTIMATION REAL TIME APPLICATIONS GRAPH NEURAL NETWORKS TENSOR DECOMPOSITION ATTENTION MODELS CONVOLUTIONAL NEURAL NETWORKS
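The abstract does not describe how Semantic Graph Attention works, so the sketch below shows only the standard single-head graph attention layer it builds on, applied to a skeleton graph: each joint attends to its neighbours with weights computed from a LeakyReLU score and a softmax restricted to the adjacency. The 5-joint chain, the feature sizes and the random weights are hypothetical; this is not the thesis's architecture.

```python
import numpy as np

def graph_attention(H, adj, W, a, slope=0.2):
    """Single-head graph attention over a skeleton graph.
    H: (N, F) joint features, adj: (N, N) 0/1 adjacency including self-loops,
    W: (F, F2) shared projection, a: (2*F2,) attention vector."""
    Z = H @ W
    f2 = Z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), split into source and target terms
    e = (Z @ a[:f2])[:, None] + (Z @ a[f2:])[None, :]
    e = np.where(e > 0, e, slope * e)
    e = np.where(adj > 0, e, -np.inf)       # attend only to neighbouring joints
    e = e - e.max(axis=1, keepdims=True)    # numerical stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ Z, alpha

# Hypothetical 5-joint chain with self-loops and random (untrained) weights.
N, F, F2 = 5, 2, 4
adj = np.eye(N)
for i in range(N - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1
rng = np.random.default_rng(0)
out, alpha = graph_attention(rng.normal(size=(N, F)), adj,
                             rng.normal(size=(F, F2)), rng.normal(size=2 * F2))
```

Self-loops are required here so that every row of the masked score matrix has at least one finite entry before the softmax.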
176	Movement Estimation with SLAM through Multimodal Sensor Fusion Cedervall Lamin, Jimmy January 2024 In robotics and autonomous navigation, Simultaneous Localization and Mapping (SLAM) is a key technique for estimating poses while simultaneously building a map of the environment. Robotics applications often rely on various sensors for pose estimation, including cameras, inertial measurement units (IMUs), and more. Traditional discrete SLAM using stereo camera pairs and IMUs faces challenges such as time offsets between sensors. One solution to this issue is to use continuous-time models for pose estimation. This thesis explores and implements a continuous-time SLAM system and investigates the advantages of multimodal sensor fusion over discrete stereo vision models. The findings indicate that incorporating an IMU into the system improves pose estimation, providing greater robustness and accuracy than visual SLAM alone. Furthermore, exploiting the derivative and smoothness of the continuous model allows reasonable pose estimation from fewer measurements, reducing both the number of required measurements and the computational cost. slam discrete-slam continuous-slam synchronous asynchronous computer vision BRISK opencv ceres visual inertial sensor fusion multimodal Simultaneous Localization and Mapping time offset pose estimation quaternions movement estimation Media and Communication Technology Medieteknik
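A continuous-time trajectory can be evaluated at any timestamp, so a camera frame and an IMU sample taken at slightly different times can both be compared against the same trajectory. The abstract does not state which continuous-time model the thesis uses (splines are a common choice); the sketch below assumes the simplest such representation, linear interpolation of position and spherical linear interpolation (slerp) of orientation between two timestamped keyframes, with quaternions stored as [w, x, y, z]. The function names are hypothetical.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions [w, x, y, z]."""
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    d = np.dot(q0, q1)
    if d < 0.0:               # q and -q are the same rotation: take the shorter arc
        q1, d = -q1, -d
    if d > 0.9995:            # nearly parallel: fall back to normalised linear interpolation
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(d)
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def pose_at(t, t0, pose0, t1, pose1):
    """Query the pose at time t between keyframes at t0 and t1.
    Each pose is a tuple (position (3,), quaternion (4,))."""
    u = (t - t0) / (t1 - t0)
    p = (1 - u) * np.asarray(pose0[0], float) + u * np.asarray(pose1[0], float)
    return p, slerp(pose0[1], pose1[1], u)
```

Unlike this piecewise-linear sketch, smoother models such as cubic splines also provide continuous velocities and accelerations, which is what allows them to be compared directly with IMU measurements.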
177	Research and Application of 6D Pose Estimation for Mobile 3D Cameras / Forskning och tillämpning av 6D Pose Estimation för mobila 3D-kameror Ruichao, Qian January 2022 This work addresses deep-learning-based 6 Degree-of-Freedom (DoF) pose estimation using the 3D cameras on an iPhone 13 Pro. The task of pose estimation is to estimate the spatial rotation and translation of an object given its 2D or 3D images. When training a pose estimation network, a common way to expand the training dataset is to generate synthetic images, which requires a 3D mesh of the target object. Although several well-known datasets provide 3D object files, obtaining a mesh of a custom real-world object remains a problem. Typical 3D scanners are designed mainly for industrial use and are usually expensive. In this project, we investigated whether the 3D cameras on Apple devices can replace industrial 3D scanners in the pose estimation pipeline, and what might influence the results during scanning. For data synthesis, we introduced a pose sampling method that samples uniformly on a sphere. Random transformations and background images from the SUN2012 dataset are applied, and the synthetic images are rendered with Blender. We picked five test objects with different sizes and surfaces. Each object is scanned with both the front TrueDepth camera and the rear Light Detection and Ranging (LiDAR) camera using the '3d Scanner App' on iOS. The network we used is based on PVNet, which uses a pixel-wise voting scheme to find 2D keypoints in RGB images and uncertainty-driven Perspective-n-Point (PnP) to compute the pose. We obtained both quantitative and qualitative results for each instance: i) the TrueDepth camera outperforms the LiDAR camera in most scenarios, and ii) when an object has a less reflective surface and high-contrast texture, the advantage of TrueDepth is more pronounced.
We also picked three baseline objects from the Linemod dataset. Although the average accuracy is lower than in the original paper, the performance of our baseline instances shows a similar trend to the original paper's results. In conclusion, we showed that the 3D cameras on an iPhone are capable of supporting the pose estimation pipeline.
6 Degree-of-Freedom (DoF) pose estimation deep learning Light Detection and Ranging (LiDAR) structured light TrueDepth Electrical Engineering and Electronics
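The abstract mentions a method for sampling poses uniformly on a sphere but does not specify it. One common way to obtain near-uniform viewpoints is the Fibonacci (golden-angle) lattice, sketched below in numpy as an assumption rather than the thesis's method; the function name is hypothetical. Camera positions for rendering would be these unit vectors scaled by a chosen radius, with each camera then oriented toward the object (not shown).

```python
import numpy as np

def fibonacci_sphere(n):
    """n approximately uniformly spaced unit vectors (Fibonacci lattice)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n                    # evenly spaced heights: equal-area bands
    r = np.sqrt(1.0 - z * z)                 # radius of each horizontal circle
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i   # golden-angle increments in azimuth
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

# Hypothetical usage: 500 camera positions at distance 0.5 (scene units) from the object.
camera_positions = 0.5 * fibonacci_sphere(500)
```

Spacing z evenly (rather than the polar angle) is what makes the samples uniform in area, because equal-height bands of a sphere have equal surface area; sampling the polar angle evenly would cluster viewpoints near the poles.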
178	Pose Classification of Horse Behavior in Video : A deep learning approach for classifying equine poses based on 2D keypoints / Pose-klassificering av Hästbeteende i Video : En djupinlärningsmetod för klassificering av hästposer baserat på 2D-nyckelpunkter Söderström, Michaela January 2021 This thesis investigates whether computer vision can be a useful tool for interpreting the behavior of monitored horses. In recent years, computer vision research has focused primarily on people, where pose estimation and action recognition are popular research areas. The thesis presents a pose classification network whose input features are estimated 2D keypoints (also called interest points) of horse body parts. The network classifies three poses: 'Head above the wither', 'Head aligned with the wither' and 'Head below the wither'. The 2D keypoints are obtained by applying DeepLabCut to raw video surveillance data of a single horse. The estimated keypoints are then fed into a multilayer perceptron, which is trained to classify these three classes. The network shows promising results. When we spot-checked random samples of predicted poses against the ground truth, we found label noise: some of the labeled data were incorrect ground-truth samples. Despite this, we conclude that our method achieves satisfactory results. In particular, the keypoint estimates were sufficient for the model to classify a hold-out set of these poses.
Deep Learning Computer Vision Horse behavior Pose estimation 2D keypoints Interest points Pose classification DeepLabCut Computer and Information Sciences
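The pipeline in the abstract feeds DeepLabCut keypoints into a multilayer perceptron with three output classes. The sketch below shows one possible input normalization and the forward pass of such a classifier in numpy; the reference keypoint, the number of keypoints, the hidden size and the random weights are all hypothetical, and training is not shown.

```python
import numpy as np

CLASSES = ["Head above the wither", "Head aligned with the wither", "Head below the wither"]

def normalise_keypoints(kp, ref_idx=0):
    """kp: (K, 2) pixel coordinates. Centre on a reference keypoint (hypothetically
    the withers) and divide by the largest distance from it, so the features do
    not depend on where the horse stands or how large it appears in the frame."""
    kp = np.asarray(kp, float)
    centred = kp - kp[ref_idx]
    scale = max(np.linalg.norm(centred, axis=1).max(), 1e-12)
    return (centred / scale).ravel()

def mlp_forward(x, params):
    """One hidden ReLU layer followed by a softmax over the three pose classes."""
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, x @ W1 + b1)
    logits = h @ W2 + b2
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)

# Hypothetical usage with untrained random weights.
rng = np.random.default_rng(0)
K, H = 8, 16                                   # assumed keypoint count and hidden size
params = (rng.normal(size=(2 * K, H)) * 0.1, np.zeros(H),
          rng.normal(size=(H, 3)) * 0.1, np.zeros(3))
kp = rng.uniform(0, 640, size=(K, 2))
batch = np.stack([normalise_keypoints(rng.uniform(0, 640, size=(K, 2))) for _ in range(4)])
probs = mlp_forward(batch, params)
predicted = [CLASSES[i] for i in probs.argmax(axis=1)]
```

Normalizing relative to a body keypoint keeps the classifier focused on the head's position relative to the body rather than on absolute image coordinates.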
179	Mobility anomaly detection with intelligent video surveillance Ebrahimi, Fatemeh 06 1900 In this thesis, we present a study aimed at enhancing elderly care through an intelligent video surveillance system. The system uses deep learning algorithms to detect mobility anomalies, with a specific focus on identifying near-falls. Identifying near-falls matters because individuals who experience such events during their daily activities are at increased risk of future falls, which can lead to serious injury and hospitalization. A key contribution of our study is an autoencoder that detects mobility anomalies, particularly near-falls, by identifying high reconstruction errors across five consecutive frames. To extract the person's skeletal structure precisely, we used MoveNet, fine-tuned on seven keypoints. We then trained the autoencoder on a set of 20 features encompassing joint positions, velocities, accelerations, angles, and angular accelerations. To assess the model, we tested it on 100 videos of simulated daily activities recorded in an apartment laboratory, half of which contain near-falls; a separate set of 50 videos was used for training. The test results are promising: the model detects near-falls with 90% sensitivity, specificity, and accuracy. These results underscore the model's potential to improve the care of elderly people in their living environments.
Video surveillance Near-fall Anomaly detection Autoencoder Skeleton extraction Pose estimation Human activity recognition
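The feature set in the abstract (positions, velocities, accelerations, angles, angular accelerations) consists of time derivatives of keypoint and angle trajectories, and detection relies on high reconstruction errors across five consecutive frames. The numpy sketch below computes derivatives by central differences, applies one possible reading of the five-frame rule (every frame in the window exceeds a threshold) and computes the three reported metrics. The autoencoder itself is not shown, and the threshold and function names are hypothetical.

```python
import numpy as np

def derivatives(signal, fps):
    """First and second time derivatives of a (T, ...) signal by central
    differences; works for joint positions as well as joint angles."""
    dt = 1.0 / fps
    first = np.gradient(signal, dt, axis=0)
    second = np.gradient(first, dt, axis=0)
    return first, second

def flag_windows(errors, threshold, window=5):
    """Start indices of runs of `window` consecutive frames whose per-frame
    reconstruction error all exceed `threshold`."""
    above = np.asarray(errors) > threshold
    return [s for s in range(len(above) - window + 1) if above[s:s + window].all()]

def sens_spec_acc(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary (near-fall / normal) labels."""
    y_true = np.asarray(y_true, bool)
    y_pred = np.asarray(y_pred, bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(y_true)
```

Requiring a whole run of high-error frames, rather than a single frame, suppresses isolated spikes caused by keypoint jitter.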
180	Analyzing Lower Limb Motion Capture with Smartphone : Possible improvements using machine learning / Analys av rörelsefångst för nedre extremiteterna med smartphone : Möjliga förbättringar med hjälp av maskininlärning Brink, Anton January 2024 Human motion analysis (HMA) can play a crucial role in sports and healthcare by providing insight into movement mechanics through objective measurements and quantitative data. Traditional state-of-the-art marker-based techniques are accurate but come with financial and logistical barriers and are restricted to laboratory settings. Markerless systems are much more affordable and portable and can potentially be used outside laboratories, but these advantages come at a significant cost in accuracy. This thesis addresses the challenge of democratizing HMA by leveraging recent advances in smartphone technology and machine learning.
This thesis evaluates two modalities of markerless HMA, a single smartphone running Apple ARKit and a multiple-smartphone setup running OpenCap, and compares both with a state-of-the-art multi-camera marker-based system from Vicon. Additionally, it presents and evaluates two approaches to improving the single-smartphone modality: a Gaussian Process Regression (GPR) model and a Long Short-Term Memory (LSTM) neural network, each used to refine the single-smartphone data so that it better matches the marker-based results. Specific movements were recorded simultaneously with all three modalities on 13 subjects to build a dataset, from which the GPR and LSTM models were trained and applied to refine the single-camera (Apple ARKit) data. Lower limb joint angles and joint centers were evaluated across the modalities and analyzed for potential use in real-world applications. The findings are promising: both the GPR and LSTM models improve the accuracy of Apple ARKit, and OpenCap provides accurate and consistent results. However, limitations remain regarding demographic diversity and how real-world environmental factors may affect application. This thesis contributes to efforts to narrow the gap between marker-based HMA methods and more accessible solutions.
Motion Analysis Markerless Pose Estimation Machine Learning Gaussian Process Regression Long-Short-Term-Memory Neural Network Apple ARKit OpenCap Vicon Medical Engineering Bioinformatics and Systems Biology Computer Sciences
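Two ingredients of this study can be sketched briefly: a lower-limb joint angle computed from three joint centres, and the posterior mean of a Gaussian process regression that maps a smartphone-derived angle to a correction toward the marker-based reference. The abstract does not state the GPR inputs, targets, kernel or hyperparameters, so the zero-mean GP with an RBF kernel on 1D inputs, the residual formulation and the made-up data below are assumptions for illustration only.

```python
import numpy as np

def knee_flexion_deg(hip, knee, ankle):
    """Knee flexion in degrees from three joint centres: 0 for a straight leg,
    90 when thigh and shank form a right angle."""
    u = np.asarray(hip, float) - np.asarray(knee, float)
    v = np.asarray(ankle, float) - np.asarray(knee, float)
    cos_inner = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    inner = np.degrees(np.arccos(np.clip(cos_inner, -1.0, 1.0)))
    return 180.0 - inner

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel matrix between 1D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return var * np.exp(-0.5 * d2 / length ** 2)

def gpr_predict(x_train, y_train, x_test, noise=1e-2, length=1.0, var=1.0):
    """Posterior mean of a zero-mean GP: k(x*, X) (K + noise * I)^-1 y."""
    K = rbf(x_train, x_train, length, var) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf(x_test, x_train, length, var) @ alpha

# Hypothetical usage: learn a (reference - smartphone) knee-angle offset
# as a function of the smartphone angle, then correct new smartphone angles.
phone = np.linspace(0.0, 90.0, 10)
offset = 0.1 * phone                      # made-up systematic bias, not thesis data
corrected = phone + gpr_predict(phone, offset, phone, length=30.0, var=25.0)
```

Because the GP has zero prior mean, predictions far from the training angles fall back to no correction, which is a conservative default when a new movement leaves the range seen during training.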
