21 |
Detecting and Tracking Moving Objects from a Small Unmanned Air Vehicle
DeFranco, Patrick 01 March 2015
As the market for unmanned air vehicles (UAVs) rapidly expands, the need for algorithms that improve the capabilities of those vehicles is also growing. One valuable capability for UAVs is persistent tracking: the ability to find and track another moving object, usually on the ground, from an aerial platform. This thesis presents a method for tracking multiple ground targets from an airborne camera. Moving objects on the ground are detected by using frame-to-frame registration. The detected objects are then tracked using the newly developed recursive RANSAC algorithm. Much video tracking work has focused on using appearance-based processing for tracking, with some approaches using dynamic trackers such as Kalman filters. This work demonstrates a fusion of computer vision and dynamic tracking to increase the ability of an unmanned air platform to identify and robustly track moving targets. With a C++ implementation of the algorithms running on the open source Robot Operating System (ROS) framework, the system developed is capable of processing 1920x1080 resolution video at over seven frames per second on a desktop computer.
|
22 |
On the effect of architecture on deep learning based features for homography estimation / Angående effekten av arkitektur på djupinlärningsbaserade särdrag för homografi-estimering
Ähdel, Victor January 2018
Keypoint detection and description is the first step of homography and essential matrix estimation, which in turn are used in Visual Odometry and Visual SLAM. This work explores the effect, in terms of speed and accuracy, of using different deep learning architectures for such keypoints. The fully convolutional networks, with heads for both the detector and the descriptor, are trained through an existing self-supervised method, where correspondences are obtained through known, randomly sampled homographies. A new strategy for choosing negative correspondences for the descriptor loss is presented, which enables more flexibility in the architecture design. The new strategy turns out to be essential, as it enables networks that outperform the learnt baseline at no cost in inference time. Varying the model size leads to a trade-off between speed and accuracy: while all models outperform ORB in homography estimation, only the larger models approach SIFT’s performance, performing about 1-7% worse. Training for longer and with additional types of data might give the push needed to outperform SIFT. While the smallest models are 3× faster and use 50× fewer parameters than the learnt baseline, they still require 3× as much time as SIFT while performing about 10-30% worse. However, there is still room for improvement through optimization methods that go beyond architecture modification, e.g. quantization, which might make the method faster than SIFT.
|
23 |
Image transition techniques using projective geometry
Wong, Tzu Yen January 2009
[Truncated abstract] Image transition effects are commonly used on television and in human-computer interfaces. The transition between images creates a perception of continuity, which has aesthetic value in special effects and practical value in visualisation. The work in this thesis demonstrates that better image transition effects are obtained by incorporating properties of projective geometry into image transition algorithms. Current state-of-the-art techniques can be classified into two main categories, namely shape interpolation and warp generation. Many shape interpolation algorithms aim to preserve rigidity, but none preserve it with perspective effects. Most warp generation techniques focus on smoothness and lack the rigidity of perspective mapping. The affine transformation, a commonly used mapping between triangular patches, is rigid but not able to model perspective effects. Image transition techniques from the view interpolation community are effective in creating transitions with the correct perspective effect; however, those techniques usually require more feature points and algorithms of higher complexity. The motivation of this thesis is to enable different views of a planar surface to be interpolated with an appropriate perspective effect. The projective geometric relationship which produces the perspective effect can be specified by two quadrilaterals. This problem is equivalent to finding a perspectively appropriate interpolation for projective transformation matrices. I present two algorithms that enable smooth perspective transition between planar surfaces. The algorithms only require four point correspondences on two input images. ...The second algorithm generates transitions between shapes that lie on the same plane which exhibits a strong perspective effect. It recovers the perspective transformation which produces the perspective effect and constrains the transition so that the in-between shapes also lie on the same plane.
For general image pairs with multiple quadrilateral patches, I present a novel algorithm that is transitionally symmetrical and exhibits good rigidity. The use of quadrilaterals, rather than triangles, allows an image to be represented by a small number of primitives. This algorithm uses a closed-form force equilibrium scheme to correct the misalignment of the multiple transitional quadrilaterals. I also present an application for my quadrilateral interpolation algorithm in Seitz and Dyer's view morphing technique. This application automates and improves the calculation of the reprojection homography in the postwarping stage of their technique. Finally, I unify different image transition research areas into a common framework, which enables analysis and comparison of the techniques and the quality of their results. I highlight that quantitative measures can greatly facilitate comparisons among different techniques and present a quantitative measure based on epipolar geometry. This novel quantitative measure enables the quality of transitions between images of a scene from different viewpoints to be quantified by its estimated camera path.
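The abstract above notes that a projective transformation between two views of a planar surface can be specified by two quadrilaterals, that is, by four point correspondences. The following is a minimal sketch of that standard construction only, not of the thesis's interpolation algorithms. The function names are hypothetical, and the sketch assumes the matrix can be normalised so that its bottom-right entry is 1.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration, e.g. three collinear points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_quads(src, dst):
    """Return the 3x3 matrix H (with H[2][2] = 1) that maps src[i] to dst[i].

    src and dst are lists of four (x, y) corners in matching order.
    """
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h3 x + h4 y + h5) / (h6 x + h7 y + 1), rearranged the same way
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h, point):
    """Map a 2D point through H, including the perspective division."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

Calling `homography_from_quads` with the corners of a unit square and of an arbitrary convex quadrilateral gives a matrix that `apply_homography` can use to map any interior point. Unlike an affine map between triangles, the division by `w` is what produces the perspective effect.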
|
24 |
Compréhension de scènes urbaines par combinaison d'information 2D/3D / Urban scenes understanding by combining 2D/3D information
Bauda, Marie-Anne 13 June 2016
This thesis addresses the semantic segmentation of a calibrated sequence of images acquired in an urban environment. More precisely, the problem is to partition each image into regions representing the objects in the scene (facades, roads, etc.), so that each region is associated with a semantic label. In our approach, labelling is done through mid-level visual primitives called superpixels, which group pixels that are similar according to various criteria proposed in the literature, whether photometric (based on colour) or geometric (limiting the size of the superpixels formed). Unlike the state of the art, where recent work on the same problem takes an initial over-segmentation as input without calling it into question, our idea is to propose, in a multi-view context, a new superpixel construction approach based on a three-dimensional analysis of the scene and, in particular, of its planar structures. To construct "better" superpixels, a local planarity measure is introduced, which quantifies the extent to which the image zone under consideration corresponds to a planar surface of the scene. This measure is evaluated from a homographic rectification between two nearby images, induced by a candidate plane supporting the 3D points associated with the zone. We analyze the contribution of the UQI (Universal Quality Index) measure and show that it compares favorably with other metrics that have the potential to detect planar structures. We then introduce a new superpixel construction algorithm based on the SLIC (Simple Linear Iterative Clustering) algorithm, whose principle is to group nearest neighbors according to a distance that merges similarity in colour and in spatial distance, and which integrates this planarity measure. The resulting over-segmentation, coupled with the inter-image coherence that comes from validating the local planarity constraint of the scene, makes it possible to assign a label to each entity and thus to obtain a semantic segmentation that partitions the image into planar objects.
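The UQI measure mentioned above compares two image patches through their means, variances, and covariance. As a rough illustration, the sketch below computes the index in its commonly published form for two flattened patches of grey-level intensities. The function name `uqi` and the use of a single global window are assumptions made for illustration; the abstract does not say how the thesis windows or aggregates the measure.

```python
def uqi(x, y):
    """Universal Quality Index between two equal-length intensity sequences.

    Q = 4 * cov(x, y) * mean(x) * mean(y)
        / ((var(x) + var(y)) * (mean(x)**2 + mean(y)**2))

    The value lies in [-1, 1] and equals 1 when the patches are identical.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("patches must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    denom = (var_x + var_y) * (mean_x ** 2 + mean_y ** 2)
    if denom == 0:
        raise ValueError("index is undefined for constant or all-zero patches")
    return 4 * cov * mean_x * mean_y / denom
```

In a planarity test of the kind described, one plausible use is to compare a patch in one image with the homography-rectified patch from a nearby image: a score close to 1 would suggest that the candidate plane explains the patch well.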
|
25 |
A calibration method for laser-triangulating 3D cameras / En kalibreringsmetod för lasertriangulerande 3D-kameror
Andersson, Robert January 2008
A laser-triangulating range camera uses a laser plane to light an object. If the position of the laser relative to the camera as well as certain properties of the camera is known, it is possible to calculate the coordinates for all points along the profile of the object. If either the object or the camera and laser have a known motion, it is possible to combine several measurements to get a three-dimensional view of the object. Camera calibration is the process of finding the properties of the camera and enough information about the setup so that the desired coordinates can be calculated. Several methods for camera calibration exist, but this thesis proposes a new method that has the advantages that the objects needed are relatively inexpensive and that only objects in the laser plane need to be observed. Each part of the method is given a thorough description. Several mathematical derivations have also been added as appendices for completeness. The proposed method is tested using both synthetic and real data. The results show that the method is suitable even when high accuracy is needed. A few suggestions are also made about how the method can be improved further.
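To illustrate why the laser position and the camera properties must be known, the sketch below intersects the viewing ray of a pixel with the laser plane. It assumes an ideal pinhole camera without lens distortion and a laser plane expressed in the camera frame as n . X = d. The function and parameter names are hypothetical, and this is not the calibration method proposed in the thesis, only the triangulation step that a calibration makes possible.

```python
def triangulate_pixel(u, v, focal_px, cx, cy, plane_normal, plane_offset):
    """Return the 3D point (camera frame) where the ray of pixel (u, v) hits the laser plane.

    focal_px     : focal length in pixels (assumed equal in x and y)
    cx, cy       : principal point in pixels
    plane_normal : (nx, ny, nz) of the laser plane n . X = d
    plane_offset : d, in the same length unit as the returned point
    """
    # Direction of the viewing ray through the pixel, scaled so that z = 1.
    ray = ((u - cx) / focal_px, (v - cy) / focal_px, 1.0)
    denom = sum(n * r for n, r in zip(plane_normal, ray))
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the laser plane")
    scale = plane_offset / denom  # distance along the ray, in units of z
    return tuple(scale * r for r in ray)
```

Applying the function to every pixel on the detected laser line gives one profile of the object. With a known motion of the object or of the camera and laser, successive profiles can be stacked into a three-dimensional view.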
|
26 |
Eye Tracking in User Interfaces
Jurzykowski, Michal January 2012
This master's thesis was written during a study stay at the University of Eastern Finland, Joensuu, Finland. It deals with the use of eye-tracking technology for human-computer interaction (HCI). The designed and implemented system maps the position of the point of gaze, given as coordinates in the coordinate system of the scene camera, into the coordinate system of the display. At the same time, the system compensates for the user's movements and thereby removes one of the main problems of using eye tracking in HCI. This is achieved by determining the transformation between the projective space of the scene and the projective space of the display: interest points are detected and described with the SURF method, corresponding points are found and matched, and a homography is computed. The system was tested using test points distributed over the entire display area.
|
27 |
Convolutional Neural Network Optimization for Homography Estimation
DiMascio, Michelle Augustine January 2018
No description available.
|
28 |
Padel court detection system
Wennerblom, David, Arronet, Andrey January 2023
The aim of this thesis is to examine the possibility of a court detection program for sports videos that can identify the court even when some important elements are not visible. The study also analyzes which external factors may affect the program's accuracy in detecting all relevant elements. These questions are answered through a combination of computer vision techniques and algorithms. The study uses Design Science Research (DSR) as its research methodology to develop an artifact. A dataset of padel videos is evaluated to measure the artifact's accuracy. The artifact uses multiple computer vision techniques from the OpenCV library to detect relevant lines and edges and project them onto the frame, using a predetermined court model as reference. The findings indicate that the developed artifact achieved a relatively consistent level of accuracy in court detection across multiple courts whenever a detection was made. However, the frequency of successful detections was inconsistent. The research also found that external factors did not significantly influence the accuracy of court detection, yet they posed challenges to the program's overall consistency.
|
29 |
Filtering Techniques for Pose Estimation with Applications to Unmanned Air Vehicles
Ready, Bryce Benson 29 November 2012
This work presents two novel methods of estimating the state of a dynamic system in a Kalman Filtering framework. The first is an application-specific method for use with systems performing Visual Odometry in a mostly planar scene. Because a Visual Odometry method inherently provides relative information about the pose of a platform, we use this system as part of the time update in a Kalman Filtering framework, and develop a novel way to propagate the uncertainty of the pose through this time update method. Our initial results show that this method is able to reduce localization error significantly with respect to a pure INS time update, limiting drift in our test system to around 30 meters for tens of seconds. The second key contribution of this work is the Manifold EKF, a generalized version of the Extended Kalman Filter which is explicitly designed to estimate manifold-valued states. This filter works for a large number of commonly useful manifolds, and may have applications to other manifolds as well. In our tests, the Manifold EKF demonstrated significant advantages in terms of consistency when compared to other filtering methods. We feel that these promising initial results merit further study of the Manifold EKF, related filters, and their properties.
|
30 |
Estimations of 3D velocities from a single camera view in ice hockey / Beräkningar av 3D-hastigheter från en kameravinkel i ishockey
Bjering, Beatrice January 2019
Ice hockey is a contact sport with a high risk of brain injuries such as concussions. This is a serious health concern, and there is a need for a better understanding of the relationship between the kinematics of the head and concussions. The velocity and the direction of impact are factors that might affect the severity of a concussion. The understanding of concussions can therefore be improved by extracting velocities through video analysis. In this thesis, a prototype that extracts 3D velocities from a single camera view was developed using target tracking algorithms and homography. A validation of the method estimated the mean error at 21.7%. The prototype evaluated 60 tackles, of which 30 resulted in concussions and 30 did not. No significant difference in velocity between the two groups could be found. The mean velocities for the tackles that resulted in concussions were 6.55 m/s for the attacking player and 4.59 m/s for the injured player. The prototype was also compared with velocities extracted with SkillSpector in a previous bachelor thesis. There was a significant difference between the velocities obtained with SkillSpector and those obtained with the prototype developed in this thesis. A validation of SkillSpector was also performed, which showed a mean error of 37.4%.
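As a rough illustration of how a homography can turn tracked image positions into a speed, the sketch below maps two pixel positions onto the ice plane and divides the displacement by the elapsed time. It assumes a known 3x3 homography from image pixels to rink coordinates in metres and a tracked point that lies on the ice plane. The function names are hypothetical, and the sketch covers only the planar component; the abstract does not describe how the prototype obtains the full 3D velocity.

```python
import math


def to_plane(h, pixel):
    """Map an image pixel to plane coordinates (metres) through the homography h."""
    x, y = pixel
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def speed_on_plane(h, pixel_a, pixel_b, dt):
    """Average speed in m/s between two tracked pixel positions dt seconds apart."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    ax, ay = to_plane(h, pixel_a)
    bx, by = to_plane(h, pixel_b)
    return math.hypot(bx - ax, by - ay) / dt
```

With video at a known frame rate, `dt` is the number of frames between the two positions divided by the frame rate. Errors in the tracked pixel positions or in the homography carry over directly into the speed estimate.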
|