1. Driver's Gaze Zone Estimation in Realistic Driving Environment by Kinect. Luo, Chong (07 September 2018)
Driver distraction is one of the main concerns researchers focus on in the design of Advanced Driver Assistance Systems (ADASs). Head pose and eye-gaze direction are two reliable indicators of a driver's gaze and current focus of attention, and methods that combine eye information with head pose achieve higher accuracy than methods using head pose alone. The naturalistic driving environment presents unique challenges (e.g., unstable illumination and jolts) to video-based gaze estimation and tracking systems: some methods achieve high accuracy in a stationary laboratory setting but may not be suitable for the unstable driving environment. In addition, running in real time or near real time is another requirement for gaze estimation in an ADAS. These special challenges need to be overcome when designing ADASs.
In this thesis, we propose a new driver gaze zone estimation framework designed for the naturalistic driving environment. The framework combines head and eye information to estimate the driver's gaze zone in both daytime and nighttime, and is composed of five main components: facial landmark detection, head pose estimation, iris center detection, upper eyelid information extraction, and gaze zone estimation. First, the Constrained Local Neural Field (CLNF) is applied to obtain the facial landmarks in the image plane and the 3D model of the face in the object frame; extracting a region of interest (ROI) is used as an optimization strategy for CLNF facial landmark detection. Second, head pose estimation is cast as a Perspective-n-Point (PnP) problem, which is solved with the Levenberg-Marquardt optimization method using the 2D landmark locations in the image plane and their corresponding 3D locations in the object frame. Third, a regression model-based method estimates the iris center from the eye landmarks detected in the previous step. For upper eyelid information extraction, a quadratic function models the upper eyelid, and its second-order coefficient is extracted. Finally, the head pose and eye information are combined into a feature vector, and a Random Decision Forest classifier estimates the driver's current gaze zone from this feature vector.
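As an illustration of the head-pose and eyelid steps described above, here is a minimal sketch assuming OpenCV's solvePnP (whose iterative mode uses Levenberg-Marquardt optimization), NumPy's polyfit for the quadratic eyelid model, and scikit-learn's RandomForestClassifier; the landmark arrays, camera intrinsics, and feature layout are illustrative placeholders, not the thesis's actual configuration.

```python
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def head_pose(landmarks_2d, landmarks_3d, camera_matrix):
    """Cast head pose estimation as a PnP problem; OpenCV's iterative solver
    refines the pose with Levenberg-Marquardt optimization."""
    ok, rvec, tvec = cv2.solvePnP(
        landmarks_3d.astype(np.float64), landmarks_2d.astype(np.float64),
        camera_matrix, None, flags=cv2.SOLVEPNP_ITERATIVE)
    return rvec.ravel(), tvec.ravel()

def eyelid_coefficient(upper_eyelid_pts):
    """Fit y = a*x^2 + b*x + c to upper-eyelid landmarks and keep the
    second-order coefficient a as the eyelid feature."""
    a, _, _ = np.polyfit(upper_eyelid_pts[:, 0], upper_eyelid_pts[:, 1], deg=2)
    return a

def gaze_feature(landmarks_2d, landmarks_3d, camera_matrix,
                 left_eyelid, right_eyelid, iris_centers):
    """Concatenate head pose and eye information into one feature vector."""
    rvec, tvec = head_pose(landmarks_2d, landmarks_3d, camera_matrix)
    return np.hstack([rvec, tvec,
                      eyelid_coefficient(left_eyelid),
                      eyelid_coefficient(right_eyelid),
                      iris_centers.ravel()])

# A Random Decision Forest then maps feature vectors to gaze zone labels:
# clf = RandomForestClassifier(n_estimators=200).fit(X_train, zones_train)
# predicted_zone = clf.predict(gaze_feature(...).reshape(1, -1))
```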
The experiments are carried out in a realistic driving environment, in both daytime and nighttime, with three volunteers, using a Kinect V2 for Windows sensor placed behind the windshield. Weighted and unweighted accuracy are used as evaluation metrics for gaze zone estimation: weighted accuracy assigns different significance to different gaze zones, while unweighted accuracy treats each gaze zone equally. The results show that the proposed gaze zone estimation framework outperforms the reference method in the daytime, with weighted and unweighted accuracy reaching 96.6% and 95.0%, respectively. For nighttime, the weighted and unweighted accuracy reach 96% and 91.4%.
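A small sketch of one plausible reading of these two metrics, where each sample is weighted by the significance of its true gaze zone; the zone names and weights below are invented for illustration only.

```python
import numpy as np

def unweighted_accuracy(y_true, y_pred):
    """Every gaze zone sample counts equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def weighted_accuracy(y_true, y_pred, zone_weight):
    """Samples from zones with higher significance contribute more."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    w = np.array([zone_weight[z] for z in y_true], dtype=float)
    return float(np.sum(w * (y_true == y_pred)) / np.sum(w))

# Hypothetical zones and weights:
weights = {"windshield": 3.0, "left_mirror": 2.0, "dashboard": 1.0}
print(weighted_accuracy(["windshield", "dashboard"],
                        ["windshield", "left_mirror"], weights))
```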
2. An Implementation of Mono and Stereo SLAM System Utilizing Efficient Map Management Strategy. Kalay, Adnan (01 September 2008)
For an autonomous mobile robot, localization and map building are vital capabilities. Localization provides the robot with its position, so that it can navigate in the environment; map building provides a model of the environment (the map) with which the robot can interact. These two capabilities depend on each other, and their simultaneous operation is called SLAM (Simultaneous Localization and Map Building). While various sensors are used for this task, vision-based approaches are relatively new and have attracted growing interest in recent years.
In this thesis work, a versatile Visual SLAM system is constructed and presented. At its core is a vision-based simultaneous localization and map building algorithm that uses point features in the environment as visual landmarks and an Extended Kalman Filter for state estimation. A detailed analysis of this algorithm is given, covering the state estimation, feature extraction, and data association steps. The algorithm is extended for both stereo and single-camera systems; the core of both versions is the same, and we discuss the differences that originate from the dissimilar measurements. The algorithm also runs in different motion modes, namely predefined, manual, and autonomous. Second, a map management strategy is developed, aimed especially at extended environments: when the robot runs SLAM in a large environment, the constructed map obviously contains a great number of landmarks, and the efficiency algorithm takes over when the total number of features exceeds a critical value, rarefying the current map without losing the geometrical distribution of the landmarks. Furthermore, a well-organized graphical user interface is implemented that enables the operator to select operational modes, change various parameters of the main SLAM algorithm, and view the results of the SLAM operation both textually and graphically. Finally, a basic mission concept is defined to illustrate what the robot can do with the outputs of the SLAM algorithm. All of these ideas are implemented in this thesis, experiments are conducted on a real robot, and the results are analysed by comparing the algorithm outputs with ground-truth measurements.
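A compressed sketch of the EKF predict/update cycle at the core of such a visual SLAM system, together with a naive map-thinning step in the spirit of the rarefaction strategy above; the motion and measurement models are generic callables, not the ones implemented in the thesis.

```python
import numpy as np

class EKFSlam:
    """Minimal EKF-SLAM skeleton: state x = [robot pose | 2D landmark positions]."""

    def __init__(self, pose_dim=3):
        self.pose_dim = pose_dim
        self.x = np.zeros(pose_dim)            # state mean
        self.P = np.eye(pose_dim) * 1e-3       # state covariance

    def predict(self, f, F, Q):
        """Propagate the state through the motion model f with Jacobian F
        and process noise Q (only the pose block actually moves)."""
        self.x = f(self.x)
        self.P = F @ self.P @ F.T + Q

    def update(self, z, h, H, R):
        """Correct with a landmark measurement z, measurement model h,
        Jacobian H, and measurement noise R (standard EKF update)."""
        innovation = z - h(self.x)
        S = H @ self.P @ H.T + R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.P = (np.eye(len(self.x)) - K @ H) @ self.P

    def rarefy(self, keep_landmarks):
        """Map management: once the map grows past a critical size, keep only a
        spatially well-spread subset of landmarks (indices in keep_landmarks),
        dropping the corresponding rows/columns of the mean and covariance."""
        idx = list(range(self.pose_dim))
        for j in keep_landmarks:
            s = self.pose_dim + 2 * j
            idx += [s, s + 1]
        self.x = self.x[idx]
        self.P = self.P[np.ix_(idx, idx)]
```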
3. Deep Learning Model Uncertainty in Medical Image Analysis. Drevický, Dušan (January 2019)
This thesis deals with quantifying the uncertainty in the predictions of deep learning models. Although these models achieve excellent results in many areas of computer vision, their outputs are mostly deterministic and provide little information about how confident the model is in its prediction. This is particularly important in medical image analysis, where mistakes can be very costly and the ability to detect uncertain predictions would allow the supervising physician to handle the relevant cases manually. In this work, I apply several uncertainty metrics developed in recent research to deep learning models trained for cephalometric landmark localization. I then evaluate and compare them in a set of experiments designed to determine how much useful information each metric provides about the model's confidence in its prediction.
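The abstract does not name the specific uncertainty metrics evaluated; as one representative example from this line of research, here is a sketch of Monte Carlo dropout for a landmark localization network (assumed to output an (N, 2) coordinate tensor), where the spread of repeated stochastic forward passes serves as the uncertainty estimate.

```python
import torch

def enable_dropout(model):
    """Switch only the dropout layers back to training mode so they stay
    stochastic at inference time (extend to Dropout2d etc. if the model uses them)."""
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.train()

def mc_dropout_landmarks(model, image, n_samples=30):
    """Return the per-landmark predictive mean and variance over n_samples
    stochastic forward passes; high variance flags uncertain predictions
    that a supervising physician could review manually."""
    model.eval()
    enable_dropout(model)
    with torch.no_grad():
        preds = torch.stack([model(image.unsqueeze(0)).squeeze(0)
                             for _ in range(n_samples)])
    return preds.mean(dim=0), preds.var(dim=0)
```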
4. A study on the integration of phonetic landmarks into large vocabulary continuous speech decoding. Ziegler, Stefan (17 January 2014)
This thesis studies the integration of phonetic landmarks into standard statistical large vocabulary continuous speech recognition (LVCSR). Landmarks are discrete time instances that indicate the presence of phonetic events in the speech signal. The goal is to develop landmark detectors motivated by phonetic knowledge in order to model selected phonetic classes more precisely than is possible with standard acoustic models. The thesis presents two landmark detection approaches, which make use of segment-based information, and studies two different methods to integrate landmarks into decoding: landmark-based pruning and a weighted combination approach. While both landmark detection approaches improve speech recognition performance over the baseline when landmark and acoustic scores are combined with weights during decoding, they do not outperform standard frame-based phonetic predictions. Since these results indicate that landmark-driven LVCSR requires the integration of very heterogeneous information, the thesis presents a third integration framework designed to integrate an arbitrary number of heterogeneous and asynchronous landmark streams into LVCSR. The results indicate that this framework is indeed able to improve the baseline system, as long as the landmarks provide information complementary to the regular acoustic models.
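As an illustration of the two integration methods named above, a toy sketch of landmark-based pruning and of a log-linear weighted combination of acoustic and landmark scores; the frame-synchronous score matrices and the interpolation weight are assumptions for illustration, not the thesis's actual formulation.

```python
import numpy as np

def weighted_combination(acoustic_logscores, landmark_logscores, lam=0.3):
    """Log-linear interpolation of two (frames, states) score streams.
    lam = 0 falls back to the standard acoustic model; asynchronous landmark
    events are assumed to have been expanded onto the frame grid beforehand."""
    return (1.0 - lam) * acoustic_logscores + lam * landmark_logscores

def landmark_pruning(acoustic_logscores, allowed, floor=-1e9):
    """Landmark-based pruning: states judged incompatible with a detected
    landmark (allowed == False) are pushed to a score floor so the decoder
    effectively discards the corresponding hypotheses."""
    return np.where(allowed, acoustic_logscores, floor)
```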
5. Vision-based Robot Localization Using Artificial and Natural Landmarks. Arican, Zafer (01 August 2004)
In mobile robot applications, it is important for a robot to know where it is. Accurate localization is crucial for navigation and map building, because both the route to follow and the positions of the objects to be inserted into the map depend heavily on the robot's position in the environment.
For localization, the robot uses measurements taken by various devices such as laser rangefinders, sonars, odometry devices, and vision. Generally, these devices give the distances of objects in the environment from the robot, and by processing this distance information the robot finds its location in the environment.
In this thesis, two vision-based robot localization algorithms are implemented. The first algorithm uses artificial landmarks, whose locations are known, as the objects around the robot; by measuring the positions of these landmarks with respect to the camera system, the robot locates itself in the environment. The second algorithm, instead of using artificial landmarks, estimates the robot's location by measuring the positions of objects that naturally exist in the environment; these objects are treated as natural landmarks, and their locations are not known initially.
A three-wheeled robot base carrying a stereo camera system is used as the mobile robot unit, with the stereo camera system serving as the measurement device. Processing and control tasks are performed by a stationary PC. Experiments are performed on this robot system.
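As a sketch of the first, artificial-landmark variant, the robot's pose can be recovered by rigidly aligning the landmark positions measured in the camera/robot frame with their known map positions; the Kabsch least-squares alignment below is an illustrative choice, not necessarily the algorithm actually implemented.

```python
import numpy as np

def pose_from_landmarks(world_pts, robot_pts):
    """Estimate the robot's 2D pose from landmarks with known world coordinates.

    world_pts: (N, 2) landmark positions in the map frame (known in advance).
    robot_pts: (N, 2) the same landmarks as measured relative to the robot.
    Returns (R, t) such that world ~= R @ robot + t in the least-squares sense;
    the robot then sits at t with heading atan2(R[1, 0], R[0, 0]).
    """
    cw, cr = world_pts.mean(axis=0), robot_pts.mean(axis=0)
    H = (robot_pts - cr).T @ (world_pts - cw)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cw - R @ cr
    return R, t
```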
6. Biometric Recognition of 3D Faces. Michálek, Martin (January 2014)
This thesis is about biometric 3D face recognition. A general biometric system and the functioning of biometric systems are presented, and techniques used in 2D and 3D face recognition are described. Finally, an automatic biometric system for 3D face recognition is proposed and implemented. This system divides the face into areas based on the positions of detected landmarks; the individual areas are compared separately, and the final system fuses the results of the Eigenfaces and ARENA algorithms.
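A compact sketch of the Eigenfaces component mentioned above (PCA on vectorized face regions, with matching by nearest neighbour in the projected subspace); the number of components is an arbitrary illustrative choice.

```python
import numpy as np

class Eigenfaces:
    """Classic Eigenfaces: project faces onto a PCA subspace and match by
    nearest neighbour; per-region scores could then be fused with another
    matcher (such as ARENA) as described above."""

    def fit(self, gallery, n_components=50):
        X = gallery.reshape(len(gallery), -1).astype(float)   # one row per face image
        self.mean = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.components = vt[:n_components]                   # the "eigenfaces"
        self.gallery_proj = (X - self.mean) @ self.components.T
        return self

    def match(self, probe):
        p = (probe.reshape(-1).astype(float) - self.mean) @ self.components.T
        dists = np.linalg.norm(self.gallery_proj - p, axis=1)
        return int(np.argmin(dists)), float(dists.min())      # best gallery index, distance
```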
7. Low dose x-ray image processing: Joint denoising and contrast enhancement, and automatic detection of anatomical landmarks for image quality estimation. Irrera, Paolo (17 June 2015)
We aim at reducing the ALARA (As Low As Reasonably Achievable) dose limits for images acquired with the EOS full-body system by means of image processing techniques. Two complementary approaches are studied. First, we define a post-processing method that optimizes the trade-off between acquired image quality and X-ray dose. The Non-Local Means filter is extended to restore EOS images, and we then study how to combine it with a multi-scale contrast enhancement technique. The image quality for diagnosis is optimized by defining non-parametric noise containment maps that limit the increase of noise depending on the amount of local redundant information captured by the filter. Secondly, we estimate exposure index (EI) values on EOS images, which give immediate feedback on image quality to help radiographers verify the correct exposure level of the X-ray examination. We propose a landmark detection based approach that is more robust to potential outliers than existing methods because it exploits the redundancy of local estimates. Finally, the proposed joint denoising and contrast enhancement technique significantly increases image quality with respect to an algorithm used in clinical routine, and robust image quality indicators can be automatically associated with clinical EOS images. Given the consistency of the measures assessed on preview images, these indices could be used to drive an exposure management system in charge of defining the optimal radiation exposure.
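A rough sketch of the first approach, assuming scikit-image's non-local means filter followed by a single-scale unsharp contrast boost whose gain is damped where the removed residual was large; this crude damping map only stands in for the non-parametric noise containment maps developed in the thesis.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.restoration import denoise_nl_means, estimate_sigma

def restore_and_enhance(image, gain=1.5, detail_sigma=3.0):
    """Denoise a float grayscale radiograph with non-local means, then boost
    local contrast with a gain that shrinks where little redundant
    information was available (i.e. where the removed residual was large)."""
    noise_sigma = estimate_sigma(image)
    denoised = denoise_nl_means(image, h=1.15 * noise_sigma, sigma=noise_sigma,
                                patch_size=7, patch_distance=11, fast_mode=True)
    detail = denoised - gaussian_filter(denoised, detail_sigma)   # detail layer
    residual = np.abs(image - denoised)                           # mostly noise
    containment = 1.0 / (1.0 + residual / (residual.mean() + 1e-8))
    return denoised + gain * containment * detail
```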
8. Navigation and Information System for Visually Impaired. Hrbáček, Jan (January 2018)
Visual impairment is one of the most common disabilities; it is estimated that up to 3% of the population suffers from severe visual impairment or loss of sight. Blindness severely degrades the ability to orient oneself and move in the surrounding environment: without knowledge of the spatial layout, normally acquired mostly through vision, the affected person simply does not know which way to move towards their goal. The usual solution to orientation in unfamiliar environments is to have the blind person accompanied by a sighted guide; this service is, however, very demanding, and the blind person must rely entirely on the guide. This thesis explores ways of making spatial orientation easier for the visually impaired by using existing sensors and suitable processing of their data. The topic is treated through an analogy with mobile robotics and, in that spirit, is divided into a localization part and a path planning part. While path planning methods are largely available, pedestrian localization often suffers from considerable positioning errors, which complicates the use of standard navigation devices by blind users. The position estimate can be improved in several ways, which are examined in the analytical chapter. The thesis first proposes fusing an ordinary GPS receiver with a pedestrian odometry unit, which preserves a faithful shape of the trajectory at the local level. To mitigate the remaining drift of the estimate, the use of natural landmarks of the environment, referenced to a global position reference, is proposed. Building on existing graph search formalisms, optimality criteria suitable for choosing a blind person's route through an urban environment are investigated. A high-level instruction generator based on fuzzy logic is then built, motivated by the goal of a user interface that feels human; it is complemented by immediate haptic output correcting heading deviations. The behaviour of the proposed principles was evaluated in realistic experiments capturing the specifics of the target urban environment. The results show considerable improvements in both the maximum and mean positioning error figures.
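A stripped-down sketch of the first proposed idea (fusing a jumpy but absolute GPS fix with smooth but drifting step-based dead reckoning): a simple 2D Kalman filter with identity motion and measurement models, used here only to illustrate the principle, not the fusion scheme actually implemented.

```python
import numpy as np

class GpsStepFusion:
    """Fuse pedestrian dead reckoning (local trajectory shape) with GPS fixes
    (global reference) using a simple 2D Kalman filter with identity models."""

    def __init__(self, x0, step_var=0.05, gps_var=25.0):
        self.x = np.asarray(x0, dtype=float)   # position in a local metric frame
        self.P = np.eye(2)
        self.Q = np.eye(2) * step_var           # uncertainty added per step
        self.R = np.eye(2) * gps_var            # GPS fix uncertainty

    def step(self, step_length, heading):
        """Dead-reckoning prediction from one detected step of the odometry unit."""
        self.x = self.x + step_length * np.array([np.cos(heading), np.sin(heading)])
        self.P = self.P + self.Q

    def gps_fix(self, z):
        """Correct the accumulated drift with an (infrequent) GPS position fix."""
        S = self.P + self.R
        K = self.P @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z, dtype=float) - self.x)
        self.P = (np.eye(2) - K) @ self.P
```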
9. Computer vision approaches for quantitative analysis of microscopy images. Bahry, Ella (23 November 2021)
Microscopy images of entire organisms and their development allow research into whole organisms or systems, producing terabyte-scale datasets. Such big datasets require the development of computer vision tools to perform tasks such as detection, segmentation, classification, and registration. It is desirable to develop computer vision tools that require minimal manually annotated training data. I demonstrate such applications in three projects.
First, I present a tool for automatic registration of Drosophila wings (of various species) using landmark detection, for application in studying enhancer function. I compare the performance of a shape model technique to that of a small CNN requiring only 20 training examples. Both methods perform well and enable precise registration of thousands of wings.
The second project is a high-resolution nucleus model of C. elegans, constructed from a nanometer-resolved electron microscopy dataset of an entire dauer larva. The nucleus model is built using both a classical dynamic programming approach and a CNN approach. The resulting model is accessible via a web-based (CATMAID) open-source, open-access resource for the community. I also developed a CATMAID plugin for the annotation of segmented objects (here, nucleus identity). This work is the first atlas of the C. elegans dauer diapause ever created and reveals the number of nuclei at that stage.
Lastly, I detail an image analysis pipeline I collaborated on with Laura Breimann and others. The pipeline involves single-molecule fluorescence in situ hybridization (smFISH) spot detection, object segmentation, and embryo stage prediction, and is used to study the dynamics of X-specific transcriptional repression by condensin in the C. elegans embryo.
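As a sketch of the kind of spot detection such a pipeline typically starts from, here is a Laplacian-of-Gaussian blob detector as provided by scikit-image; the sigma range and threshold are illustrative, not the pipeline's actual parameters.

```python
import numpy as np
from skimage.feature import blob_log

def detect_smfish_spots(image, min_sigma=1.0, max_sigma=3.0, threshold=0.02):
    """Detect diffraction-limited smFISH spots as bright blobs in a 2D
    (or max-projected) image normalised to [0, 1].

    Returns an (N, 3) array of (row, col, approximate radius)."""
    blobs = blob_log(image, min_sigma=min_sigma, max_sigma=max_sigma,
                     num_sigma=5, threshold=threshold)
    blobs[:, 2] *= np.sqrt(2)    # convert the detected sigma to a radius estimate
    return blobs
```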
With these three examples, I demonstrate both generic approaches to the computational modeling of model organisms and bespoke solutions to specific problems, as well as the field's shift towards deep learning.