Global ETD Search

11	JEZIK: A Cognitive Translation System Employing a Single, Visible Spectrum Tracking Detector Bzik, Davor 01 June 2016 (has links) A link between eye movement mechanics and the mental processing associated with text reading has been established in the past. The pausing of an eye gaze on a specific word within a sentence reflects correctness or fluency of a translated text. A cognitive translation system has been built employing a single, inexpensive web camera without the use of infrared illumination. It was shown that the system translates the text, detects rarely occurring and out-of-context words from eye gaze information, and provides solutions in real time while the user is still reading. The solutions are in form of a translation, definition or synonym for the word in question. The only effort required is that of reading. pupil tracking without infrared web-camera eye tracking eye gaze estimation cognitive translation the Jezik system assistive translation inexpensive translation assistant reading assistant Electrical and Computer Engineering
12	Unconstrained Gaze Estimation Using RGB-D Camera. / Estimation du regard avec une caméra RGB-D dans des environnements utilisateur non-contraints Kacete, Amine 15 December 2016 (has links) Dans ce travail, nous avons abordé le problème d’estimation automatique du regard dans des environnements utilisateur sans contraintes. Ce travail s’inscrit dans la vision par ordinateur appliquée à l’analyse automatique du comportement humain. Plusieurs solutions industrielles sont aujourd’hui commercialisées et donnent des estimations précises du regard. Certaines ont des spécifications matérielles très complexes (des caméras embarquées sur un casque ou sur des lunettes qui filment le mouvement des yeux) et présentent un niveau d’intrusivité important, ces solutions sont souvent non accessible au grand public. Cette thèse vise à produire un système d’estimation automatique du regard capable d’augmenter la liberté du mouvement de l’utilisateur par rapport à la caméra (mouvement de la tête, distance utilisateur-capteur), et de réduire la complexité du système en utilisant des capteurs relativement simples et accessibles au grand public. Dans ce travail, nous avons exploré plusieurs paradigmes utilisés par les systèmes d’estimation automatique du regard. Dans un premier temps, Nous avons mis au point deux systèmes basés sur deux approches classiques: le premier basé caractéristiques et le deuxième basé semi apparence. L’inconvénient majeur de ces paradigmes réside dans la conception des systèmes d'estimation du regard qui supposent une indépendance totale entre l'image d'apparence des yeux et la pose de la tête. Pour corriger cette limitation, Nous avons convergé vers un nouveau paradigme qui unifie les deux blocs précédents en construisant un espace regard global, nous avons exploré deux directions en utilisant des données réelles et synthétiques respectivement. / In this thesis, we tackled the automatic gaze estimation problem in unconstrained user environments. This work takes place in the computer vision research field applied to the perception of humans and their behaviors. Many existing industrial solutions are commercialized and provide an acceptable accuracy in gaze estimation. These solutions often use a complex hardware such as range of infrared cameras (embedded on a head mounted or in a remote system) making them intrusive, very constrained by the user's environment and inappropriate for a large scale public use. We focus on estimating gaze using cheap low-resolution and non-intrusive devices like the Kinect sensor. We develop new methods to address some challenging conditions such as head pose changes, illumination conditions and user-sensor large distance. In this work we investigated different gaze estimation paradigms. We first developed two automatic gaze estimation systems following two classical approaches: feature and semi appearance-based approaches. The major limitation of such paradigms lies in their way of designing gaze systems which assume a total independence between eye appearance and head pose blocks. To overcome this limitation, we converged to a novel paradigm which aims at unifying the two previous components and building a global gaze manifold, we explored two global approaches across the experiments by using synthetic and real RGB-D gaze samples. Estimation du regard Caméra RGB-D Suivi de la pupille Champs aléatoires Apprentissage automatique Gaze estimation RGB-D Camera Eye-pupil localization Random Forest Machine learning
13	3D Gaze Estimation on RGB Images using Vision Transformers Li, Jing January 2023 (has links) Gaze estimation, a vital component in numerous applications such as humancomputer interaction, virtual reality, and driver monitoring systems, is the process of predicting the direction of an individual’s gaze. The predominant methods for gaze estimation can be broadly classified into intrusive and nonintrusive approaches. Intrusive methods necessitate the use of specialized hardware, such as eye trackers, while non-intrusive methods leverage images or recordings obtained from cameras to make gaze predictions. This thesis concentrates on appearance-based gaze estimation, specifically within the non-intrusive domain, employing various deep learning models. The primary focus of this study is to compare the efficacy of Vision Transformers (ViTs), a recently introduced architecture, with Convolutional Neural Networks (CNNs) for gaze estimation on RGB images. Performance evaluations of the models are conducted based on metrics such as the angular gaze error, stimulus distance error, and model size. Within the realm of ViTs, two variants are explored: pure ViTs and hybrid ViTs, which combine both CNN and ViT architectures. Throughout the project, both variants are examined in different sizes. Experimental results demonstrate that all pure ViTs underperform in comparison to the baseline ResNet-18 model. However, the hybrid ViT consistently emerges as the best-performing model across all evaluation datasets. Nonetheless, the discussion regarding whether to deploy the hybrid ViT or stick with the baseline model remains unresolved. This uncertainty arises because utilizing an exceedingly large and slow model, albeit highly accurate, may not be the optimal solution. Hence, the selection of an appropriate model may vary depending on the specific use case. / Ögonblicksbedömning, en avgörande komponent inom flera tillämpningar såsom människa-datorinteraktion, virtuell verklighet och övervakningssystem för förare, är processen att förutsäga riktningen för en individs blick. De dominerande metoderna för ögonblicksbedömning kan i stort sett indelas i påträngande och icke-påträngande tillvägagångssätt. Påträngande metoder kräver användning av specialiserad hårdvara, såsom ögonspårare, medan ickepåträngande metoder utnyttjar bilder eller inspelningar som erhållits från kameror för att göra bedömningar av blicken. Denna avhandling fokuserar på utseendebaserad ögonblicksbedömning, specifikt inom det icke-påträngande området, genom att använda olika djupinlärningsmodeller. Studiens huvudsakliga fokus är att jämföra effektiviteten hos Vision Transformers (ViTs), en nyligen introducerad arkitektur, med Convolutional Neural Networks (CNNs) för ögonblicksbedömning på RGB-bilder. Prestandautvärderingar av modellerna utförs baserat på metriker som den vinkelmässiga felbedömningen av blicken, felbedömning av stimulusavstånd och modellstorlek. Inom ViTs-området utforskas två varianter: rena ViTs och hybrid-ViT, som kombinerar både CNN- och ViT-arkitekturer. Under projektet undersöks båda varianterna i olika storlekar. Experimentella resultat visar att alla rena ViTs presterar sämre jämfört med basmodellen ResNet-18. Hybrid-ViT framstår dock konsekvent som den bäst presterande modellen över alla utvärderingsdatauppsättningar. Diskussionen om huruvida hybrid-ViT ska användas eller om man ska hålla sig till basmodellen förblir dock olöst. Denna osäkerhet uppstår eftersom användning av en extremt stor och långsam modell, även om den är mycket exakt, kanske inte är den optimala lösningen. Valet av en lämplig modell kan därför variera beroende på det specifika användningsområdet. 3D Gaze Estimation Vision Transformers (ViTs) Convolutional Neural Networks (CNNs) Multi-Head Attention Red-Green-Blue (RGB) Images 3D Blickriktningsestimering Vision Transformers (ViTs) Konvolutionsneurala Nätverk (CNNs) Multi-Head Attention Röd-Grön-Blå (RGB) Bilder Computer and Information Sciences Data- och informationsvetenskap
14	3D Gaze Estimation on Near Infrared Images Using Vision Transformers / 3D Ögonblicksuppskattning på Nära Infraröda Bilder med Vision Transformers Vardar, Emil Emir January 2023 (has links) Gaze estimation is the process of determining where a person is looking, which has recently become a popular research area due to its broad range of applications. For example, tools that estimate gaze are used for research, medical diagnosis, virtual and augmented reality, driver assistance system, and many more. Therefore, better products are sought by many. Gaze estimation methods typically use images of only the eyes or the whole face to estimate the gaze since these methods are the most practical and convenient options. Recently, Convolutional Neural Networks (CNNs) have been appealing candidates for estimating the gaze. Nevertheless, the recent success of Vision Transformers (ViTs) in image classification tasks has introduced a new potential alternative. Hence, this work investigates the potential of using ViTs to estimate the gaze on Near-Infrared (NIR) images. This is done in terms of average error and computational complexity. Furthermore, this work examines not only pure ViTs but other models, such as hybrid ViTs and CNN-Formers, which combine CNNs and ViTs. The empirical results showed that hybrid ViTs are the only models that can outperform state-of-the-art CNNs such as MobileNetV2 and ResNet-18 while maintaining similar computational complexity to ResNet-18. The results on hybrid ViTs indicate that the convolutional stem is the most crucial part of them. Improved convolutional stems lead to better outcomes. Moreover, in this work, we defined a new training algorithm for hybrid ViTs, the hybrid Data-Efficient Image Transformer (DeiT) procedure, which has shown remarkable results. It is 3.5% better than the pretrained ResNet-18 while having the same time complexity. / Blickuppskattning är processen att uppskatta en persons blick, vilket nyligen har blivit ett populärt forskningsområde på grund av dess breda användningsområde. Till exempel, verktyg för blickuppskattning används inom forskning, medicinsk diagnos, virtuell och förstärkt verklighet, förarassistanssystem och för mycket mer. Därför, bättre produkter för blickuppskattning eftersträvas av många. Blickuppskattnings metoder vanligtvis använder bilder av endast ögonen eller hela ansiktet för att uppskatta blicken eftersom denna typen av metoder är de mest praktiska och lämliga alternativ. På sistånde har Convolutional Neural Networks (CNNs) varit tilltalande kandidater för att uppskatta blicken. Dock, har den senaste framgången med Vision Transformers (ViTs) i bildklassificeringsuppgifter introducerat ett nytt potentiellt alternativ. Därför undersöker detta arbete potentialen av att använda ViTs för att uppskatta blicken på Nära-infraröda (NIR) bilder. Undersökningen görs både i termer av medelfel och beräkningskomplexitet. Hursomhelst, detta arbete undersöker inte enbart rena ViTs utan andra modeller, som hybrida ViTs och CNN-Formers, som kombinerar CNNs och ViTs. De empiriska resultaten visade att hybrida ViTs är de enda modellerna som kan överträffa toppmoderna CNNs som MobileNetV2 och ResNet-18 samtidigt som de bibehåller liknande beräkningskomplexitet som ResNet-18. Resultaten på hybrida ViTs indikerar att faltningsstammen är den mest avgörande delen av dem. Det vill säga, desto bättre faltningsstamm en har desto bättre resultat kan man erhålla. Dessutom definierade vi i detta arbete en ny träningsalgoritm för hybrida ViTs, vilket vi kallar hybrida Data-Efficient Image Transformer (DeiT) procedur som har visat anmärkningsvärda resultat. Den är 3,5% bättre än den förtränade ResNet-18 samtidigt som den har samma tid komplexitet. Gaze estimation Eye tracking Vision Transformers (ViTs) Hybrid ViTs Deep learning Near-infrared (NIR) images Ögonblicksuppskattning Blickspårning Vision Transformers (ViTs) Hybrida ViTs Djupinlärning Nära-infraröda (NIR) bilder Signal Processing Signalbehandling Elektroteknik och elektronik

Page generated in 0.0894 seconds