Global ETD Search

1	Intégration de l’interaction au regard dans des systèmes optroniques : évaluation de l’influence du contexte / Integration of eye-gaze based interaction in optronic systems : evaluating the context Grosse, Romain 09 April 2018 New versions of Safran Electronics & Defense optronic products, such as infrared binoculars and firearm sights, offer more and more functions, which creates a need to improve the user interface of these systems. Integrating eye-gaze based interaction modalities seems promising because the eye is fast, natural and always available. Eye-gaze based interaction is already well developed for users with disabilities, but it is not yet a mature technology for able-bodied users. During active eye-based interaction, in which the user gives explicit input, a problem known as the Midas Touch arises: the system cannot distinguish scene analysis from voluntary user input, because the eye is above all a sensory organ. 
To overcome this problem, several interaction modalities have been designed. Dwell Time, for example, triggers input after a minimum gaze dwell duration on an item; the fixation area may also be placed next to the item to activate (relocated Dwell Time). It is also possible to combine the eye with another input modality, such as a push button, to indicate the user's intent (eye-button multimodality). Each of these modalities has advantages and drawbacks, and choosing the one best suited to a given situation is not trivial. Moreover, the performance of the interaction modalities seems to depend on external variables, which together form the context of an interaction modality. To integrate eye-based interaction into systems and to choose which modality to use, it is necessary to identify the characteristics of the context and how they affect the modalities. Our goal is to propose a context model for interaction modalities, that is, to define all the external characteristics that affect modality performance. From a review of the state of the art in eye-based interaction, we propose a description of the context along four axes: the user, the task, the system and the environment. Each of these axes is decomposed into characteristics whose influence is justified by previous work or by theoretical reasoning. We then studied three characteristics that appeared critical for integration into optronic products, and we compared the performance of the aforementioned modalities with respect to these characteristics. The first is the type of menu (linear or circular). Unlike the mouse, the tested interaction modalities show no significant differences depending on the type of menu they are used with. The second characteristic is linked to the user's task. The aim is to evaluate how well the interaction modalities accommodate the user's ability to split visual attention, that is, to fixate one area while attending visually to another. This skill is necessary during target-following tasks. 
The fixation-based modalities seemed more tolerant in this respect. The third characteristic concerns the detection of peripheral visual alerts, to ensure that the user can be warned at all times. Fixation-based modalities seemed to narrow the visual field less than the others. Despite novice users' preference for multimodality, we showed that fixation-based modalities may be better suited to optronic systems. Further study of other characteristics of the context will help identify the modality to use in each situation.
HMI; Human-machine systems; Interaction modality; Eye-based interaction; Context; Optronic systems; Military
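The Dwell Time modality described in the abstract above can be illustrated with a small sketch. It is not taken from the thesis: the `Item` rectangle, the `(timestamp, x, y)` sample format, the function name `dwell_select` and the 0.5 s threshold are all assumptions made for illustration. The idea is simply that an item is activated once the gaze has stayed on it for a minimum duration, so that brief glances made while analysing the scene do not trigger input.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Item:
    """A rectangular on-screen target (hypothetical layout, in pixels)."""
    name: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, gx: float, gy: float) -> bool:
        return (self.x <= gx <= self.x + self.width
                and self.y <= gy <= self.y + self.height)


def dwell_select(
    samples: Iterable[Tuple[float, float, float]],
    items: List[Item],
    dwell_s: float = 0.5,  # assumed threshold, not a value from the thesis
) -> Iterator[Tuple[float, str]]:
    """Yield (timestamp, item name) when gaze has rested on an item for dwell_s seconds."""
    current: Optional[Item] = None
    entered_at = 0.0
    fired = False
    for t, gx, gy in samples:
        # Find the item under the current gaze point, if any.
        hit = next((it for it in items if it.contains(gx, gy)), None)
        if hit is not current:
            # Gaze moved to another item (or to empty space): restart the timer.
            current, entered_at, fired = hit, t, False
        elif hit is not None and not fired and t - entered_at >= dwell_s:
            # Gaze stayed long enough: activate once per visit.
            fired = True
            yield t, hit.name
```

A shorter threshold makes selection faster but brings back accidental activations, which is the trade-off behind the Midas Touch problem; the relocated variant would apply the same timer to a separate activation area placed next to the item.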
2	A noisy-channel based model to recognize words in eye typing systems / Um modelo baseado em canal de ruído para reconhecer palavras digitadas com os olhos Hanada, Raíza Tamae Sarkis 04 April 2018 An important issue in eye-based typing is the correct identification of both when the user selects a key and which key is selected. Traditional solutions rely on a predefined gaze fixation time and are known as dwell-time methods. To improve accuracy, long dwell times are adopted, which in turn lead to fatigue and longer response times. These problems motivate methods that use no dwell time, or only a very short one, and rely on more robust recognition techniques to reduce the uncertainty about the user's actions. These techniques are especially important when users have disabilities that affect their eye movements or when they use inexpensive eye trackers. One way to deal with the recognition problem is to treat it as a spelling correction task. A usual strategy for spelling correction is to model the problem as the transmission of a word through a noisy channel, so that the task is to determine which known word of a lexicon corresponds to the received string. To make this method feasible, the set of candidate words is reduced to those that can be transformed into the input by applying up to k character edit operations. This idea works well for traditional typing because the number of errors per word is very small. However, this is not the case for eye-based typing systems, which are much noisier. In such a scenario, spelling correction strategies do not scale well, as their cost grows exponentially with k and the lexicon size. Moreover, the error distribution in eye typing is different, with many more insertion errors due to specific sources of noise such as the eye tracker device, particular user behaviors, and intrinsic characteristics of eye movements. 
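The noisy-channel formulation described above can be sketched in a few lines. This is a minimal illustration, not the model proposed in the thesis: it assumes a toy lexicon with made-up word probabilities and a single, uniform per-edit error probability (`edit_log_prob`), whereas the thesis builds a noise model specific to eye typing. Each lexicon word within k edits of the received string is scored by log P(word) plus the log-probability of the edits needed to produce the received string.

```python
import math
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def rank_candidates(
    received: str,
    lexicon: Dict[str, float],               # word -> prior probability (assumed values)
    k: int = 2,
    edit_log_prob: float = math.log(0.01),   # assumed uniform cost per edit
) -> List[str]:
    """Rank lexicon words within k edits of `received`, best first."""
    scored = []
    for word, prior in lexicon.items():
        d = edit_distance(received, word)
        if d <= k:
            # log P(word) + log P(received | word) under the uniform-edit assumption
            scored.append((math.log(prior) + d * edit_log_prob, word))
    scored.sort(reverse=True)
    return [word for _, word in scored]
```

Note that this sketch compares the input against every word in the lexicon, which is exactly the cost that becomes prohibitive for noisy input and large lexicons.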
Also, the lack of a large corpus of errors makes it hard to adopt probabilistic approaches based on information extracted from real-world data. To address these problems, we propose an effective recognition approach that combines estimates extracted from general error corpora with domain-specific knowledge about eye-based input. The technique calculates edit distances effectively by using a Mor-Fraenkel index, searched by means of minimal perfect hashing. The method allows the most promising candidates to be processed early, so that fast pruned searches incur negligible loss in word ranking quality. We also propose a linear heuristic for estimating edit-based distances which takes advantage of information already provided by the index. Finally, we extend our recognition model to include the variability of eye movements as a source of errors, provide a comprehensive study of the importance of the noise model when combined with a language model, and determine how it affects user behaviour during typing. As a result, we obtain a method that is very effective at recognizing words and fast enough to be used in real eye typing systems. In a transcription experiment with 8 users, participants achieved 17.46 words per minute using the proposed model, a gain of 11.3% over a state-of-the-art eye-typing system. The method was particularly useful in noisier situations, such as the first sessions of use. Despite significant gains in typing speed and word recognition ability, we were not able to find statistically significant differences in the participants' perception of their experience with the two methods. This indicates that an improved suggestion ranking may not be clearly perceptible to users even when it enhances their performance.
Eye-based typing systems; Human-computer interfaces; Mor-Fraenkel indices; Noisy-channel models
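The Mor-Fraenkel index mentioned in the abstract above is commonly described as a deletion-neighbourhood index: every lexicon word is stored under all strings obtained by deleting up to k of its characters, and a query is matched by generating its own deletion variants and looking them up. The sketch below illustrates that general idea only. It uses a plain Python dictionary where the thesis uses minimal perfect hashing, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set


def deletion_variants(word: str, k: int) -> Set[str]:
    """All strings obtained by deleting at most k characters from `word`."""
    variants = {word}
    for n in range(1, min(k, len(word)) + 1):
        for positions in combinations(range(len(word)), n):
            drop = set(positions)
            variants.add("".join(c for i, c in enumerate(word) if i not in drop))
    return variants


def build_index(lexicon: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map every deletion variant to the lexicon words that produce it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for word in lexicon:
        for variant in deletion_variants(word, k):
            index[variant].add(word)
    return index


def lookup(index: Dict[str, Set[str]], query: str, k: int) -> Set[str]:
    """Candidate words sharing at least one deletion variant with `query`."""
    found: Set[str] = set()
    for variant in deletion_variants(query, k):
        found |= index.get(variant, set())
    return found
```

Any word within k edits of the query shares a deletion variant with it, so no true candidate is missed. The reverse does not hold: a shared variant only guarantees a distance of at most 2k, so candidates returned by `lookup` would still need to be checked with an actual edit-distance computation before ranking.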
