Global ETD Search

31	PAC-Bayesian aggregation and multi-armed bandits Audibert, Jean-Yves 14 October 2010 This habilitation thesis presents several contributions to (1) the PAC-Bayesian analysis of statistical learning, (2) the three aggregation problems: given d functions, how to predict as well as (i) the best of these d functions (model selection type aggregation), (ii) the best convex combination of these d functions, (iii) the best linear combination of these d functions, (3) the multi-armed bandit problems. [MATH:MATH_ST] Mathematics/Statistics [STAT:TH] Statistics/Statistics Theory PAC-Bayesian aggregation bandit
32	Sociologia da maldade e maldade da sociologia: arqueologia do bandido [Sociology of evil and the evil of sociology: an archaeology of the bandit] Gaudencio, Edmundo de Oliveira 02 June 2004 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / The aim of my work is to investigate the genesis and social uses of the word bandido (bandit). To that end, I draw on three operational concepts: archaeology, from Foucault; the fold, from Deleuze; and the rhizome, from Deleuze and Guattari. Analyzing a word, however, leads to the study of its surroundings, located in the words it mobilizes and in the terms associated with it. Thus, in the network of words mobilized by or associated with the word bandit, one synonymous term stands out: criminal. Synonyms, however, are fallacies, since no word says another. With this in mind, I lay bare the historical trajectory of these two terms, criminal and bandit, analyzing the social uses of the word criminal in the first part and those of the word bandit in the second. In the nineteenth century, criminal is a general category designating the delinquent, which includes the political criminal, or bandit. Gradually, however, the bandit, once a particular category of criminal (the political criminal), becomes a general category from the end of the nineteenth century and the beginning of the twentieth, designating, in journalism, any and every kind of delinquent.
As the core term of the first part, entitled Sociologia da Maldade & Maldade da Sociologia (Sociology of Evil & Evil of Sociology), the word criminal makes it possible to analyze the evil that, according to the discourses of physiognomy, phrenology, craniometry and criminology, becomes visible in the body of the criminal. What these discourses make sayable shapes a discourse of exclusion, grounded in social fear, in the denial of that emotion and in its transformation into hatred. Accordingly, a Sociology of Evil must analyze the social factors at work in the transformation of that fear into this hatred, discussing both emotions as historical facts that made possible the invention of social surveillance and control. The Evil of Sociology, on the other hand, is nothing more than the strategic use of sociology by Power, which employs it as a form of rationalization for surveillance, control and exclusion, in the name of social security, in the face of the supposed dangerousness of certain social groups, marked as suspects and/or criminals.
In the second part, which specifically investigates the Archaeology of the Bandit, in order to reconstruct the historical trajectory of the term bandit, I develop a biographical analysis of Antônio Vicente Mendes Maciel, the Conselheiro, the typical bandit of the first years of the Brazilian Republic, while analyzing the War of Canudos as an example of social exclusion, by way of the sacrificial rites involved in the clashes between the Same and the Other. The Conselheiro case serves to demonstrate both the social use of the term bandit, imported from the French settecento, and the national use of the knowledge produced in Europe at the end of the nineteenth century, when the concept of the criminal was invented, recapitulated among us by Raimundo Nina Rodrigues and Euclydes da Cunha. In the Inconclusões (Inconclusions) that close the work, starting from the social concepts of criminal and bandit, I turn to the discussion of the notions of control, surveillance and exclusion, situated between the belief in equality and the disrespect for difference, and mediating certain relations between the Same and the Other. Evil Sociology Bandit Crime Criminology CIENCIAS HUMANAS::SOCIOLOGIA
33	Crime, proceder, convívio-seguro: um experimento antropológico a partir de relações entre ladrões / Crime, proceder, conviviality-security: an anthropological experiment from relations among bandits Adalton Jose Marques 26 February 2010 In this anthropological experiment, strongly inspired by the work of Michel Foucault, I present an ethnography built mainly from conversations with prisoners, ex-prisoners and their families about experiences of prison. In the first chapter, I explore different understandings of proceder and of the spatial division convívio-seguro (conviviality-security), elaborated in response to the native question "what is the right thing?". Each understanding serves both as a defense of the collective of prisoners from which it emerges and as a condemnation of enemy collectives.
In the second chapter, I seek to unravel a dimension of strategies adjacent to these understandings, in which prisoners are led to pay attention to themselves, taking care to maintain a singular balance between being humble and being fearless. This balance is what they mean by being a bandit (ser ladrão). Finally, in the last chapter, I map a notion of crime fundamental to my interlocutors, defined as a movement that establishes the alliances nurtured among bandits and other allies while defining enemies on the basis of considerations about their paths (caminhadas). Crime Spatial division (conviviality-security) Bandit Prisoners Proceder
34	Decision making using Thompson Sampling Mellor, Joseph Charles January 2014 The ability to make decisions is crucial to many autonomous systems. In many scenarios the consequence of a decision is unknown and often stochastic: the same decision may lead to a different outcome every time it is taken. An agent that can learn to make decisions based purely on its past experience needs less tuning and is likely more robust. Such an agent must often balance learning the payoff of actions by exploring against exploiting the knowledge it currently has. The multi-armed bandit problem exhibits this exploration-exploitation dilemma. Thompson Sampling is a strategy for the problem, first proposed in 1933. In the last several years there has been renewed interest in it, with the emergence of strong empirical and theoretical justification for its use. This thesis seeks to take advantage of the benefits of Thompson Sampling while applying it to other decision-making models, and proposes different algorithms for these scenarios. Firstly, we explore a switching multi-armed bandit problem. In real applications the most appropriate decision often changes over time. We show that an agent assuming switching is often robust to many types of changing environment. Secondly, we consider the best arm identification problem. Unlike the multi-armed bandit problem, where an agent wants to increase reward over the entire period of decision making, best arm identification is concerned with increasing the reward gained by a final decision. This thesis argues that both problems can be tackled effectively using Thompson Sampling based approaches and provides empirical evidence to support this claim. 006.3
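As an illustration of the strategy named in the abstract above, the following is a minimal sketch of Thompson Sampling for a Bernoulli multi-armed bandit. It is not taken from the thesis: the function name `thompson_sampling`, the uniform Beta(1, 1) priors and the example reward probabilities are assumptions chosen for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson Sampling on a simulated bandit.

    true_probs: hypothetical success probability of each arm (unknown to the agent).
    Returns the number of pulls per arm and the total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms
    total_reward = 0

    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its Beta posterior
        # (assumed prior: Beta(1, 1), i.e. uniform on [0, 1]).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate the stochastic outcome of the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Update the posterior counts of the chosen arm only.
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.1, 0.5, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
    print("total reward:", total)
```

Sampling from the posterior, rather than always playing the arm with the best observed average, is what balances exploration and exploitation here: arms with few observations have wide posteriors and are still tried occasionally, while pulls concentrate on the best arm as evidence accumulates.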
35	Tuning evolutionary search for closed-loop optimization Allmendinger, Richard January 2012 Closed-loop optimization deals with problems in which candidate solutions are evaluated by conducting experiments, e.g. physical or biochemical experiments. Although this form of optimization is becoming more popular across the sciences, it may be subject to rather unexplored resourcing issues, as any experiment may require resources in order to be conducted. In this thesis we are concerned with understanding how evolutionary search is affected by three particular resourcing issues -- ephemeral resource constraints (ERCs), changes of variables, and lethal environments -- and with developing search strategies to combat these issues. The thesis makes three broad contributions. First, we motivate and formally define the resourcing issues considered, giving concrete examples from a range of applications. Secondly, we theoretically and empirically investigate the effect of these resourcing issues on evolutionary search. This investigation reveals that resourcing issues affect optimization in general, and that clear patterns emerge relating specific properties of the different resourcing issues to performance effects. Thirdly, we develop and analyze various search strategies that augment an evolutionary algorithm (EA) to cope with resourcing issues. To cope specifically with ERCs, we develop several static constraint-handling strategies, and investigate the application of reinforcement learning techniques to learn when to switch between these static strategies during an optimization process. We also develop several online resource-purchasing strategies to cope with ERCs that leave the arrangement of resources in the hands of the optimizer.
For problems subject to changes of variables relating to the resources, we find that knowing which variables are changed provides an optimizer with valuable information, which we exploit using a novel dynamic strategy. Finally, for lethal environments, where visiting parts of the search space can cause the permanent loss of resources, we observe that a standard EA's population may be reduced in size rapidly, complicating the search for innovative solutions. To cope with such scenarios, we consider some non-standard EA setups that are able to innovate genetically whilst simultaneously mitigating risks to the evolving population. 519.6
36	SCHEDULING AND CONTROL WITH MACHINE LEARNING IN MANUFACTURING SYSTEMS Sungbum Jun (9136835) 05 August 2020 Numerous optimization problems in production systems can be considered as decision-making processes that determine the best allocation of resources to tasks over time to optimize one or more objectives in concert with big data. Among these optimization problems, production scheduling and the routing of robots for material handling are becoming more important due to their impact on system performance. However, the development of efficient algorithms for scheduling or routing faces several challenges. While scheduling and vehicle routing problems can be solved by mathematical models such as mixed-integer linear programming to find optimal solutions to small-sized problems, these models are not applicable to larger problems because the problems are NP-hard. Thus, further research on machine learning applications to those problems is a significant step towards increasing the possibilities and potentialities of field application. In order to create truly intelligent systems, new frameworks for scheduling and routing are proposed to utilize machine learning (ML) techniques. First, the dynamic single-machine scheduling problem for minimization of total weighted tardiness is addressed. In order to solve the problem more efficiently, a decision-tree-based approach called Generation of Rules Automatically with Feature construction and Tree-based learning (GRAFT) is designed to extract dispatching rules from existing or good schedules. In addition to the single-machine scheduling problem, the flexible job-shop scheduling problem with release times for minimizing the total weighted tardiness is analyzed. As an ML-based solution approach, a random-forest-based approach called Random Forest for Obtaining Rules for Scheduling (RANFORS) is developed to solve the problem by generating dispatching rules automatically.
Finally, an optimization problem for routing autonomous robots to minimize the total tardiness of transportation requests is analyzed by decomposing it into three sub-problems. In order to solve the sub-problems, a comprehensive framework that considers conflicts between routes is proposed. For the vehicle-routing sub-problem in particular, a new local search algorithm called COntextual-Bandit-based Adaptive Local search with Tree-based regression (COBALT), which incorporates the contextual bandit into operator selection, is developed. The findings of this research offer guidance to practitioners on applying ML to scheduling and control problems and, ultimately, support the implementation of smart factories. Engineering not elsewhere classified Production Scheduling Machine Learning Autonomous Robot Decision Tree Genetic Programming Contextual Bandit
37	The effect of stress on the explore-exploit dilemma Ferguson, Thomas 05 April 2022 When humans are faced with multiple options, they must decide whether to choose a novel or less certain option (explore) or stick with what they know (exploit). Exploration is a fundamental cognitive process. Importantly, when humans attempt to solve the explore-exploit dilemma, they must effectively incorporate both feedback and uncertainty to guide their actions. While prior work has shown that both acute (short-term) and chronic (long-term) stress can disrupt how humans solve the explore-exploit dilemma, the mechanisms by which this occurs are unclear. For example, does stress disrupt how people integrate feedback to guide their decisions to explore or exploit, or does stress disrupt computations of uncertainty regarding their choices? Importantly, electroencephalography can help reveal the impact of stress on explore-exploit decision making by measuring neural signals sensitive to feedback learning and uncertainty. In the present dissertation, I provide evidence from a series of experiments in which I examined the impact of both acute and chronic stress on the explore-exploit dilemma while electroencephalographic data were collected. In experiment 1, I exposed participants to an acute stressor and then examined their decisions to switch or stay – as a proxy for explore and exploit decisions – in a multi-armed bandit paradigm. I found tentative evidence that the acute stress response disrupted both the feedback learning signal (the reward positivity) and the uncertainty signal (the switch P300). In experiment 2, I adopted a computational neuroscience approach and directly classified participants' decisions as explorations or exploitations using reinforcement learning models. There was only an effect of the acute stress response on feedback signals, in this case, the feedback P300.
In experiments 1 and 2, I used contextual bandit tasks where the reward probabilities of the options shifted throughout, and there was no behavioural effect of acute stress on task performance or exploration rate. However, in experiment 3, I examined a learnable bandit where one option was preferred. Again, using computational modelling and electroencephalography, I found tentative evidence that the acute stress response disrupted the feedback learning signals (the feedback P300) and stronger evidence that acute stress disrupted the uncertainty signal (the exploration P300). As well, I observed that the acute stress response reduced task performance and increased exploration rate. Lastly, in experiment 4, I examined the impact of chronic stress exposure on explore-exploit decision making and electrophysiology – while I found no effects of chronic stress, I believe future research is necessary. Taken together, these findings provide novel evidence for the neural mechanisms of how the acute stress response impacts the explore-exploit dilemma through disruptions to feedback learning and assessments of uncertainty. These findings also highlight how theories of the P300 signal may not be properly capturing the varied role of the P300 in cognition. / Graduate Explore-Exploit Acute Stress Chronic Stress Reinforcement Learning Decision Making Bandit Task
38	Intelligent Data Mining Techniques for Automatic Service Management Wang, Qing 07 November 2018 Today, as more and more industries enter the artificial intelligence era, business enterprises constantly explore innovative ways to expand their outreach and fulfill the high requirements of customers, with the purpose of gaining a competitive advantage in the marketplace. However, the success of a business relies heavily on its IT service. Value-creating activities of a business cannot be accomplished without solid and continuous delivery of IT services, especially in an increasingly intricate and specialized world. Driven by both the growing complexity of IT environments and rapidly changing business needs, service providers are urgently seeking intelligent data mining and machine learning techniques to build a cognitive "brain" in IT service management, capable of automatically understanding, reasoning and learning from operational data collected from human engineers and virtual engineers during IT service maintenance. The ultimate goal of IT service management optimization is to maximize the automation of routine IT procedures such as problem detection, determination, and resolution. However, fully automating the entire routine IT procedure without any human intervention is still a challenging task. In a real IT system, both the step-wise resolution descriptions and scripted resolutions are often logged with their corresponding problematic incidents, which typically contain abundant valuable human domain knowledge. Hence, modeling, gathering and utilizing the domain knowledge from IT system maintenance logs plays an extremely crucial role in IT service management optimization.
To optimize IT service management from the perspective of intelligent data mining techniques, three research directions are identified and considered to be greatly helpful for automatic service management: (1) efficiently extracting and organizing the domain knowledge from IT system maintenance logs; (2) collecting and updating the existing domain knowledge online by interactively recommending possible resolutions; (3) automatically discovering the latent relations among scripted resolutions and intelligently suggesting proper scripted resolutions for IT problems. My dissertation addresses these challenges by designing and implementing a set of intelligent data-driven solutions, including (1) constructing the domain knowledge base for problem resolution inference; (2) recommending resolutions online in light of the explicit hierarchical resolution categories provided by domain experts; and (3) interactively recommending resolutions with the latent resolution relations learned through a collaborative filtering model. Artificial Intelligence Automatic Service Management Knowledge Base Multi-armed Bandit Model Computer Engineering
39	MODERN BANDIT OPTIMIZATION WITH STATISTICAL GUARANTEES Wenjie Li (17506956) 01 December 2023 Bandits and optimization represent prominent areas of machine learning research. Despite extensive prior research on these topics in various contexts, modern challenges, such as dealing with highly unsmooth nonlinear reward objectives and incorporating federated learning, have sparked new discussions. The X-armed bandit problem is a specialized case where bandit algorithms and blackbox optimization techniques join forces to address noisy reward functions within continuous domains to minimize the regret. This thesis concentrates on the X-armed bandit problem in a modern setting. In the first chapter, we introduce an optimal statistical collaboration framework for the single-client X-armed bandit problem, expanding the range of objectives by considering more general smoothness assumptions and emphasizing tighter statistical error measures to expedite learning. The second chapter addresses the federated X-armed bandit problem, providing a solution for collaboratively optimizing the average global objective while ensuring client privacy. In the third chapter, we confront the more intricate personalized federated X-armed bandit problem. An enhanced algorithm facilitating the simultaneous optimization of all local objectives is proposed. Optimisation Applied statistics Blackbox Optimization X-armed Bandit Federated Learning Online Learning
40	Adaptive Personalization of Pedagogical Sequences using Machine Learning / Personalisation Adaptative de Séquences Pédagogique à l'aide d'Apprentissage Automatique Clement, Benjamin 12 December 2018 Can computers teach people?
To answer this question, Intelligent Tutoring Systems are a rapidly expanding field of research within the Information and Communication Technologies for Education community. This subject brings together different issues and researchers from various fields, such as psychology, didactics, neurosciences and, particularly, machine learning. Digital technologies are becoming more and more a part of everyday life with the development of tablets and smartphones. It seems natural to consider using these technologies for educational purposes. This raises several questions, such as how to make user interfaces accessible to everyone, how to make educational content motivating and how to customize it to individual learners. In this PhD, we developed methods, grouped in the aptly-named HMABITS framework, to adapt pedagogical activity sequences based on learners' performances and preferences to maximize their learning speed and motivation. These methods use computational models of intrinsic motivation and curiosity-driven learning to identify the activities providing the highest learning progress and use Multi-Armed Bandit algorithms to manage the exploration/exploitation trade-off inside the activity space. Activities of optimal interest are thus favored in order to keep the learner in a state of Flow or in his or her Zone of Proximal Development. Moreover, some of our methods allow the student to make choices about contextual features or pedagogical content, which is a vector of self-determination and motivation. To evaluate the effectiveness and relevance of our algorithms, we carried out several types of experiments. We first evaluated these methods with numerical simulations before applying them to real teaching conditions. To do this, we developed multiple models of learners, since a single model never exactly replicates the behavior of a real learner.
The simulation results show that the HMABITS framework achieves comparable, and in some cases better, learning results than an optimal solution or an expert sequence. We then developed our own pedagogical scenario and serious game to test our algorithms in classrooms with real students. We developed a game on the theme of number decomposition, through the manipulation of money, for children aged 6 to 8. We then worked with the educational institutions and several schools in the Bordeaux school district. Overall, about 1000 students participated in trial lessons using the tablet application. The results of the real-world studies show that the HMABITS framework allows the students to do more diverse and more difficult activities, to achieve better learning and to be more motivated than with an expert sequence. The results show that this effect is even greater when the students have the possibility to make choices. Intelligent Tutoring System Adaptive Teaching Flow Theory Intrinsic Motivation Multi-Armed Bandit Learner Model Serious Game

Search results