Global ETD Search

11	A Prediction Model Uses the Sequence of Attempts and Hints to Better Predict Knowledge: Better to Attempt the Problem First, Rather Than Ask for A Hint Zhu, Linglong 28 April 2014 Intelligent Tutoring Systems (ITS) have been proven to be efficient in assisting students and assessing their performance when they do their homework. Many research projects have analyzed how students' knowledge grows and have predicted their performance within an intelligent tutoring system. Most of them use the correctness of the previous question, or the number of hints and attempts a student needed, to predict future performance, but ignore the order in which students ask for hints and make attempts. In this work, we build a Sequence of Actions (SOA) model that uses the sequence of hints and attempts a student needed on the previous question to predict the student's performance. We put forward a two-step modeling methodology that combines the Tabling method with Logistic Regression. We used an ASSISTments dataset of 66 students answering a total of 34,973 problems generated from 5010 questions over the course of two years. The experimental results showed that the Sequence of Actions model has more reliable predictive accuracy than Knowledge Tracing and the Assistance Model, and that its predictive performance improves when it is combined with Knowledge Tracing. Intelligent Tutoring System Student Modelling Educational Data Mining
12	Towards the Automatic Classification of Student Answers to Open-ended Questions Alvarado Mantecon, Jesus Gerardo 24 April 2019 One of the main research challenges in the context of Massive Open Online Courses (MOOCs) is the effective automation of the evaluation of text-based assessments. Text-based assessments, such as essay writing, have been shown to be better indicators of a higher level of understanding than machine-scored assessments (e.g. multiple choice questions). Nonetheless, due to the rapid growth of MOOCs, text-based evaluation has become a difficult task for human markers, creating the need for automated grading systems. In this thesis, we focus on the automated short answer grading (ASAG) task, which automatically classifies natural language answers to open-ended questions as correct or incorrect. We propose an ensemble supervised machine learning approach that relies on two types of classifiers: a response-based classifier, which centers on feature extraction from the available responses, and a reference-based classifier, which considers the relationships between responses, model answers and questions. For each classifier, we explored a set of features based on words and entities. For the response-based classifier, we tested and compared five features: traditional n-gram models, entity URIs (Uniform Resource Identifiers) and entity mentions, both extracted using a semantic annotation API, entity mention embeddings based on GloVe, and entity URI embeddings extracted from Wikipedia. 
For the reference-based classifier, we explored fourteen features: cosine similarity between sentence embeddings from student answers and model answers, number of overlapping elements (words, entity URI, entity mention) between student answers and model answers or question text, Jaccard similarity coefficient between student answers and model answers or question text (based on words, entity URI or entity mentions) and a sentence embedding representation. We evaluated our classifiers on three datasets, two of which belong to the SemEval ASAG competition (Dzikovska et al., 2013). Our results show that, in general, reference-based features perform much better than response-based features in terms of accuracy and macro average f1-score. Within the reference-based approach, we observe that the use of S6 embedding representation, which considers question text, student and model answer, generated the best performing models. Nonetheless, their combination with other similarity features helped build more accurate classifiers. As for response-based classifiers, models based on traditional n-gram features remained the best models. Finally, we combined our best reference-based and response-based classifiers using an ensemble learning model. Our ensemble classifiers combining both approaches achieved the best results for one of the evaluation datasets, but underperformed on the remaining two. We also compared the best two classifiers with some of the main state-of-the-art results on the SemEval competition. Our final embedded meta-classifier outperformed the top-ranking result on the SemEval Beetle dataset and our top classifier on SemEval SciEntBank, trained on reference-based features, obtained the 2nd position. In conclusion, the reference-based approach, powered mainly by sentence level embeddings and other similarity features, proved to generate the most efficient models in two out of three datasets and the ensemble model was the best on the SemEval Beetle dataset. 
Natural Language Processing Machine Learning Educational Data Mining
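The word-level similarity features named in entry 12 (overlap counts, the Jaccard coefficient, and cosine similarity between embeddings) can be illustrated with a minimal sketch. It is not the thesis's implementation: the whitespace tokenizer, the function names, and the example sentences are all assumptions made here for illustration, and the thesis also computes these features over entity URIs and entity mentions, which this sketch does not attempt.

```python
import math


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive, assumed tokenizer)."""
    return text.lower().split()


def overlap_count(a_tokens, b_tokens):
    """Number of distinct elements shared by the two token collections."""
    return len(set(a_tokens) & set(b_tokens))


def jaccard(a_tokens, b_tokens):
    """Jaccard coefficient |A & B| / |A | B|; returns 0.0 when both sets are empty."""
    a, b = set(a_tokens), set(b_tokens)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical student answer and model answer, invented for this example.
student = tokenize("the bulb lights because the circuit is closed")
model = tokenize("the bulb is on because the circuit is closed")

features = {
    "overlap": overlap_count(student, model),
    "jaccard": jaccard(student, model),
}
```

In a reference-based setup such as the one the abstract describes, values like these would be computed for each student answer against the model answer (and against the question text) and passed to a classifier as input features; `cosine` would be applied to sentence embedding vectors obtained separately.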
13	Aplicação da arquitetura lambda na construção de um ambiente big data educacional para análise de dados / Application of the lambda architecture in the construction of an educational big data environment for data analysis Mendes, Renê de Ávila 09 February 2017 Dealing properly with the volume, velocity and variety dimensions of data in educational contexts is a major concern for educational institutions, and researchers in both Educational Data Mining and Learning Analytics have cooperated to address this challenge, popularly called Big Data. Hardware developments have increased computing power, storage capacity and energy efficiency. New technologies in databases, file systems and distributed systems, as well as developments in data transmission, data management, data analysis and visualization techniques, have been trying to overcome the challenge of processing, storing and analyzing large volumes of data and the impossibility of simultaneously meeting the requirements of consistency, availability and partition tolerance. 
Although defining the architecture is the main task in designing a Big Data system, objective guidelines for selecting the architecture and the tools for implementing Big Data systems were not found in the literature. The present research aims to analyze the main architectures for both batch and stream processing and to use one of them to build a Big Data environment, providing important guidance to researchers, technicians and managers. Academic data and logs from the Moodle virtual learning environment of an academic unit of a higher education institution are used. big data educational data mining arquitetura lambda moodle CNPQ::ENGENHARIAS
14	Minería de datos educacionales: modelos de predicción del desempeño escolar en alumnos de enseñanza básica / Educational data mining: models for predicting the school performance of primary school students Molen Moris, Johan van der January 2013 Mathematical Civil Engineer / In recent years, an opportunity has opened up to analyze students' skills and performance more precisely. Online practice systems and intelligent tutors have gradually begun to proliferate, making it possible to record a large amount of valuable information about student learning. Educational Data Mining (EDM) is a field of study devoted to developing mathematical methods for analyzing data from education-related environments and extracting as much information as possible, in order to better understand students, teachers and related actors and thereby improve educational processes. This thesis addresses the problem of predicting a student's performance from historical data collected through the student's interaction with an online practice system. This challenge has recently become one of the most important in EDM, as shown by the growing number of related publications and the strong interest it has attracted from universities and government bodies. This work analyzes the stored records of more than half a million online exercises completed weekly during 2011 by 805 students in 23 fourth-grade classes at 13 vulnerable schools, exploring several of the most widely used approaches to the problem and proposing new variants to improve the results and to help detect anomalous observations that could include instances of "gaming the system". In addition, it studies the problem of determining how certain content areas affect others. This is an Educational Data Mining problem that is central to curriculum design and lesson planning. 
Usually, this network of causal influences is built from expert opinion. Some experts contribute by making explicit the logical dependencies among content areas, and others by drawing on their personal experience of teaching them. However, it is very important to check those opinions against the learning process that actually takes place in the classroom and to build causal networks from empirical evidence. We use the data and data mining techniques to automatically generate the first empirically built causal network of the content of a curriculum. Finally, we report an analysis of the impact of online practice on performance in the SIMCE test. Measurements under laboratory conditions show that practice increases learning. However, school implementations have not shown positive impacts. This work presents the experience of vulnerable schools where students do dozens of mathematics exercises each week in an online system. The mathematics SIMCE score rose significantly, by more than three times the historic increase achieved at the national level in 2011. Moreover, the classes that completed more exercises achieved a larger SIMCE increase, independent of teacher and school effects. Minería de datos Rendimiento en la educación Predicción en tecnología Educational data mining
15	Caracterização de alunos em ambientes de ensino online: estendendo o uso da DAMICORE para minerar dados educacionais / Characterization of students in online learning environments: extending the use of DAMICORE to educational data mining Luis Fernando de Souza Moro 04 May 2015 With the growing use of technological resources in education, a huge amount of data on the interactions between students and these resources is stored. Analyzing this data in order to characterize the students is an important task, since the results can help teachers in the teaching and learning process. However, because the tools used for this characterization are complex and unintuitive, education professionals end up not using them, which makes it impractical to deploy such tools in educational environments. Within this context, this master's dissertation aimed to analyze data from an intelligent tutoring system named MathTutor, which offers specific math exercises, in order to identify behavioral patterns of the students who interacted with the system over a given period. 
This analysis was performed through an Educational Data Mining (EDM) process using the DAMICORE tool, so that information useful for characterizing the students could be produced quickly and effectively. The analysis followed several phases of the process of knowledge discovery in databases: "selection", "preprocessing", "data mining", and "evaluation and interpretation". In the "data mining" phase, DAMICORE was used to find patterns, which were then studied in the "evaluation and interpretation" phase. From this analysis, behavioral patterns of students were found, for example whether male students perform better or worse than female students, and which students will perform well or poorly in the final stages of the teaching process. The main result is that one of the hypotheses formulated, "Students who performed well in the immediate posttest showed two of the following three behaviors: few interactions during the intervention, little time interacting with the system during the intervention, and few misconceptions in the pretest", was confirmed as accurate on the data used in this research. Thus, this research concluded that using DAMICORE in an educational context can help teachers infer their students' performance, giving them the opportunity to carry out pedagogical interventions that help students who may be struggling and to present new challenges to those who find the subject easy. Atributos Mineração de dados educacionais Attributes Educational data mining
16	Educational Data Mining : En kvalitativ studie med inriktning på dataanalys för att hitta mönster i närvarostatistik / Educational Data Mining : A qualitative study focusing on data analysis to find patterns in presence statistics Borg, Olivia January 2019 The study focuses on finding patterns in the attendance statistics of students who are absent from school. The information provided by the results can then be used as a basis for decision-making by schools or by other organizations interested in EDM applied to attendance statistics. The work took a qualitative approach, with a case study that consisted of a literature study and an implementation. The literature study was used to gain an understanding of common approaches within EDM, which subsequently formed the basis for the implementation, which followed the CRISP-DM working method. The project resulted in five different patterns defined through data analysis. The patterns show absence from a time perspective and per subject, and can form the basis for future decision-making. Data Mining Educational Data Mining Patterns Data Mining Educational Data Mining Mönster Information Systems
17	Predicting Student Performance in Programming Courses Using Test Unit Snapshot Data / Förutsägelse av Studentprestationer i Programmeringskurser med hjälp av Snapshot-data för Testenheter Elia, Sanherib January 2023 Predicting student performance is an important topic in academia, especially in a programming context, where identifying struggling students allows teachers to offer early and continuous assistance to help them improve their performance. It is thus essential to analyze student programming behavior to detect at-risk students. This thesis uses data generated by 220 students in a master's level programming course at a large European university. The students run unit tests to test their code when solving assignments, and a snapshot is taken of each test as it is executed. Unit testing is a method of testing software in which individual units of source code are tested for correctness. A data set with simple features is derived from a database of snapshots and labeled with the students' grades. Then the machine learning models support vector machine (SVM), naive Bayes (NB), random forest, and neural networks with one, two and three hidden layers are each trained and evaluated, and their performance is compared. The results show that the SVM and neural network models are likely the best-performing all-rounders, with naive Bayes a possible choice depending on the goal. The thesis contributes by training machine learning models on students' programming behavior. By arming teachers with models such as these, more students who need assistance can get timely support and thus improve their performance. Future work can improve the models by using or combining other types of student data as features, or by using a larger data set. 
Student performance prediction Programming education Unit test snapshot Machine learning Educational data mining Förutsägelse av studentprestationer programmeringsutbildning snapshot av enhetstest maskininlärning Educational data mining Computer Sciences Datavetenskap (datalogi)
18	Tracing Knowledge and Engagement in Parallel by Observing Behavior in Intelligent Tutoring Systems Schultz, Sarah E 27 January 2015 Two of the major goals in Educational Data Mining are determining students' state of knowledge and determining their affective state. It is useful to be able to determine whether a student is engaged with a tutor or task in order to adapt to his/her needs, and necessary to have an idea of the student's knowledge state in order to provide material that is appropriately challenging. These two problems are usually examined separately, and multiple methods have been proposed to solve each of them. However, little work has been done on examining both of these states in parallel and their combined effect on a student's performance. The work reported in this thesis explores ways to observe both behavior and performance in order to more fully understand student state. engagement educational data mining Bayesian networks affect knowledge tracing student modeling intelligent tutoring system
19	Modeling Student Retention in an Environment with Delayed Testing Li, Shoujing 24 April 2013 Over the last two decades, the field of educational data mining (EDM) has focused on predicting the correctness of the next student response to a question (e.g., [2, 6] and the 2010 KDD Cup), in other words, on predicting short-term student performance. Student modeling has been widely used for making such inferences. Although performing well on the immediate next problem is an indicator of mastery, it is far from the only criterion. For example, the Pittsburgh Science of Learning Center's theoretical framework focuses on robust learning (e.g., [7, 10]), which includes the ability to transfer knowledge to new contexts, preparation for future learning of related skills, and retention, the ability of students to remember the knowledge they learned over a long time period. Especially for a cumulative subject such as mathematics, robust learning, particularly retention, is more important than short-term indicators of mastery. The Automatic Reassessment and Relearning System (ARRS) is a platform we developed and deployed on September 1st, 2012, which is mainly used by middle-school math teachers and their students. This system can help students better retain knowledge by automatically assigning tests to students, giving students the opportunity to relearn a skill when necessary, and generating reports for teachers. After deploying and testing the system for about seven months, we had collected 287,424 data points from 6,292 students. We created several models that predict students' retention performance using a variety of features, and discovered which features were important for predicting correctness on a delayed test. We found that the strongest predictor of retention was a student's initial speed of mastering the content. 
The most striking finding was that students who struggled to master the content (took over 8 practice attempts) showed very poor retention, only 55% correct, after just one week. Our results will help us advance our understanding of learning and potentially improve ITS. Intelligent tutoring system Robust learning Student modeling Knowledge retention Educational data mining
20	Análise de modelos de regressão multiníveis simétricos / Analysis of symmetrical multilevel regression models Osio, Marina Mitie Gishifu 24 April 2013 Multilevel models are an attractive way to analyze hierarchically structured data, since they allow different parameter estimates to be obtained for distinct groups while taking into account the dependence among observations in the same group. In this dissertation, we develop and apply symmetrical multilevel regression models in order to provide alternatives to the usual model, which assumes normality. We also present a brief diagnostic analysis and a simulation study. As motivation, we consider educational data in order to assess whether the number of failed grades in a student's school record and the school's infrastructure are relevant variables affecting the low performance of elementary school students in Mathematics. Dados educacionais Distribuições simétricas Educational data Hierarchical models Modelos hierárquicos Modelos multiníveis Multilevel regression Symmetrical distribution

Search results