Global ETD Search

61	n-TARP: A Random Projection based Method for Supervised and Unsupervised Machine Learning in High-dimensions with Application to Educational Data Analysis Yellamraju Tarun (6630578) 11 June 2019

Analyzing the structure of a dataset is a challenging problem in high dimensions, as the volume of the space increases at an exponential rate and data typically becomes sparse in this high-dimensional space. This poses a significant challenge to machine learning methods, which rely on exploiting structures underlying data to make meaningful inferences. This dissertation proposes the n-TARP method as a building block for high-dimensional data analysis, in both supervised and unsupervised scenarios.

The basic element, n-TARP, consists of a random projection framework to transform high-dimensional data to one-dimensional data in a manner that yields point separations in the projected space. The point separation can be tuned to reflect classes in supervised scenarios and clusters in unsupervised scenarios. The n-TARP method finds linear separations in high-dimensional data. This basic unit can be used repeatedly to find a variety of structures. It can be arranged in a hierarchical structure like a tree, which increases the model complexity, flexibility and discriminating power. Feature space extensions combined with n-TARP can also be used to investigate non-linear separations in high-dimensional data.

The application of n-TARP to both supervised and unsupervised problems is investigated in this dissertation. In the supervised scenario, a sequence of n-TARP based classifiers with increasing complexity is considered. The point separations are measured by classification metrics like accuracy, Gini impurity or entropy. The performance of these classifiers on image classification tasks is studied. This study provides an interesting insight into the working of classification methods. The sequence of n-TARP classifiers yields benchmark curves that put in context the accuracy and complexity of other classification methods for a given dataset. The benchmark curves are parameterized by classification error and computational cost to define a benchmarking plane. This framework splits this plane into regions of "positive-gain" and "negative-gain" which provide context for the performance and effectiveness of other classification methods. The asymptotes of benchmark curves are shown to be optimal (i.e. at Bayes Error) in some cases (Theorem 2.5.2).

In the unsupervised scenario, the n-TARP method highlights the existence of many different clustering structures in a dataset. However, not all structures present are statistically meaningful. This issue is amplified when the dataset is small, as random events may yield sample sets that exhibit separations that are not present in the distribution of the data. Thus, statistical validation is an important step in data analysis, especially in high dimensions. However, statistically validating results often requires a number of data samples that increases exponentially with the dimension. The proposed n-TARP method circumvents this challenge by evaluating statistical significance in the one-dimensional space of data projections. The n-TARP framework also results in several different statistically valid instances of point separation into clusters, as opposed to a unique "best" separation, which leads to a distribution of clusters induced by the random projection process.

The distributions of clusters resulting from n-TARP are studied. This dissertation focuses on small sample high-dimensional problems. A large number of distinct clusters are found, which are statistically validated. The distribution of clusters is studied as the dimensionality of the problem evolves through the extension of the feature space using monomial terms of increasing degree in the original features, which corresponds to investigating non-linear point separations in the projection space.

A statistical framework is introduced to detect patterns of dependence between the clusters formed with the features (predictors) and a chosen outcome (response) in the data that is not used by the clustering method. This framework is designed to detect the existence of a relationship between the predictors and response. This framework can also serve as an alternative cluster validation tool.

The concepts and methods developed in this dissertation are applied to a real world data analysis problem in Engineering Education. Specifically, engineering students' Habits of Mind are analyzed. The data at hand is qualitative, in the form of text, equations and figures. To use the n-TARP based analysis method, the source data must be transformed into quantitative data (vectors). This is done by modeling it as a random process based on the theoretical framework defined by a rubric. Since the number of students is small, this problem falls into the small sample high-dimensions scenario. The n-TARP clustering method is used to find groups within this data in a statistically valid manner. The resulting clusters are analyzed in the context of education to determine what is represented by the identified clusters. The dependence of student performance indicators like the course grade on the clusters formed with n-TARP is studied in the pattern dependence framework, and the observed effect is statistically validated. The data obtained suggests the presence of a large variety of different patterns of Habits of Mind among students, many of which are associated with significant grade differences. In particular, the course grade is found to be dependent on at least two Habits of Mind: "computation and estimation" and "values and attitudes."

Statistics Education Applied Statistics Probability Theory Stochastic Analysis and Modelling Pattern Recognition and Data Mining Education Assessment and Evaluation High-dimensions n-TARP Clustering Methods Machine Learning data analysis study Educational Data Statistical test pattern analysis techniques Pattern Dependence Benchmarks
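
The random-projection step described in the abstract above can be illustrated with a short sketch. This is not the dissertation's implementation: the separation score (within-group sum of squares divided by total sum of squares of the projected values), the Gaussian projection vectors, and the names `best_split_1d` and `n_tarp` are all assumptions chosen for illustration, and the statistical validation step is omitted.

```python
import numpy as np


def best_split_1d(values):
    """Find the best two-group split of 1-D values.

    Returns (score, threshold), where score is the within-group sum of
    squares divided by the total sum of squares (lower means a clearer
    separation). The score definition is an assumption for illustration.
    """
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    if n < 2:
        return 1.0, (float(v[0]) if n else 0.0)
    total = float(np.sum((v - v.mean()) ** 2))
    if total == 0.0:
        return 1.0, float(v[0])
    csum = np.cumsum(v)
    csq = np.cumsum(v ** 2)
    k = np.arange(1, n)  # candidate sizes of the left group
    left_ss = csq[k - 1] - csum[k - 1] ** 2 / k
    right_sum = csum[-1] - csum[k - 1]
    right_ss = (csq[-1] - csq[k - 1]) - right_sum ** 2 / (n - k)
    within = np.maximum(left_ss + right_ss, 0.0)  # guard against rounding
    i = int(np.argmin(within))
    threshold = 0.5 * (v[i] + v[i + 1])
    return float(within[i] / total), float(threshold)


def n_tarp(X, n=100, seed=0):
    """Try n random 1-D projections and keep the one with the best split.

    Returns (score, direction, threshold, labels). Hypothetical helper,
    not the author's code.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    best = None
    for _ in range(n):
        w = rng.standard_normal(X.shape[1])  # random projection direction
        score, thr = best_split_1d(X @ w)
        if best is None or score < best[0]:
            best = (score, w, thr)
    score, w, thr = best
    labels = (X @ w > thr).astype(int)  # cluster assignment in 1-D
    return score, w, thr, labels
```

Calling `n_tarp(X)` on an array with one row per sample returns the lowest score found, the projection direction, the threshold, and a 0/1 cluster label per sample. Working in the projected space keeps every split search one-dimensional, which is the property the abstract relies on for validation with few samples.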
62	Real-time Assessment, Prediction, and Scaffolding of Middle School Students’ Data Collection Skills within Physical Science Simulations Sao Pedro, Michael A. 25 April 2013

Despite widespread recognition by science educators, researchers and K-12 frameworks that scientific inquiry should be an essential part of science education, typical classrooms and assessments still emphasize rote vocabulary, facts, and formulas. One of several reasons for this is that the rigorous assessment of complex inquiry skills is still in its infancy. Though progress has been made, there are still many challenges that hinder inquiry from being assessed in a meaningful, scalable, reliable and timely manner. To address some of these challenges and to realize the possibility of formative assessment of inquiry, we describe a novel approach for evaluating, tracking, and scaffolding inquiry process skills. These skills are demonstrated as students experiment with computer-based simulations. In this work, we focus on two skills related to data collection: designing controlled experiments and testing stated hypotheses. Central to this approach is the use and extension of techniques developed in the Intelligent Tutoring Systems and Educational Data Mining communities to handle the variety of ways in which students can demonstrate skills. To evaluate students' skills, we iteratively developed data-mined models (detectors) that can discern when students test their articulated hypotheses and design controlled experiments. To aggregate and track students' developing latent skill across activities, we use and extend the Bayesian Knowledge-Tracing framework (Corbett & Anderson, 1995). As part of this work, we directly address the scalability and reliability of these models' predictions by testing how well they predict for student data not used to build them. When doing so, we found that these models demonstrate the potential to scale because they can correctly evaluate and track students' inquiry skills. The ability to evaluate students' inquiry also enables the system to provide automated, individualized feedback to students as they experiment. As part of this work, we also describe an approach to provide such scaffolding to students. We also tested the efficacy of these scaffolds by conducting a study to determine how scaffolding impacts acquisition and transfer of skill across science topics. When doing so, we found that students who received scaffolding were better able than students who did not to acquire skills in the topic in which they practiced, and also to transfer skills to a second topic when scaffolding was removed. Our overall findings suggest that computer-based simulations augmented with real-time feedback can be used to reliably measure the inquiry skills of interest and can help students learn how to demonstrate these skills. As such, our assessment approach and system as a whole show promise as a way to formatively assess students' inquiry.

Behavior Detection Skill Prediction User Modeling Validation Inquiry Learning Environment Science Education Computer-Based Assessment Inquiry Assessment Performance Assessment Science Simulations Science Microworlds Educational Data Mining Formative Assessment Exploratory Learning Environment Open-Ended Learning Environment Science Inquiry Science Assessment Designing and Conducting Experiments Construct Validity Generalizability Text Replay Tagging J48 Decision Trees Bayesian Knowledge Tracing
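
For readers unfamiliar with the Bayesian Knowledge-Tracing framework cited in the abstract above, the classic update can be sketched as follows. This shows only the standard formulation, not the extensions developed in the dissertation; the function name `bkt_update` and the default parameter values are hypothetical and chosen purely for illustration.

```python
def bkt_update(p_known, correct, p_transit=0.1, p_guess=0.2, p_slip=0.1):
    """One step of classic Bayesian Knowledge Tracing.

    p_known: prior probability that the skill is known before this attempt.
    correct: True if the skill was demonstrated on this attempt.
    The default parameters are illustrative assumptions, not fitted values.
    """
    if correct:
        # Skill known and no slip, versus skill unknown and a lucky guess.
        num = p_known * (1 - p_slip)
        den = num + (1 - p_known) * p_guess
    else:
        # Skill known but slipped, versus skill unknown and no guess.
        num = p_known * p_slip
        den = num + (1 - p_known) * (1 - p_guess)
    posterior = num / den
    # Allow for learning between this attempt and the next one.
    return posterior + (1 - posterior) * p_transit
```

Applying `bkt_update` once per observed attempt, feeding each result back in as the next `p_known`, yields a running estimate of latent skill across activities.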
