Global ETD Search

51	Árvores de decisão como técnica para classificar a resposta quanto à atividade celular in vitro para diferentes tratamentos superficiais em titânio [Decision trees as a technique for classifying the in vitro cellular activity response to different titanium surface treatments] Gamba, Mateus Luiz January 2016 Several articles have been published evaluating the influence of different TiO2/Ti surface treatments on the cellular activity of osteoblasts, in an attempt to establish the relationship between surface properties and the osseointegration process. However, there are still critical gaps in the assessment and understanding of the effect of surface properties on cellular activity. Because many factors can influence the cellular response, evaluating the combined influence of the different parameters involved makes it difficult to understand the effect of surface properties on osseointegration and to compare the performance of different surface treatments. In addition, a comparative evaluation of studies by different authors is very difficult because the experiments are not standardized, for instance with respect to the cell type used. In this context, this work proposes a computational method to classify and predict the in vitro cellular activity response on TiO2/Ti surfaces. From results reported in articles published by different authors, a dataset was built relating TiO2/Ti surface properties (roughness and wettability) to cellular activity and viability as measured by the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyl tetrazolium bromide (MTT) assay, using MC3TE-E1 osteoblastic cells and the same monitoring criteria. The J48 and SimpleCart decision tree algorithms were then applied to obtain rules able to classify and predict cellular activity as a function of the surface properties. Weka was the tool used to generate the decision trees. Of the algorithms tested, SimpleCart gave the better classification, with a Kappa coefficient of 40.45% against 26.51% for J48. This coefficient is a metric used to evaluate the quality of the decision tree classification. The resulting decision tree made it possible to identify decision rules that can be used as a predictive and classification model for the constructed dataset, relating the TiO2/Ti surface properties (roughness and wettability) to cellular activity.
Tratamento de superfícies Titânio Atividade celular Surface treatments Cellular Activity MC3TEE1 Decision Tree TiO2 Titanium
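Illustrative note on record 51: the abstract uses the Kappa coefficient to compare J48 and SimpleCart. The sketch below shows one common way such a coefficient (Cohen's kappa) can be computed from a confusion matrix. It is not taken from the thesis; the function name `cohen_kappa` and the example matrix are hypothetical, and Weka's own implementation may differ in detail.

```python
from typing import Sequence


def cohen_kappa(confusion: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for a square confusion matrix.

    Rows are actual classes, columns are predicted classes.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix has no observations")
    n = len(confusion)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(n)
    ) / (total * total)
    if expected == 1.0:
        # Kappa is undefined here; returning 0.0 is a choice made for this sketch.
        return 0.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical 2x2 matrix (not data from the thesis).
example_matrix = [[20, 5], [10, 15]]
example_kappa = cohen_kappa(example_matrix)
```

A kappa near 0 means agreement no better than chance and a kappa of 1 means perfect agreement, which is why the abstract treats the higher value as the better classification.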
52	Tutoring Students with Adaptive Strategies Wan, Hao 18 January 2017 Adaptive learning is a crucial part of intelligent tutoring systems. It provides students with appropriate tutoring interventions, based on students' characteristics, status, and other related features, in order to optimize their learning outcomes. This requires determining students' knowledge level or learning progress and then using suitable techniques to choose the optimal interventions. In this dissertation work, I focus on three aspects of the adaptive learning process: student modeling, k-armed bandits, and contextual bandits. Student modeling. The main objective of student modeling is to develop cognitive models of students, including modeling content skills and knowledge about learning. In this work, we investigate the effect of prerequisite skills in predicting students' knowledge of post skills, and we make use of prerequisite performance in different student models. As a result, this makes them superior to traditional models. K-armed bandits. We apply k-armed bandit algorithms to personalize interventions for students and optimize their learning outcomes. Due to the lack of diverse interventions and the small differences in intervention effectiveness in educational experiments, we also propose a simple selection strategy and compare it with several k-armed bandit algorithms. Contextual bandits. In the contextual bandit problem, additional side information, also called context, can be used to determine which action to select. First, we construct a feature evaluation mechanism, which determines which features to combine with bandits. Second, we propose a new decision tree algorithm, which is capable of detecting aptitude-treatment effects for students. Third, combining bandits with the decision tree, we apply contextual bandits to personalization on two different types of data: simulated data and real experimental data.
decision tree prerequisite contextual bandits k-armed bandits student model adaptive learning
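Illustrative note on record 52: the abstract applies k-armed bandit algorithms to choose tutoring interventions but does not say which ones. As a rough sketch of the general idea only, the code below implements epsilon-greedy, one standard k-armed bandit strategy. The function name `epsilon_greedy`, the parameter values, and the fixed reward table are hypothetical and are not taken from the dissertation.

```python
import random
from typing import Callable, List, Tuple


def epsilon_greedy(
    pull: Callable[[int], float],
    k: int,
    rounds: int,
    epsilon: float = 0.1,
    seed: int = 0,
) -> Tuple[List[float], List[int]]:
    """Run epsilon-greedy over k arms; return (estimated mean rewards, pull counts)."""
    rng = random.Random(seed)
    means = [0.0] * k
    counts = [0] * k
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick a random arm
        else:
            arm = max(range(k), key=lambda a: means[a])  # exploit: best estimate so far
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return means, counts


# Hypothetical setting: three interventions with fixed average learning gains.
assumed_gains = [0.2, 0.8, 0.5]
estimated_means, pull_counts = epsilon_greedy(
    lambda arm: assumed_gains[arm], k=3, rounds=1000
)
```

In this toy setting most pulls end up on the second intervention, the one with the highest assumed gain. A contextual bandit, as described in the abstract, would additionally condition the choice on student features.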
53	Detecting students who are conducting inquiry Without Thinking Fastidiously (WTF) in the Context of Microworld Learning Environments Wixon, Michael 09 April 2013 In recent years, there has been increased interest and research on identifying the various ways that students can deviate from expected or desired patterns while using educational software. This includes research on gaming the system, player transformation, haphazard inquiry, and failure to use key features of the learning system. Detection of these sorts of behaviors has helped researchers to better understand these behaviors, thus allowing software designers to develop interventions that can remediate them and/or reduce their negative impacts on student learning. This work addresses two types of student disengagement: carelessness and a behavior we term WTF (“Without Thinking Fastidiously”) behavior. Carelessness is defined as not demonstrating a skill despite knowing it; we measured carelessness using a machine learned model. In WTF behavior, the student is interacting with the software, but their actions appear to have no relationship to the intended learning task. We discuss the detector development process, validate the detectors with human labels of the behavior, and discuss implications for understanding how and why students conduct inquiry without thinking fastidiously while learning in science inquiry microworlds. Following this work we explore the relationship between student learner characteristics and the aforementioned disengaged behaviors carelessness and WTF. Our goal was to develop a deeper understanding of which learner characteristics correlate to carelessness or WTF behavior. Our work examines three alternative methods for predicting carelessness and WTF behaviors from learner characteristics: simple correlations, k-means clustering, and decision tree rule learners.
Automated Detectors Cluster Analysis Decision Tree Rule Learners Machine Learning Disengaged Behavior Learner Characteristics Science Inquiry
54	Md-pread: um modelo para predição de reprovação de aprendizes na educação a distância usando árvore de decisão [MD-PREAD: a model for predicting learner failure in distance education using decision trees] Ferreira, João Luiz Cavalcante 25 February 2016 Distance education (e-learning) in Brazil has become established, with many students opting for this mode of education to broaden their training and professional achievement. However, it faces obstacles such as resistance from students and educators, organizational challenges, production costs, and student failure or retention. One of the main distinguishing features of e-learning courses is the large amount of data generated by interactions in the educational environment, which opens up new possibilities for studying and understanding these interactions. Educational Data Mining (EDM) is an interdisciplinary research area concerned with developing methods to explore data originating in educational contexts. Learning Analytics (LA) is another emerging research area; it seeks to measure, collect, analyze, and report data on students. The challenge for researchers is to develop methods that predict student performance so that teachers and tutors can intervene to recover a student before he or she fails. This dissertation proposes MD-PREAD, a model for predicting groups at risk of failure in a distance education environment. The decision tree technique was chosen for the interpretability of its results, since other methods, such as Artificial Neural Networks, have the drawback that it is difficult to identify the causes behind their predictions. The model was prototyped in the RapidMiner data mining tool. An experiment was conducted at the Federal Institute of Education, Science and Technology of Amazonas, within the Open University of Brazil program, in the Philosophy of Education course. Historical data were collected for 10 course subjects from a group of 30 learners over two consecutive semesters, 2014/2 and 2015/1. A total of 125 students were enrolled and 41070 interactions were recorded. The prediction calculation considered the average assessment grades of the 30 learners, the standard deviations of their interactions, and their respective final statuses. These data formed the training set used to define the classification rule, which achieved a predominant accuracy of 55% and a Kappa reliability of 0.22. A second validation process was carried out after the experiment, considering all 125 students; the best classifier found was J48, with an accuracy of 84.05%, precision of 77.08%, and recall of 50.23%. It was concluded that MD-PREAD is a support tool for the prognosis of groups at risk of failure, since it enabled these groups to be generated and made available weekly to an external educational recommendation system.
EaD Predição Árvore de decisão Learning analytics E-learning Prediction Decision tree
55	Explorando técnicas para modelagem de dados agregados de óbitos provenientes de acidentes por automóvel / Exploring techniques for modeling of aggregates data from deaths automobile accidents Murilo Castanho dos Santos 01 October 2015 This dissertation explores techniques for modeling deaths from automobile accidents in the state of São Paulo. The analysis was aggregated by area and used the ratios of deaths per population, per area, and per vehicle flow as dependent variables; the independent variables were socioeconomic characteristics, area, vehicle fleet, Municipal Human Development Index (MHDI), annual vehicle flow, and distances between micro-regions. Data from 2000 were used for calibration and data from 2010 for validation of the models, which were built with data mining techniques (decision tree - DT algorithms: CART - Classification And Regression Tree and CHAID - Chi-squared Automatic Interaction Detection) and with Multiple Linear Regression (MLR) for comparison with the DT models. The results show that MLR achieved the best mean error, mean absolute error, and correlation coefficient, while the CART algorithm achieved the lowest normalized mean error. Comparing the death rates, the ratio by area showed the better mean error and correlation coefficient, whereas the ratio by population had the lower normalized mean error and mean absolute error. It is worth noting that DT algorithms are suitable for classifying areas according to value ranges of the explanatory variables and mean values of the variable under study. Furthermore, such techniques are more flexible with respect to some assumptions of regression models. Thus, the main contribution of this work is the exploration of these algorithms for accident prediction and region classification.
Árvore de Decisão Classificação Previsão de Acidentes Taxas de óbitos Accident prediction Classification Death rates Decision tree
56	Understanding complex systems through computational modeling and simulation / Comprendre les systèmes complexes par la modélisation et la simulation computationnelles Le, Xuan Tuan 18 January 2017 Traditional simulation approaches are generally not suited to, and are sometimes incapable of, dealing with complexity issues such as emergence, self-organization, evolution, and adaptation of complex systems. As illustrated in this thesis by the author's practical work on a real-life project simulating the influenza epidemic in France, the spread of an infectious disease, together with interventions on the population, can be considered as diffusion processes on complex networks of heterogeneous individuals in a society that is itself considered a reactive system. Modeling such a system requires explicitly specifying each individual's behaviors and (re)actions and transforming them into a computational model that is flexible, reusable, and easy to code. Statecharts, typical of model-based programming, are the solution the thesis proposes. Bottom-up agent-based simulation reveals emergent episodes in what-if scenarios in which the rules governing agents' behavior change, requiring agents to learn to adapt to these changes. Decision tree learning is proposed to bring more flexibility and legibility to the modeling of agents' autonomous decision making at simulation runtime. The proposed computational models, such as agent-based models, are complementary to traditional ones, and in some cases they are the only viable solutions because of legal and ethical issues.
Modélisation computationnelle Modélisation à base d’agents Statechart Arbre de décision Computational modeling Agent-Based model Statechart Decision tree
57	MINERAÇÃO DE DADOS APLICADA À CLASSIFICAÇÃO DOS CONTRIBUINTES DE ICMS DA SEFAZ-GO [Data mining applied to the classification of SEFAZ-GO ICMS taxpayers] Rocha, Santiago Meireles 18 August 2017 With the exponential increase in the volume of stored data, and the high potential for hidden knowledge in these data to support organizations' strategies and decision making, much has been invested in information technology and telecommunications. The purpose of this dissertation was to apply the Knowledge Discovery in Databases (KDD) process to classify SEFAZ-GO ICMS taxpayers as High Evaders or Low Evaders, by means of the data mining task of supervised classification, implemented with the J48 algorithm on the WEKA platform. Three experiments were carried out with a sample of data on ICMS taxpayers from the wholesale sector of the city of Goiânia-GO, with attributes selected on the basis of the Tax Code of the State of Goiás. During the experiments, the AttributeSelection and Discretize algorithms were applied to reduce the number of attributes and to transform continuous variables into discrete ones, respectively. The confusion matrix and the Kappa coefficient were used as validation metrics for the proposed model. After each experiment, classification rules were extracted, forming the proposed predictive classification model. In the best scenario, a correct classification rate of 84% was obtained. Data mining is a reality in many organizations and can be a strong ally in the far from trivial task of knowledge discovery in corporate databases.
Tax evasion, Decision tree, KDD, WEKA ENGENHARIAS::ENGENHARIA DE PRODUCAO
59	Classifying textual fast food restaurant reviews quantitatively using text mining and supervised machine learning algorithms Wright, Lindsey 01 May 2018 Companies continually seek to improve their business model through feedback and customer satisfaction surveys. Social media provides additional opportunities for this advanced exploration into the mind of the customer. By extracting customer feedback from social media platforms, companies may increase the sample size of their feedback and remove bias often found in questionnaires, resulting in better informed decision making. However, simply using personnel to analyze thousands of pieces of relevant social media content is financially expensive and time consuming. Thus, our study aims to establish a method to extract business intelligence from social media content by structuring opinionated textual data using text mining and classifying these reviews by the degree of customer satisfaction. By quantifying textual reviews, companies may perform statistical analysis to extract insight from the data as well as effectively address concerns. Specifically, we analyze a subset of 56,000 Yelp reviews of fast food restaurants and attempt to predict a quantitative value reflecting the overall opinion of each review. We compare two predictive modeling techniques, bagged Decision Trees and Random Forest Classifiers. To simplify the problem, we train our model to accurately classify strongly negative and strongly positive reviews (1 and 5 stars). In addition, we identify drivers behind strongly positive or negative reviews, allowing businesses to understand their strengths and weaknesses. This method provides companies an efficient and cost-effective way to process and understand customer satisfaction as it is discussed on social media.
text mining sentiment analysis decision tree random forest Other Applied Mathematics
60	Learning From Spatially Disjoint Data Bhadoria, Divya 02 April 2004 Committees of classifiers, also called mixtures or ensembles of classifiers, have become popular because they have the potential to improve on the performance of a single classifier constructed from the same set of training data. Bagging and boosting are some of the better known methods of constructing a committee of classifiers. Committees of classifiers are also important because they have the potential to provide a computationally scalable approach to handling massive datasets. When the emphasis is on computationally scalable approaches to handling massive datasets, the individual classifiers are often constructed from a small fraction of the total data. In this context, the ability to improve on the accuracy of a hypothetical single classifier created from all of the training data may be sacrificed. The design of a committee of classifiers typically assumes that all of the training data is equally available to be assigned to subsets as desired, and that each subset is used to train a classifier in the committee. However, there are some important application contexts in which this assumption is not valid. In many real-life situations, massive datasets are created on a distributed computer, recording the simulation of important physical processes. Currently, experts visually browse such datasets to search for interesting events in the simulation. This sort of manual search for interesting events in massive datasets is time consuming. Therefore, one would like to construct a classifier that could automatically label the "interesting" events. The problem is that the dataset is distributed across a large number of processors in chunks that are spatially homogeneous with respect to the underlying physical context in the simulation. Here, a potential solution to this problem using ensembles is explored.
data mining decision tree nearest neighbor distributed learning classification American Studies Arts and Humanities
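Illustrative note on record 60: the abstract describes committees of classifiers in which each member is trained on its own chunk of data, but it does not say how the members' outputs are combined. The sketch below shows simple majority voting, one common combination rule, purely as an assumption. The name `majority_vote` and the three threshold classifiers are hypothetical stand-ins for models trained on separate chunks.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def majority_vote(
    classifiers: Sequence[Callable[[float], Hashable]], x: float
) -> Hashable:
    """Return the label predicted by the most committee members for input x."""
    if not classifiers:
        raise ValueError("committee must contain at least one classifier")
    votes = Counter(clf(x) for clf in classifiers)
    # most_common(1) yields the label with the highest vote count.
    return votes.most_common(1)[0][0]


# Hypothetical committee: each member stands for a classifier trained on one
# spatial chunk and flags a value as "interesting" above its own threshold.
committee = [
    lambda value: value > 1.0,
    lambda value: value > 2.0,
    lambda value: value > 3.0,
]
label_for_2_5 = majority_vote(committee, 2.5)  # two of three members vote True
```

With spatially disjoint chunks, individual members may see very different class distributions, which is part of what makes the combination step nontrivial in the setting the abstract describes.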
