11 |
Social-Emotional Predictors of Postsecondary Enrollment for Students with Disabilities
January 2011
abstract: The purpose of this exploratory study was to determine which social-emotional skills may predict postsecondary enrollment for students with disabilities (SWD). Students with disabilities are less likely to enroll in any form of postsecondary education and in turn experience poorer post-education outcomes than their general education peers. Using data from the second National Longitudinal Transition Study (NLTS2), a classification tree analysis was conducted on teacher-rated social-emotional behaviors to determine which social-emotional skills were the strongest predictors of postsecondary enrollment. Items assessing social-emotional skills were selected from the second wave of teacher surveys based on their alignment with the broad taxonomy of social-emotional skills created by Caldarella and Merrell. The classification tree analysis showed that one of the selected social-emotional items, teacher-rated ability to follow directions, was the most significant predictor of postsecondary enrollment for students with disabilities. In general, the results suggest that compliance and, to a lesser extent, peer-relations skills, in addition to family income, predict postsecondary enrollment for students with high-incidence disabilities. This finding suggests that social-emotional skills play an important role in postsecondary enrollment for SWD, supporting the use of social-emotional skills interventions to improve postsecondary enrollment rates and potentially post-educational outcomes for SWD. / Dissertation/Thesis / Ph.D. Educational Psychology 2011
|
12 |
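Comparação de métodos de mapeamento digital de solos através de variáveis geomorfométricas e sistemas de informações geográficas [Comparison of digital soil mapping methods using geomorphometric variables and geographic information systems]
Coelho, Fabrício Fernandes, January 2010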
Soil maps are important sources of information for land-use planning and management, but they are expensive to produce. To produce soil maps from existing maps, this study tests and compares single-stage classification methods (multiple multinomial logistic regression and Bayes) and multiple-stage classification methods (CART, J48 and LMT), using geographic information systems and geomorphometric (terrain) variables, to produce soil maps with both the original and a simplified legend. The database was managed in ArcGIS, where the terrain variables and the original soil map were related through training samples for the algorithms. The results obtained from the algorithms in the Weka software were implemented in ArcGIS to generate the digital soil maps. Error matrices were generated to analyze the accuracy of the maps. The terrain variables that best explained the spatial distribution of the soil classes were slope, profile and plan curvature, elevation, and topographic wetness index. The multiple-stage classification methods showed small improvements in overall accuracy but large improvements in the Kappa index. Simplifying the original legend significantly increased the producer's and user's accuracies but produced only small improvements in overall accuracy and the Kappa index.
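The accuracy measures named in this abstract (overall accuracy, producer's and user's accuracy, and the Kappa index) can all be read off an error matrix. The sketch below is illustrative only: the function name, the row/column layout, and the counts are assumptions and do not come from the thesis.

```python
def accuracy_metrics(matrix):
    """Accuracy measures from a square error (confusion) matrix.

    Assumed layout: matrix[i][j] counts samples whose reference class is i
    and whose mapped (predicted) class is j. Every class is assumed to occur
    at least once in both the reference data and the map.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    # Overall accuracy: share of samples on the diagonal.
    overall = correct / total
    # Agreement expected by chance, from the marginal totals.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    # Cohen's kappa: agreement beyond chance, rescaled to at most 1.
    kappa = (overall - chance) / (1 - chance)
    # Producer's accuracy: correct / reference total for each class.
    producer = [matrix[i][i] / row_totals[i] for i in range(n)]
    # User's accuracy: correct / mapped total for each class.
    user = [matrix[i][i] / col_totals[i] for i in range(n)]
    return {"overall": overall, "kappa": kappa,
            "producer": producer, "user": user}


# Made-up two-class example, not data from the study.
metrics = accuracy_metrics([[40, 10], [5, 45]])
print(metrics)
```

Because Kappa discounts chance agreement, it can move much more than overall accuracy when errors shift between classes, which is one way the pattern reported above can arise.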
|
14 |
Natural Language Explanation Model for Decision Trees
Silva, Jesús; Hernández Palma, Hugo; Niebles Núñez, William; Ruiz-Lazaro, Alex; Varela, Noel, 07 January 2020
This study describes a model of explanations in natural language for classification decision trees. The explanations include global aspects of the classifier and local aspects of the classification of a particular instance. The proposal is implemented in the ExpliClas open source Web service [1], which in its current version operates on trees built with Weka and data sets with numerical attributes. The feasibility of the proposal is illustrated with two example cases, where the detailed explanation of the respective classification trees is shown.
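To make the distinction between global and local explanations concrete, here is a minimal sketch that is not the ExpliClas implementation. The `Node` and `Leaf` classes, the function names, the sentence template, and the example tree are all hypothetical; the only thing borrowed from the abstract is the idea of numeric attributes split by thresholds.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    attribute: str
    threshold: float
    left: Union["Node", Leaf]   # branch taken when value <= threshold
    right: Union["Node", Leaf]  # branch taken when value > threshold


def count_leaves(tree):
    """Global aspect: how many distinct decision rules the tree encodes."""
    if isinstance(tree, Leaf):
        return 1
    return count_leaves(tree.left) + count_leaves(tree.right)


def explain_instance(tree, instance):
    """Local aspect: verbalize the path one instance follows to its leaf."""
    conditions = []
    node = tree
    while isinstance(node, Node):
        value = instance[node.attribute]
        if value <= node.threshold:
            conditions.append(
                f"{node.attribute} ({value}) is at most {node.threshold}")
            node = node.left
        else:
            conditions.append(
                f"{node.attribute} ({value}) is greater than {node.threshold}")
            node = node.right
    return (f"The instance is classified as {node.label} because "
            + " and ".join(conditions) + ".")


# Hypothetical tree and instance, for illustration only.
tree = Node("petal_length", 2.5,
            Leaf("setosa"),
            Node("petal_width", 1.7, Leaf("versicolor"), Leaf("virginica")))
print(f"The tree encodes {count_leaves(tree)} rules.")
print(explain_instance(tree, {"petal_length": 4.5, "petal_width": 1.3}))
```

A real explanation model would presumably also translate thresholds into linguistic terms and report classifier quality; this sketch only shows the path-to-sentence step.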
|
15 |
Application des arbres décisionnels en grappes pour prédire la performance des institutions microfinancières / Application of decision-trees for predicting the performance of microfinance institutions
Bou Kheir, Roy, 28 June 2013
Financial and social performance are important institutional characteristics that allow the poor and the near-poor to access credit on favorable terms, and that drive sustainable operation and effective governance mechanisms in microfinance institutions (MFIs). In this context, this study was conducted to determine the most influential financial, social and governance variables (with their relative importance in %) that may affect the financial and social performance indicators of MFIs worldwide, and to develop, for the first time, simple and practical microfinance tree models that can serve as valuable tools for implementing efficient strategies among nonprofit and for-profit MFIs at a national scale.

The first part of this thesis presents the global financial and social data extracted for the five years 2007-2011 from several well-known databases (e.g., Microfinance Information Exchange, Mix Market, Rating fund, etc.) for the chosen MFIs rated four or five diamonds (i.e., 263 nonprofit MFIs and 135 for-profit ones), distributed across the continents. Among the 263 nonprofit MFIs, the data sample was composed of 192 non-governmental organizations (NGOs), 42 non-bank institutions and 29 cooperatives. A large number of predictor variables (54) were collected, capturing aspects of the financial environment of these MFIs (e.g., administrative expense ratio, solvency ratio, cost per loan, number of depositors, write-off ratio, etc.), their social characteristics (e.g., depth, percentage of women among active borrowers, rural/urban market, poverty level, etc.) and their governance mechanisms (e.g., firm size, board size, regulation, audit, network affiliation, insurance, etc.). This first part also compares the efficiency of the most widely used statistical methods and models (including linear regression, logistic regression, Bayesian methods, artificial neural networks, cluster analysis, principal component analysis, decision trees, etc.) for estimating the financial and social performance indicators of MFIs. It also includes a detailed description of the tree-building process used for this estimation and of all related steps (evaluating splits, assigning categories to nodes, handling missing values with surrogate splitters, stopping criteria, etc.).

The second part explores quantitative relationships between the four financial performance indicators most commonly used worldwide (operational self-sufficiency OSS, profit margin PM, return on assets ROA, and return on equity ROE) and key financial, social and governance predictor variables for the chosen nonprofit MFIs (from 53 countries) through regression-tree modeling. For each financial performance indicator, several unpruned regression trees (684) were developed: (i) using all predictor variables, (ii) using only the financial predictor variables, (iii) using only the social predictor variables, (iv) using only the governance predictor variables, (v) using a single predictor variable at a time, (vi) excluding each variable one at a time from the pool of potential predictor variables, and (vii) forcing the initial split of the tree on the preferred predictor variable in order to explore the predictive power of independent predictors. The results demonstrate that the strongest relationships were associated with ROE and ROA, with the proportion of variance explained equal to 99.8% and 99.5% respectively, followed by PM (97%) and OSS (95%). The second part also showed that the financial predictor variables contributed differently to building the financial performance regression trees and the associated relationships: administrative expense ratio influenced ROE (100%); average loan balance per borrower affected OSS (100%); and cost per borrower, number of depositors, operating expense/loan portfolio, and risk coverage had significant impacts on ROA/ROE (98.5-100%).
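The "proportion of variance explained" reported for a regression tree is, in the usual definition, one minus the ratio of the within-leaf sum of squares to the total sum of squares. The sketch below illustrates this for a single split on one predictor. It is a generic CART-style illustration under that assumption, not the thesis's procedure, and the variable names and numbers are made up.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Threshold on x that minimizes the within-leaf sum of squares of y."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        within = sse(left) + sse(right)
        if best is None or within < best[1]:
            best = (threshold, within)
    return best


def variance_explained(x, y):
    """Return (threshold, share of the variance of y explained by one split)."""
    threshold, within = best_split(x, y)
    return threshold, 1 - within / sse(y)


# Hypothetical predictor (e.g. a cost ratio) and response (e.g. ROA in %).
cost = [1, 2, 3, 10, 11, 12]
roa = [1, 2, 3, 11, 12, 13]
print(variance_explained(cost, roa))
```

An unpruned tree keeps splitting until its stopping criteria are met, so its within-leaf sum of squares can become very small on the training data; very high explained-variance figures for unpruned trees are best read with that in mind.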
|
16 |
Defining and predicting species-environment relationships: understanding the spatial ecology of demersal fish communities
Moore, Cordelia Holly, January 2009
[Truncated abstract] The aim of this research was to define key species-environment relationships to better understand the spatial ecology of demersal fish. To help understand these relationships, a combination of multivariate analyses, landscape analysis and species distribution models was employed. Of particular interest was establishing the scale at which these species respond to their environment. With recent high-resolution surveying and mapping of the benthos in five of Victoria's Marine National Parks (MNPs), full-coverage bathymetry, terrain data and accurate predicted benthic habitat maps were available for each of these parks. This information proved invaluable to this research, providing detailed (1:25,000) benthic environmental data, which facilitated the development and implementation of a very targeted and robust sampling strategy for the demersal fish at Cape Howe MNP. The sampling strategy was designed to provide good spatial coverage of the park and to represent the park's dominant substrate types and benthic communities, whilst also satisfying the assumptions of the statistical and spatial analyses applied. The fish assemblage data was collected using baited remote underwater stereo-video systems (stereo-BRUVS), with a total of 237 one-hour drops collected. Analysis of the video footage identified 77 species belonging to 40 families, with a total of 14,449 individual fish recorded. ... This research revealed that the statistical modelling techniques employed provided an accurate means for predicting species distributions. These predicted distributions will allow for more effective management of these species by providing a robust and spatially explicit map of their current distribution, enabling the identification and prediction of future changes in these species' distributions. This research demonstrated the importance of the benthic environment for the spatial distribution of demersal fish. The results revealed that different species responded to different scales of investigation and that all scales must be considered to establish the factors fish are responding to and the strength and nature of this response. Having individual, continuous and spatially explicit environmental measures provided a significant advantage over traditional measures that group environmental and biological factors into 'habitat type'. It enabled better identification of the individual factors, or correlates, driving the distribution of demersal fish. The environmental and biological measures were found to be of ecological relevance to the species and the scale of investigation, and offered a more informative description of the distributions of the species examined. The use of species distribution modelling provided a robust means of characterising the nature and strength of these relationships. In addition, it enabled species distributions to be predicted accurately across unsampled locations. Outcomes of the project include a greater understanding of how the benthic environment influences the distribution of demersal fish and a suite of robust and useful marine species distribution tools that researchers and managers may use to understand, monitor, manage and predict marine species distributions.
|
17 |
Identifying mild cognitive impairment in older adults
Ritchie, Lesley Jane, 20 January 2009
The absence of gold standard criteria for mild cognitive impairment (MCI) impedes the comparison of research findings and the development of primary and secondary prevention strategies addressing the possible conversion to dementia. The objective of Study 1 was to compare the predictive ability of different MCI models as markers for incipient dementia in a longitudinal population-based Canadian sample. The utility of well-documented MCI criteria using data from persons who underwent a clinical examination in the second wave of the Canadian Study of Health and Aging (CSHA) was examined. Demographic characteristics, average neuropsychological test performance, and prevalence and conversion rates were calculated for each classification. Receiver operating characteristic (ROC) analyses were employed to assess the predictive power of each cognitive classification. The highest prevalence and conversion rates were associated with case definitions of multiple-domain MCI. The only diagnostic criteria to significantly predict dementia five years later was the Cognitive Impairment, No Dementia (CIND) Type 2 case definition. It is estimated that more restrictive MCI case definitions fail to address the varying temporal increases in decline across different cognitive domains in the progression from normal cognitive functioning to dementia. Using data from the CSHA, the objective of Study 2 was to elucidate the clinical correlates that best differentiate between cognitive classifications. A machine learning algorithm was used to identify the symptoms that best discriminated between: 1) not cognitively impaired (NCI) and CIND; 2) CIND & demented; and 3) converting and non-converting CIND participants. Poor retrieval was consistently a significant predictor of greater cognitive impairment across all three questions. 
While interactions with other predictors were noted when differentiating CIND from NCI and demented from non-demented participants, retrieval was the sole predictor of conversion to dementia over five years. Importantly, the limited specificity and predictive values of the respective algorithms caution against their use as clinical markers of CIND, dementia, or conversion. Rather, it is recommended that the predictors serve as markers for ongoing monitoring and assessment. Overall, the results of both studies suggest that the architecture of pathological cognitive decline to dementia may not be captured by a single set of diagnostic criteria.
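The caution about "limited specificity and predictive values" refers to standard quantities derived from a 2x2 table of predicted against actual status. The sketch below shows how they are computed. The counts are invented for illustration and are not CSHA results.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard screening measures from 2x2 counts.

    tp/fn: actual cases flagged / missed by the classifier.
    fp/tn: actual non-cases flagged / correctly left unflagged.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of actual cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
        "ppv": tp / (tp + fp),          # chance that a flagged person is a case
        "npv": tn / (tn + fn),          # chance that a cleared person is a non-case
    }


# Invented counts: 40 actual converters among 200 participants.
print(diagnostic_metrics(tp=30, fp=40, fn=10, tn=120))
```

With these invented counts, sensitivity and specificity are both 0.75, yet fewer than half of the flagged participants are true cases (PPV of about 0.43). That pattern is typical when the condition is relatively uncommon and illustrates why such predictors may suit monitoring better than diagnosis.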
|
19 |
Comparative Choice Analysis using Artificial Intelligence and Discrete Choice Models in A Transport Context
Sehmisch, Sebastian, 23 November 2021
Artificial Intelligence in the form of Machine Learning classifiers is increasingly applied to travel choice modeling and therefore constitutes a promising, competitive alternative to conventional discrete choice models such as the Logit approach. Compared with traditional theory-based models, data-driven Machine Learning generally shows powerful predictive performance but often lacks model interpretability, i.e., the provision of comprehensible explanations of individual decision behavior. Consequently, the question of which approach is superior remains unanswered. This paper therefore performs an in-depth comparison between benchmark Logit models and two popular Artificial Intelligence algorithms, Artificial Neural Networks and Decision Trees. The primary focus of the analysis is on the models' prediction performance and their ability to provide reasonable economic behavioral information such as the value of travel time and demand elasticities. For this purpose, I use cross-validation and extract behavioral indicators numerically from the Machine Learning models by means of post-hoc sensitivity analysis. All models are specified and estimated on synthetic and empirical data. As the results show, Neural Networks provide plausible aggregate value-of-time and elasticity measures, even though their values lie in different ranges from those of the Logit models. The simple Classification Tree algorithm, however, appears unsuitable for the applied computation procedure for these indicators, although it provides reasonably interpretable decision rules for travel choice behavior. Consistent with the literature, both Machine Learning methods achieve strong overall predictive performance and outperform the Logit models in this regard. Finally, there is no clear indication of which approach is superior. Rather, there seems to be a methodological tradeoff between Artificial Intelligence and discrete choice models depending on the underlying modeling objective.
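One common way to extract a value of travel time numerically from any model that returns choice probabilities is to take the ratio of the probability's sensitivity to time and to cost. The sketch below checks that idea against a binary Logit model, where the analytic answer is the ratio of the two coefficients. The coefficients, attribute levels, and function names are hypothetical, and this is not necessarily the exact procedure used in the paper.

```python
import math

BETA_TIME = -0.05  # hypothetical utility change per minute of travel time
BETA_COST = -0.20  # hypothetical utility change per currency unit


def prob_a(time_a, cost_a, time_b=30.0, cost_b=4.0):
    """Binary Logit probability of choosing alternative A over B."""
    v_a = BETA_TIME * time_a + BETA_COST * cost_a
    v_b = BETA_TIME * time_b + BETA_COST * cost_b
    return 1.0 / (1.0 + math.exp(v_b - v_a))


def numeric_value_of_time(prob, time_a, cost_a, step=1e-4):
    """Value of time as (dP/dtime) / (dP/dcost), by central differences.

    `prob` can be any callable returning a choice probability, so the same
    routine could be pointed at a fitted Machine Learning model.
    """
    d_time = (prob(time_a + step, cost_a)
              - prob(time_a - step, cost_a)) / (2 * step)
    d_cost = (prob(time_a, cost_a + step)
              - prob(time_a, cost_a - step)) / (2 * step)
    return d_time / d_cost


analytic = BETA_TIME / BETA_COST  # currency units per minute
numeric = numeric_value_of_time(prob_a, time_a=25.0, cost_a=5.0)
print(analytic, numeric)
```

A finite-difference procedure of this kind needs a probability surface that changes smoothly with the inputs. A classification tree's predictions are piecewise constant, so small perturbations often leave the prediction unchanged and the ratio becomes undefined, which may be one reason the abstract finds the tree unsuitable for this computation.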
|
20 |
Influence of restraint systems during an automobile crash: prediction of injuries for frontal impact sled tests based on biomechanical data mining / Influence des systèmes de retenue lors d'un accident automobile : Prédiction des blessures de l'occupant lors d'essais catapultés frontaux basées sur le data mining
Cridelich, Carine Caroline, 17 December 2015
Safety is one of the most important considerations when buying a new car. Before a car can be sold in a country, it must pass the crash tests defined by that country's legislation, which drives the development of restraint systems such as airbags and seat belts. In addition, ratings such as EURO NCAP and US NCAP provide an independent evaluation of car safety. Frontal sled tests are carried out, among other tests, to confirm the protection level of the vehicle, and the results are mainly based on injury assessment reference values derived from physical parameters measured in dummies.

This doctoral thesis presents an approach for processing the input data (i.e., restraint system parameters defined by experts), followed by a classification of frontal sled tests according to those parameters. The study is based only on passenger-side data, because the data collected for the driver were not complete enough to produce satisfactory results. The main objective is to create a model that evaluates the influence of the input parameters on injury severity and helps engineers predict the order of magnitude of sled test results according to the chosen legislation or rating. The dummy biomechanical values (the model outputs) were grouped into clusters in order to define injury levels. The model and the various algorithms were implemented in a program with a graphical user interface for easier day-to-day use.
|