Global ETD Search

11	Classification and Regression Trees in R Nemčíková, Lucia January 2014 Tree-based methods are a useful complement to traditional statistical methods for solving classification and regression problems. The aim of this master's thesis is not to judge which approach is better, but rather to give an overview of these methods and apply them to real data using R. The focus is on the basic methodology of tree-based models and their application in specific software, in order to give the reader a wide range of tools for using these methods. One part of the thesis touches on advanced tree-based methods to provide a full picture of the possibilities.
12	Automatic Differential Diagnosis Model of Patients with Parkinsonian Syndrome : A model using multiple linear regression and classification tree learning Löwe, Rakel, Schneider, Ida January 2020 Parkinsonian syndrome is an umbrella term covering several diseases with similar symptoms. PET images are key to the differential diagnosis of patients with parkinsonian syndrome. In this work, two automatic diagnostic models are developed and evaluated, with PET images as input and a diagnosis as output. The two developed models are evaluated on performance, in terms of sensitivity, specificity and misclassification error. Each model consists of 1) a regression model and 2) either a decision tree or a random forest. Two coefficients, alpha and beta, are introduced to train and test the models. The coefficients are the output of the regression model. They are calculated with multiple linear regression, with the patient images as dependent variables and mean images of four patient groups as explanatory variables. The coefficients describe the underlying relationship between the two. The four patient groups consisted of 18 healthy controls, 21 patients with Parkinson's disease, 17 patients with dementia with Lewy bodies and 15 patients with vascular parkinsonism. The models predict the patients' diagnoses with misclassification errors of 27% for the decision tree and 34% for the random forest. The patient group that is easiest to classify according to both models is healthy controls. The patient group that is hardest to classify is vascular parkinsonism. These results imply that alpha and beta are interesting outcomes from PET scans and could, after further development of the model, be used as a guide when diagnosing with the models developed. PET Parkinsonism Parkinsonian syndrome multilinear regression classification tree Engineering and Technology
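The three evaluation measures named in entry 12 (sensitivity, specificity and misclassification error) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the one-vs-rest treatment of the four diagnostic groups, and the six example labels are all assumptions made for illustration, and the numbers are not results from the thesis.

```python
def one_vs_rest_metrics(true_labels, predicted_labels, positive_class):
    """Sensitivity and specificity for one class against all others.

    Assumes at least one sample of the positive class and at least one
    sample of another class, so neither denominator is zero.
    """
    tp = fn = fp = tn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive_class and guess == positive_class:
            tp += 1  # correctly detected
        elif truth == positive_class:
            fn += 1  # missed case
        elif guess == positive_class:
            fp += 1  # false alarm
        else:
            tn += 1  # correctly rejected
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


def misclassification_error(true_labels, predicted_labels):
    """Share of samples whose predicted label differs from the true one."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


# Hypothetical labels: HC = healthy control, PD = Parkinson's disease,
# DLB = dementia with Lewy bodies, VP = vascular parkinsonism.
true_labels = ["HC", "HC", "PD", "PD", "DLB", "VP"]
predicted_labels = ["HC", "HC", "PD", "DLB", "DLB", "PD"]

hc_sensitivity, hc_specificity = one_vs_rest_metrics(
    true_labels, predicted_labels, positive_class="HC"
)
error = misclassification_error(true_labels, predicted_labels)
print(hc_sensitivity, hc_specificity, round(error, 3))
```

With more than two classes, sensitivity and specificity are usually reported per class in this one-vs-rest fashion, while the misclassification error summarizes all classes at once.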
13	Transferability and Robustness of Predictive Models to Proactively Assess Real-Time Freeway Crash Risk Shew, Cameron Hunter 01 October 2012 This thesis describes the development and evaluation of real-time crash risk assessment models for four freeway corridors, US-101 NB (northbound) and SB (southbound) as well as I-880 NB and SB. Crash data for these freeway segments for the 16-month period from January 2010 through April 2011 are used to link historical crash occurrences with real-time traffic patterns observed through loop detector data. The analysis techniques adopted for this study are logistic regression and classification trees, which are among the most common data mining tools. The crash risk assessment models are developed based on a binary classification approach (crash and non-crash outcomes), with traffic parameters measured at surrounding vehicle detection station (VDS) locations as the independent variables. The classification performance assessment methodology accounts for the rarity of crashes compared to non-crash cases in the sample, instead of using the more common pre-specified threshold-based classification. Prior to development of the models, data-related issues such as data cleaning and aggregation were addressed. Based on the modeling efforts, it was found that turbulence in terms of speed variation is significantly associated with crash risk on the US-101 NB corridor. The models estimated with data from US-101 NB were evaluated based on their classification performance, not only on US-101 NB, but also on the other three freeways for transferability assessment. It was found that the predictive model derived from one freeway can be readily applied to other freeways, although the classification performance decreases. The models that transfer best to other roadways were found to be those that use the fewest VDSs, that is, one upstream and one downstream station rather than two or three.
The classification accuracy of the models is discussed in terms of how the models can be used for real-time crash risk assessment, which may be helpful to authorities for freeway segments with newly installed traffic surveillance apparatuses, since the real-time crash risk assessment models from nearby freeways with existing infrastructure would be able to provide a reasonable estimate of crash risk. These models can also be applied for developing and testing variable speed limits (VSLs) and ramp metering strategies that proactively attempt to reduce crash risk. The robustness of the model output is assessed by location, time of day and day of week. The analysis shows that at some locations the models may require further learning due to higher-than-expected false positive (e.g., the I-680/I-280 interchange on US-101 NB) or false negative rates. The approach for post-processing the results from the model provides ideas to refine the model prior to or during implementation. Real-time crash risk data mining classification tree loop detector data transferability robustness Civil Engineering
14	Analysis of Crash Location and Crash Severity Related to Work Zones in Ohio Alfallaj, Ibrahim Saleh 26 August 2014 No description available. Civil Engineering Transportation
15	Influencing elections with statistics: targeting voters with logistic regression trees Rusch, Thomas, Lee, Ilro, Hornik, Kurt, Jank, Wolfgang, Zeileis, Achim 09 1900 In political campaigning substantial resources are spent on voter mobilization, that is, on identifying and influencing as many people as possible to vote. Campaigns use statistical tools for deciding whom to target ("microtargeting"). In this paper we describe a nonpartisan campaign that aims at increasing overall turnout using the example of the 2004 US presidential election. Based on a real data set of 19,634 eligible voters from Ohio, we introduce a modern statistical framework well suited for carrying out the main tasks of voter targeting in a single sweep: predicting an individual's turnout (or support) likelihood for a particular cause, party or candidate as well as data-driven voter segmentation. Our framework, which we refer to as LORET (for LOgistic REgression Trees), contains standard methods such as logistic regression and classification trees as special cases and allows for a synthesis of both techniques. For our case study, we explore various LORET models with different regressors in the logistic model components and different partitioning variables in the tree components; we analyze them in terms of their predictive accuracy and compare the effect of using the full set of available variables against using only a limited amount of information. We find that augmenting a standard set of variables (such as age and voting history) with additional predictor variables (such as the household composition in terms of party affiliation) clearly improves predictive accuracy. We also find that LORET models based on tree induction beat the unpartitioned models. Furthermore, we illustrate how voter segmentation arises from our framework and discuss the resulting profiles from a targeting point of view. (authors' abstract)
16	Algoritmo para indução de árvores de classificação para dados desbalanceados / Algorithm for induction of classification trees for unbalanced data Frizzarini, Cláudio 21 November 2013 Data mining techniques, and machine learning methods in particular, have become very popular in recent years. Many decision support information systems and business intelligence tools have incorporated such techniques and make intensive use of them. Top-Down Induction of Decision Trees (TDIDT) algorithms are among the most popular tools for supervised learning. One of their advantages over other methods is that, once built and validated, a decision tree is often easy for a domain specialist to interpret, without requiring prior knowledge of the induction algorithm. On the other hand, many typical classification problems involve unbalanced data (heterogeneous class prevalence). In such cases, algorithms based on global error minimization tend to induce classifiers with low error rates on the high-prevalence classes but high error rates on the low-prevalence classes. This phenomenon may be critical when low-prevalence classes represent rare or important events, such as the presence of a severe disease (in a medical diagnosis problem) or default on a loan (in a credit analysis problem).
To address this problem, several TDIDT algorithms require the calibration of ad hoc parameters or, in the absence of such parameters, the use of data balancing techniques. Both approaches make data mining tools more complex for less experienced users, and they are not always available. In this work, we propose a new TDIDT algorithm for problems involving unbalanced data. This algorithm, currently named DDBT (Dynamic Discriminant Bounds Tree), uses a node partition criterion that is not based on absolute class frequencies; instead, it compares the proportion of each class in the current node with its proportion in the original training sample, seeking subsets that discriminate between classes better than the original data set does. For labeling terminal nodes, the algorithm assigns the class with the maximum ratio between its relative prevalence in the node and its original prevalence in the training sample. These characteristics give the algorithm more flexibility in handling unbalanced data sets, yielding a better balance among the error rates of the classes. Classification tree Data mining Decision tree Supervised learning Unbalanced data
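The terminal-node labeling rule described in this abstract (assign the class whose prevalence in the node is largest relative to its prevalence in the training sample) can be sketched in a few lines of Python. This is only an illustration of that one rule as the abstract words it, not the DDBT implementation: the function name, the tie-breaking by sorted class name, and the toy data are assumptions.

```python
from collections import Counter


def relative_prevalence_label(node_labels, training_labels):
    """Pick the class with the largest ratio of node share to training share.

    Returns the chosen class and its ratio. Classes are scanned in sorted
    order, so ties go to the first class name (an arbitrary choice here).
    """
    if not node_labels:
        raise ValueError("node_labels must not be empty")
    node_counts = Counter(node_labels)
    train_counts = Counter(training_labels)
    n_node = len(node_labels)
    n_train = len(training_labels)

    best_class, best_ratio = None, float("-inf")
    for cls in sorted(train_counts):
        node_share = node_counts.get(cls, 0) / n_node
        train_share = train_counts[cls] / n_train
        ratio = node_share / train_share
        if ratio > best_ratio:
            best_class, best_ratio = cls, ratio
    return best_class, best_ratio


# Toy data: 10% of the training sample belongs to the rare class.
training = ["healthy"] * 90 + ["disease"] * 10
# A terminal node in which the rare class is enriched to 40%.
node = ["healthy"] * 6 + ["disease"] * 4

majority_label = Counter(node).most_common(1)[0][0]
label, ratio = relative_prevalence_label(node, training)
print(majority_label, label, round(ratio, 2))
```

In this toy node a plain majority vote would answer "healthy", whereas the ratio rule answers "disease", because the rare class is four times more frequent in the node than in the training sample. That contrast is the behavior the abstract attributes to the labeling step.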
18	Understanding current and potential distribution of Australian acacia species in southern Africa Motloung, Rethabile Frangenie 06 1900 This dissertation presents research on the value of using different sources of data to explore the factors determining invasiveness of introduced species. The research draws upon the availability of data on the historical trial plantings of alien species and other sources. The focus of the study is on Australian Acacia species as a taxon introduced into southern Africa (Lesotho, South Africa and Swaziland). The first component of the study focused on understanding the factors determining introduction outcome of species in historical trial plantings and invasion success of Australian Acacia species using Species Distribution Models (SDMs) and classification tree techniques. SDMs were calibrated using the native range occurrence records (Australia) and were validated using results of 150 years of South African government forestry trial planting records and invaded range data from the Southern African Plant Invaders Atlas. To understand factors associated with survival (‘trial success’) or failure to survive (‘trial failure’) of species in historical trial plantings, classification and regression tree analysis was used. The results indicate climate as one of the factors that explains introduction and/or invasion success of Australian Acacia species in southern Africa. However, the results also indicate that for ‘trial failures’ there are factors other than climate that could have influenced the trial outcome. This study emphasizes the need to integrate data on whether the species has been recorded to be invasive elsewhere with climate matching for invasion risk assessment. The second component of the study focused on understanding the distribution patterns of Australian Acacia species that are not known as invasive in southern Africa.
The specific aims were to determine which species still exist at previously recorded sites and to determine their current invasion status. This was done by collating data from different sources that list species introduced into southern Africa and then conducting revisits. For the purpose of this study, a revisit is a field survey based on recorded occurrences of introduced species. The known occurrence data for species on the list were obtained from different data sources and various invasion biology experts. As it was not practical to revisit all species on the list, three ornamental species (Acacia floribunda, A. pendula and A. retinodes) were selected for the pilot revisits conducted in this study. Acacia retinodes trees were not found during the revisits. The results provided data that could be used to characterize species based on the Blackburn et al. (2011) scheme. However, it is not clear whether the observed Acacia pendula or A. floribunda trees will spread away from the sites, hence the need to continuously monitor the sites for spread. The methods used in this research establish a protocol for future work on conducting revisits at known localities of introduced species to determine their population dynamics and thereby characterize the species according to the scheme for management purposes. / Dissertation (MSc)--University of Pretoria, 2014. / National Research Foundation (NRF) / Zoology and Entomology / MSc (Zoology) / Unrestricted Species distribution models Southern African Plant Invaders Atlas Forestry Classification tree Expert knowledge Field visits Alien tree UCTD
19	Remote Sensing for Detecting and Mapping Flowering Rush: A Case Study in the Ottawa National Wildlife Refuge (ONWR), Ohio Droog, Arisca 16 October 2012 No description available. Remote Sensing Flowering Rush Invasive Plants Species Linear Spectral Unmixing Classification Tree Analysis Aerial Imagery Landsat TM Imagery
20	Characterization of Foods by Chromatographic and Spectroscopic Methods Coupled to Chemometrics Aloglu, Ahmet Kemal 06 June 2018 No description available. Chemistry Food Science Chemometrics Classification Food Characterization Spectroscopy Multivariate Data Analysis Partial Least Squares Preprocessing Classification Tree Chromatography
