121 |
Automated dust storm detection using satellite images : development of a computer system for the detection of dust storms from MODIS satellite images and the creation of a new dust storm database. El-Ossta, Esam Elmehde Amar, January 2013.
Dust storms are natural hazards whose frequency has increased over the last decade across the Sahara desert, Australia, the Arabian Desert, Turkmenistan and northern China. They increase air pollution and damage human health, reduce temperatures, damage communication facilities, and reduce visibility, which delays both road and air traffic; they affect urban and rural areas as well as farms. It is therefore important to know the causation, movement and radiation effects of dust storms. Monitoring and forecasting of dust storms are increasing in order to help governments reduce their negative impact. Satellite remote sensing is the most common detection method, but its use over sandy ground is still limited because dust and sand share similar characteristics. Moreover, satellite remote sensing using true-colour images or estimates of aerosol optical thickness (AOT), and algorithms such as the deep blue algorithm, have limitations for identifying dust storms. Many researchers have studied the detection of dust storms during daytime in a number of different regions of the world, including China, Australia, America and North Africa, using a variety of satellite data, but fewer studies have focused on detecting dust storms at night. The key elements of this study are to use data from the Moderate Resolution Imaging Spectroradiometers (MODIS) on the Terra and Aqua satellites to develop a more effective automated method for detecting dust storms during both day and night, and to generate a MODIS dust storm database.
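A common starting point for detecting dust in thermal imagery, usable by day and by night, is the split-window brightness temperature difference (BTD) between the bands near 11 μm and 12 μm, which tends to be negative over airborne mineral dust. The sketch below is a minimal illustration of that thresholding idea in Python, not the thesis's actual algorithm; the band arrays, the -0.5 K threshold and the NumPy workflow are assumptions for illustration.

```python
import numpy as np

def dust_mask_btd(bt_11um, bt_12um, threshold=-0.5):
    """Flag likely dust pixels via the split-window brightness
    temperature difference BTD = BT(11um) - BT(12um).

    Over airborne mineral dust the BTD is typically negative, whereas
    clouds and clear ground tend to give positive values. The -0.5 K
    threshold is an illustrative assumption, not a thesis value.
    """
    btd = bt_11um - bt_12um
    return btd < threshold  # boolean mask: True = candidate dust pixel

# Toy example: two MODIS-like brightness-temperature grids (kelvin).
bt31 = np.array([[290.0, 285.2], [280.1, 275.0]])  # ~11 um (MODIS band 31)
bt32 = np.array([[289.0, 286.5], [281.3, 274.2]])  # ~12 um (MODIS band 32)
print(dust_mask_btd(bt31, bt32))
```

Because it relies on thermal rather than visible bands, this kind of test also works at night, which is why split-window differences are a natural candidate for the day-and-night detection goal described above.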
|
122 |
An Integrative Approach for Examining the Determinants of Abnormal Returns: The Cases of Internet Security Breach and Ecommerce Initiative. Andoh-Baidoo, Francis Kofi, 01 January 2006.
Researchers in various business disciplines use the event study methodology to assess the market value of firms through capital market reaction to news in the public media about firms' activities. Capital market reaction is assessed based on cumulative abnormal return (the sum of abnormal returns over the event window). In this study, the event study methodology is used to assess the impact that two important information technology activities, Internet security breach and ecommerce initiative, have on the market value of firms. While prior research on the relationship between these business activities and cumulative abnormal return relied on regression analysis, in this study we use decision tree induction as well as regression. For the Internet security breach study, we use negative cumulative abnormal return as a surrogate for damage to the breached firm. In contrast to what has been reported in the research literature, our results suggest that the relationship between cumulative abnormal return and the independent variables in both the Internet security breach and ecommerce initiative studies is complex, often involving conditional interactions between the independent variables. We report that incomplete contract theory is unable to effectively explain the relationship between cumulative abnormal return and the organizational variables; other ecommerce theories provide support for the findings from our analysis. We show that both attack and firm characteristics are determinants of damage to breached firms. Our results reveal that decision tree induction offers insight beyond that provided by regression models. We illustrate that there is value in using data mining techniques to study the market value of ecommerce initiatives and Internet security breaches, that this approach has applicability in other domains, and that decision tree induction can enhance the event study methodology. We demonstrate that decision tree induction can be used for both theory building and theory testing: we employ it to test and enhance ecommerce theories and to develop a theoretical model for cumulative abnormal return and ecommerce, and we also present theoretical models for Internet security breach and damage to the breached firm. These models can be used by decision makers when formulating and implementing Internet security and ecommerce investment strategies.
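In an event study, abnormal returns are typically estimated with a market model fitted over a pre-event estimation window and then cumulated over the event window. The following Python sketch shows that standard calculation; the window lengths and the synthetic data are illustrative assumptions, not details from the dissertation.

```python
import numpy as np

def cumulative_abnormal_return(firm_ret, market_ret, est_win, event_win):
    """Market-model event study: fit r_firm = a + b * r_market on the
    estimation window, then cumulate the residuals over the event window."""
    fr_est, mr_est = firm_ret[est_win], market_ret[est_win]
    beta, alpha = np.polyfit(mr_est, fr_est, 1)      # OLS slope, intercept
    expected = alpha + beta * market_ret[event_win]  # "normal" returns
    abnormal = firm_ret[event_win] - expected        # abnormal returns
    return abnormal.sum()                            # CAR over the window

# Toy example: 130 trading days; estimation = days 0-119, event = 120-124.
rng = np.random.default_rng(0)
market = rng.normal(0.0005, 0.01, 130)
firm = 0.0002 + 1.2 * market + rng.normal(0, 0.005, 130)
firm[120:125] -= 0.01                                # simulated breach impact
car = cumulative_abnormal_return(firm, market, slice(0, 120), slice(120, 125))
print(f"CAR over event window: {car:.4f}")
```

A negative CAR computed this way is exactly the quantity the study uses as a surrogate for damage to the breached firm; the decision tree induction step then takes such CARs, together with attack and firm characteristics, as its training data.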
|
123 |
Adaptive Similarity of XML Data. Jílková, Eva, January 2014.
In the present work we explore the application of XML schema mapping in conceptual modeling of XML schemas. We expand upon previous efforts to map XML schemas to a PIM schema via a decision tree. In this thesis a more versatile method is implemented: the decision tree is trained from a large set of user-annotated mapping decision samples. Several variations of training that could improve the mapping results are proposed. The approach is evaluated in a wide range of experiments that show the advantages and disadvantages of the proposed variations of training. The work also contains a survey of different approaches to schema mapping and a description of the schemas used in this work.
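Training a mapping decision tree from annotated samples can be prototyped with a standard library classifier. The sketch below is a hypothetical setup using scikit-learn rather than the thesis's own implementation: each candidate (XML element, PIM class) pair becomes a feature vector with a yes/no mapping label, and the feature names are invented for illustration.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical features for a candidate (XML element, PIM class) pair:
# [name similarity, datatype match, child-structure overlap, depth difference]
X = [
    [0.9, 1.0, 0.8, 0],   # strong candidate pair
    [0.2, 0.0, 0.1, 3],   # weak candidate pair
    [0.7, 1.0, 0.5, 1],
    [0.1, 0.0, 0.3, 2],
    [0.8, 0.0, 0.9, 0],
    [0.3, 1.0, 0.2, 4],
]
y = [1, 0, 1, 0, 1, 0]    # user annotation: 1 = maps, 0 = does not map

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y)
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, tree.predict(X_te)))
```

The proposed training variations would then amount to changing how the annotated samples are collected, weighted or split before fitting such a tree.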
|
124 |
Artificial intelligence and Machine learning : a diabetic readmission study. Forsman, Robin; Jönsson, Jimmy, January 2019.
The maturing of Artificial intelligence provides great opportunities for healthcare, but also comes with new challenges. For Artificial intelligence to be adequate, a comprehensive analysis of the data is necessary, along with testing the data on multiple algorithms to determine which algorithm is appropriate to use. In this study a dataset was gathered consisting of patients who either were or were not readmitted to hospital within 30 days of being admitted. The data was then analyzed and compared across different algorithms to determine the most appropriate algorithm to use.
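An algorithm comparison of the kind described here is commonly run as cross-validated scoring of several candidate models on the same data. The Python sketch below shows that pattern with scikit-learn; the model shortlist, the synthetic class-imbalanced data and the AUC metric are illustrative assumptions rather than the study's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the readmission data: 1 = readmitted within 30 days.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.85],
                           random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```

Ranking models by a threshold-free score such as ROC AUC matters here because readmissions are the minority class, so raw accuracy would reward a model that never predicts readmission.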
|
125 |
Uma adaptação do método Binary Relevance utilizando árvores de decisão para problemas de classificação multirrótulo aplicado à genômica funcional / An Adaptation of Binary Relevance for Multi-Label Classification applied to Functional Genomics. Tanaka, Erica Akemi, 30 August 2013.
Many classification problems described in the machine learning and data mining literature concern classification in which each example belongs to a single class. However, many classification problems, especially in the field of Bioinformatics, are associated with more than one class; these are known as multi-label classification problems. The basic principle of multi-label classification is similar to that of traditional (single-label) classification, differing in the number of labels to be predicted: two or more. In Bioinformatics many problems involve a large number of labels with which each example can be associated. Traditional classification algorithms, however, are unable to cope with a set of multi-label examples, since they are designed to predict a single label. A simple solution is the method known as Binary Relevance, but studies have shown that this approach is not a good solution to the multi-label classification problem, because each class is treated individually, ignoring possible relations between classes. Thus, the objective of this research was to propose a new adaptation of the Binary Relevance method that takes relations between labels into account in order to minimize this disadvantage, and that also considers the interpretability of the generated model, not just its performance. The experimental results show that the new method is capable of generating trees that relate correlated labels and has performance comparable to that of other methods, obtaining good results under the F-measure.
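Plain Binary Relevance, the baseline this thesis adapts, trains one independent decision tree per label. The Python sketch below shows that unmodified baseline (which is exactly the method criticized above for ignoring label correlations), not the thesis's adaptation; the toy multi-label data is an assumption.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

# Toy multi-label data: each example may carry several functional-class labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
Y = (rng.random((200, 3)) < 0.3).astype(int)   # 3 binary label columns

X_tr, X_te, Y_tr, Y_te = X[:150], X[150:], Y[:150], Y[150:]

# Binary Relevance baseline: one independent tree per label column.
trees = [DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, Y_tr[:, j])
         for j in range(Y.shape[1])]
Y_pred = np.column_stack([t.predict(X_te) for t in trees])
print("micro F-measure:", f1_score(Y_te, Y_pred, average="micro"))
```

The adaptation proposed in the thesis would replace these fully independent per-label trees with trees that can see, and therefore exploit, the other labels.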
|
126 |
Application of Stochastic Decision Models to Solid Waste Management. Wright, William Ervin, 08 1900.
This research applies stochastic decision tree analytical techniques to a decision of the type a small community may face when choosing a solid waste disposal system from among several alternatives. Specifically targeted are those situations in which a community (1) lies at or near the boundary of a central planning area, (2) is in a position to exercise one of several disposal options, and (3) has access to the data base on solid waste which has been systematically developed by a central planning agency. The options available may or may not be optimal in terms of total cost, either to the community or to adjacent communities which participate in centrally coordinated or jointly organized activities. The study suggests that stochastic simulation models, drawing upon a data base developed by central planning agencies in cases where local data are inadequate or unavailable, can be useful in evaluating disposal alternatives at the community level. Further, the decision tree can be usefully employed to communicate the results of the analysis. Some important areas of further research on the small-community disposal system selection problem are noted.
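The core computation in a stochastic decision tree is the rollback: chance nodes are averaged over their outcome probabilities, and the decision node takes the alternative with the minimum expected cost. The Python sketch below illustrates that on an invented two-option disposal problem; the options, costs and probabilities are hypothetical, not from the study.

```python
# Roll back a small stochastic decision tree by expected cost.
# Each disposal option leads to a chance node over hypothetical
# annual-cost outcomes, given as (probability, cost in $1000s).
options = {
    "own landfill": [(0.6, 120), (0.4, 180)],            # cost varies with volume
    "regional transfer station": [(0.7, 140), (0.3, 150)],
}

def expected_cost(outcomes):
    """Chance node: probability-weighted average of the outcome costs."""
    return sum(p * c for p, c in outcomes)

# Decision node: choose the option that minimizes expected cost.
evs = {name: expected_cost(out) for name, out in options.items()}
for name, ev in evs.items():
    print(f"{name}: expected cost = {ev:.1f}")
print("choose:", min(evs, key=evs.get))
```

Laying the alternatives out this way is also what makes the tree useful for communicating results: each branch, probability and cost is visible to the community's decision makers.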
|
127 |
Algoritmo para indução de árvores de classificação para dados desbalanceados / Algorithm for induction of classification trees for unbalanced data. Frizzarini, Cláudio, 21 November 2013.
Data mining techniques and, particularly, machine learning methods have become very popular in recent years, and many decision support information systems and business intelligence tools have incorporated and made intensive use of them. Top-Down Induction of Decision Trees (TDIDT) algorithms appear among the most popular tools for supervised learning. One of their advantages with respect to other methods is that a decision tree is frequently easy for the domain specialist to interpret, without requiring prior knowledge of the induction algorithm. On the other hand, several typical classification problems involve unbalanced data (heterogeneous class prevalence). In such cases, algorithms based on global error minimization tend to induce classifiers with low error rates on the high-prevalence classes but high error rates on the low-prevalence classes. This phenomenon may be critical when low-prevalence classes represent rare or important events, such as the presence of a severe disease or default on a loan. To address this problem, several TDIDT algorithms require the calibration of ad-hoc parameters, or even data balancing techniques; these approaches make data mining tools more complex for less expert users, when they are available at all. In this work, we propose a new TDIDT algorithm for problems involving unbalanced data. This algorithm, currently named DDBT (Dynamic Discriminant Bounds Tree), uses a node partition criterion which is not based on absolute class frequencies, but compares the prevalence of each class in the current node with its prevalence in the original training sample, seeking subsets with greater class discrimination relative to the original data set. For labeling terminal nodes, the algorithm assigns the class with the maximum ratio between its relative prevalence in the node and its prevalence in the original training sample. These characteristics provide more flexibility for the treatment of unbalanced data sets, yielding a better equilibrium among the error rates across classes.
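The terminal-node labeling rule described above is easy to state in code: pick the class that maximizes (prevalence in the node) / (prevalence in the original training sample). The Python sketch below is a minimal reading of that rule as the abstract states it; it is not the authors' DDBT implementation.

```python
from collections import Counter

def ddbt_leaf_label(node_labels, training_labels):
    """Label a terminal node with the class whose prevalence in the node,
    relative to its prevalence in the original training sample, is highest
    (the rule the DDBT abstract describes)."""
    node_freq = Counter(node_labels)
    train_freq = Counter(training_labels)
    n_node, n_train = len(node_labels), len(training_labels)
    return max(
        node_freq,
        key=lambda c: (node_freq[c] / n_node) / (train_freq[c] / n_train),
    )

# Toy check: class "sick" is 10% of the training data but 40% of this node,
# so it wins over the majority class despite a lower absolute count.
training = ["healthy"] * 90 + ["sick"] * 10
node = ["healthy"] * 6 + ["sick"] * 4
print(ddbt_leaf_label(node, training))   # -> "sick" (ratio 4.0 vs 0.67)
```

This is how DDBT avoids the majority-class bias of frequency-based labeling: a rare class can claim a leaf whenever it is over-represented there relative to the full sample.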
|
129 |
Model error space and data assimilation in the Mediterranean Sea and nested grids / Espace d'erreur et assimilation de données dans un modèle de la Mer Méditerranée et des grilles gigognes. Vandenbulcke, Luc, 11 June 2007.
In this work, we implemented the GHER hydrodynamic model in the Gulf of Lions (resolution 1/100°). This model is nested interactively in another model covering the north-western basin of the Mediterranean Sea (resolution 1/20°), itself nested in a model covering the whole basin (resolution 1/4°). A data assimilation filter, called the SEEK filter, is used to test in which of those grids observations taken in the Gulf of Lions are best assimilated. To this end, twin experiments are used: a reference run is considered as the truth, and another run, starting from different initial conditions, assimilates pseudo-observations coming from the reference run. It appeared that, in order to best constrain the coastal model, available data should be assimilated in that model. The most efficient setup, however, is to group the state vectors of the 3 grids into a single vector, and hence coherently modify the 3 domains at once during assimilation cycles.

Operational forecasting with nested models often uses only so-called passive nesting: no data feedback happens from the regional models to the global model. We propose a new idea: to use data assimilation as a substitute for the feedback. Using twin experiments again, we show that assimilating outputs from the regional model in the global model has beneficial impacts on the subsequent forecasts in the regional model.

The data assimilation method used in those experiments corrects model errors using only some privileged directions in the state space, and these directions are selected from a previous model run. This is a weakness of the method when real observations are available. We therefore tried to build new directions of the state space using an ensemble run, this time covering only the Mediterranean basin (without grid nesting). This led to a quantitative characterization of the forecast errors to be expected when various parameters and external forcings are affected by uncertainties.

Finally, using these new directions, we tried to build a statistical model intended to emulate the hydrodynamical model using only a fraction of the computer resources the latter requires. To this end, we tried artificial neural networks, nearest-neighbour methods and regression trees. This study constitutes only a first step toward an innovative statistical model: in its present form, only a few degrees of freedom are considered and the primitive equation model is still required to build the AL method. We tried forecasting at two different time horizons: one day and one week.
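The last step, emulating an expensive dynamical model with a cheap statistical one, can be prototyped with a regression tree fitted to (state, forecast) pairs archived from past model runs. The Python sketch below shows that idea on synthetic data; the reduced two-variable state, the toy dynamics and the scikit-learn workflow are all assumptions for illustration, not the thesis's setup.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic surrogate task: learn the one-day-ahead forecast of a reduced
# model state (2 degrees of freedom) from archived model-run pairs.
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 2))                        # states at time t
dynamics = np.array([[0.9, 0.2],                          # toy linear core to
                     [-0.3, 0.8]])                        # stand in for the model
next_states = np.tanh(states @ dynamics)                  # states at time t+1

emulator = DecisionTreeRegressor(max_depth=8).fit(states, next_states)

# Once fitted, a forecast costs one tree traversal instead of a model run.
x0 = np.array([[0.5, -1.0]])
print("emulated one-day forecast:", emulator.predict(x0)[0])
```

The same fit-and-predict pattern extends to the other candidates mentioned above (neural networks, nearest neighbours) by swapping the regressor class, which is what makes the comparison between them straightforward.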
|
130 |
Predicting the Disease of Alzheimer (AD) with SNP Biomarkers and Clinical Data Based Decision Support System Using Data Mining Classification Approaches. Erdogan, Onur, 01 September 2012.
Single Nucleotide Polymorphisms (SNPs) are the most common DNA sequence variations, in which only a single nucleotide (A, T, C, G) in the human genome differs between individuals. Besides being the main genetic reason behind individual phenotypic differences, SNP variations have the potential to reveal the molecular basis of many complex diseases. Associating SNP subsets with diseases and analyzing the genotyping data together with clinical findings will provide practical and affordable methodologies for the prediction of diseases in clinical settings. There is therefore a need to determine the SNP subsets and the patients' clinical data that are informative for the prediction or diagnosis of particular diseases. So far there is no established approach for selecting the representative SNP subset and patients' clinical data. Data mining methodology, which is based on finding hidden and key patterns in huge databases, has the highest potential for extracting knowledge from genomic datasets and for selecting the SNPs and the most effective clinical features that are informative and relevant for clinical diagnosis. In this study we applied one of the most widely used data mining classification methodologies, the decision tree, to associate SNP biomarkers and clinical data with Alzheimer's disease (AD), the most common form of dementia. Different tree construction parameters were compared for optimization, and the most efficient and accurate tree for predicting AD is presented.
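Comparing tree construction parameters, as the study describes, is commonly automated with a cross-validated grid search. The Python sketch below shows that pattern with scikit-learn; the parameter grid, the synthetic genotype encoding (0/1/2 minor-allele counts) and the accuracy scoring are illustrative assumptions, not the study's actual configuration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 300 subjects x 50 SNPs coded as minor-allele counts
# (0/1/2), plus a binary AD diagnosis label.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(300, 50))
y = rng.integers(0, 2, size=300)

# Compare tree-construction parameters by 5-fold cross-validated accuracy.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={
        "criterion": ["gini", "entropy"],   # split-quality measure
        "max_depth": [3, 5, 8, None],       # tree-size control
        "min_samples_leaf": [1, 5, 10],     # pruning-like constraint
    },
    cv=5,
    scoring="accuracy",
).fit(X, y)
print("best parameters:", grid.best_params_)
print("best CV accuracy:", round(grid.best_score_, 3))
```

The winning parameter combination corresponds to the "most efficient and accurate tree" the abstract refers to, selected here by cross-validation rather than on a single train/test split.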
|