181 |
Avaliação da alteração da qualidade do solo em diferentes áreas de Cerrado / Evaluation of soil quality change in different areas of Cerrado
Guerra, Fernando. 11 September 2015
The need to minimize environmental impacts without reducing productivity has led to a search for alternative production methods that maintain soil quality (SQ) and ecosystem sustainability. To evaluate and quantify SQ, this study was divided into three chapters, each with a specific goal: (1) evaluation of soil attributes to obtain a minimum data set (MDS) of indicators and calculation of additive and weighted additive soil quality indices (SQI); (2) evaluation of the environmental performance of agricultural systems in biomass production from the perspective of emergy; and (3) application of a decision tree (DT) model to identify the soil attributes most affected by the change in land use from native Cerrado to no-tillage systems. The study was carried out in São Carlos (São Paulo State) and São Desidério (Bahia State). In São Carlos, topsoil samples were collected from two native areas (cerradão and cerrado stricto sensu) and two agricultural areas (sugarcane and pasture). In São Desidério, soil samples were collected from four agricultural areas with different periods of use (5, 8, 12 and 18 years) under a soybean-corn-cotton rotation, and from an area of native Cerrado. In chapter 1, the MDS was identified through principal component analysis, normalized into scores and integrated into the additive and weighted additive indices. In chapter 2, only the soil quality change (ΔSQ) between the agricultural areas and native Cerrado was quantified and, combined with environmental accounting protocols, two emergy indicators were proposed: Input Embodiment in Soil Quality Change (IESQ) and Input Embodiment in Additional Biomass (IEAB). In chapter 3, the data set was the same as in chapter 2; a DT was generated with land use as the target attribute and the chemical and physical soil attributes as predictors. In chapter 1, the MDS comprised sum of bases, pH, soil organic matter, aluminum (Al) content, clay content, bulk density, water content at field capacity and soil microbial biomass carbon. The additive and weighted additive SQI values for cerradão, cerrado stricto sensu, sugarcane and pasture were 3.88, 2.24, 4.72 and 3.76, and 0.62, 0.36, 0.57 and 0.54, respectively, with the highest values reported for cerradão. In chapter 2, the 12-year area had the highest ΔSQ (+29.3). The total emergy incorporated into the soybean, corn and cotton crops was 4.68E+15, 5.38E+15 and 7.28E+15 sej ha⁻¹ year⁻¹, respectively. The 12-year area was the most efficient in its use of resources (external inputs) per unit of SQ increase (IESQ = 0.19E+15 sej unit⁻¹) and per unit of biomass (IEAB = 0.78E+15 sej Mg⁻¹), which is equivalent to 73% less input demand (in terms of emergy) to obtain the same biomass increase as the area cultivated for 8 years. The DT generated in chapter 3 identified the soil attributes most important for differentiating native Cerrado from agricultural areas: Al content, pH, phosphorus and total organic carbon.
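The two indices described in this abstract can be illustrated with a minimal sketch. It assumes one common formulation: the additive index is the plain sum of normalized indicator scores, and the weighted additive index multiplies each score by a weight normalized to sum to 1. The abstract does not give the thesis's exact formulas, and every score and weight below is a hypothetical placeholder, not data from the study.

```python
# Illustrative sketch only. Indicator names follow the MDS listed in the abstract,
# but all scores and weights are hypothetical placeholders, not thesis data.

def additive_index(scores):
    """Additive index: plain sum of the normalized indicator scores."""
    return sum(scores.values())


def weighted_additive_index(scores, weights):
    """Weighted additive index: each score times its weight, weights normalized to 1."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] / total_weight for name in scores)


# Hypothetical normalized scores in [0, 1] for a single land use.
scores = {
    "sum_of_bases": 0.25,
    "pH": 0.5,
    "soil_organic_matter": 0.75,
    "Al_content": 1.0,
    "clay_content": 0.5,
    "bulk_density": 0.25,
    "water_at_field_capacity": 0.75,
    "microbial_biomass_C": 0.0,
}

# Hypothetical weights (for example, derived from variance explained in a PCA).
weights = {
    "sum_of_bases": 4,
    "pH": 4,
    "soil_organic_matter": 2,
    "Al_content": 2,
    "clay_content": 1,
    "bulk_density": 1,
    "water_at_field_capacity": 1,
    "microbial_biomass_C": 1,
}

sqi_additive = additive_index(scores)
sqi_weighted = weighted_additive_index(scores, weights)
print(sqi_additive, sqi_weighted)
```

With eight indicators scored in [0, 1], the additive index ranges from 0 to 8 and the weighted index from 0 to 1, which matches the ranges of the values reported in the abstract.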
|
182 |
Uma comparação de métodos de classificação aplicados à detecção de fraude em cartões de crédito / A comparison of classification methods applied to credit card fraud detection
Gadi, Manoel Fernando Alonso. 22 April 2008
In recent years, many bio-inspired algorithms have emerged for solving classification problems. In support of this, in 2002 the journal Nature published an article that pointed to the commercial use, as early as 2003, of Artificial Immune Systems for fraud detection in financial institutions by a British company. Despite this, to the best of our knowledge, no scientific publication with promising results has appeared since. Our work applied Artificial Immune Systems (AIS) to credit card fraud detection. We compared AIS with Decision Trees (DT), Neural Networks (NN), Bayesian Networks (BN) and Naive Bayes (NB). For a fairer comparison between the methods, exhaustive search and a genetic algorithm (GA) were used to select an optimized parameter set, in the sense of minimizing the cost of fraud on a credit card database provided by a Brazilian credit card issuer. In addition to this optimization, we also analyzed and searched for more robust parameters via multi-resolution; these parameters are presented in this work. Specific characteristics of fraud databases, such as class imbalance and the different costs of false positives and false negatives, were taken into account. All runs were performed in Weka, a publicly available open source software package, and test sets were always used to validate the classifiers. The results are consistent with Maes et al., who show that BN outperform NN; although NN is one of the most widely used methods today, for our database and our implementations it is among the worst methods. Despite a poor result with default parameters, AIS obtained the best result with the parameters optimized by the GA, which led DT and AIS to present the best and most robust results among all the methods tested.
On 31 January 2002, the journal Nature published a news item on immune-based systems. Among the applications considered was the detection of fraudulent financial transactions, with the possibility of commercial use of such a system by a British company as early as 2003. In spite of that, we do not know of any scientific publication that uses Artificial Immune Systems in financial fraud detection. This work reports very satisfactory results on the application of Artificial Immune Systems (AIS) to credit card fraud detection. In fact, scientific publications on financial fraud detection are quite rare, as Phua et al. [PLSG05] point out, in particular for credit card transactions. Phua et al. identify the lack of a public database of fraudulent financial transactions available for testing as the main cause of such a small number of publications. Two of the most important publications on this subject that report results from their implementations are the award-winning Maes (2000), which compares Neural Networks and Bayesian Networks in credit card fraud detection, with a result favoring Bayesian Networks, and Stolfo et al. (1997), which proposed the AdaCost method. This thesis builds on both of these works and publishes results in credit card fraud detection. Moreover, in spite of the unavailability of Maes's data and implementations, we reproduce their results and extend the set of comparisons to cover Neural Networks, Bayesian Networks, Artificial Immune Systems, Decision Trees and even the simple Naive Bayes. We also reproduce, in a certain way, the results of Stolfo et al. (1997): we verify that a cost-sensitive meta-heuristic, generalized from the step that leads from AdaBoost to AdaCost and applied to the tested methods, substantially improves performance for all methods except Naive Bayes. Our analysis took into account the skewed nature of the dataset, as well as the need for parameter adjustment, sometimes through genetic algorithms, in order to obtain the best results from each compared method.
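The idea of selecting parameters by minimizing the cost of fraud, with different costs for false negatives and false positives, can be sketched as an exhaustive search over a single decision threshold. This is a simplified illustration, not the thesis's Weka setup: the labels, scores, candidate thresholds and cost values are all hypothetical.

```python
# Illustrative sketch only: all data and cost values are hypothetical.

def fraud_cost(labels, scores, threshold, cost_fn, cost_fp):
    """Total cost of classifying with `threshold`.

    labels: 1 = fraud, 0 = legitimate. A transaction is flagged when its score
    is at or above the threshold. Missed frauds cost `cost_fn` each and
    legitimate transactions flagged as fraud cost `cost_fp` each.
    """
    cost = 0.0
    for label, score in zip(labels, scores):
        flagged = score >= threshold
        if label == 1 and not flagged:
            cost += cost_fn  # false negative: fraud not detected
        elif label == 0 and flagged:
            cost += cost_fp  # false positive: legitimate transaction flagged
    return cost


def best_threshold(labels, scores, candidates, cost_fn, cost_fp):
    """Exhaustive search: return the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: fraud_cost(labels, scores, t, cost_fn, cost_fp))


# Hypothetical skewed data set: 2 frauds among 10 transactions.
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.70, 0.90]
candidates = [0.25, 0.5, 0.75]

# A missed fraud is assumed to cost 100 times more than a false alarm.
cost_sensitive_choice = best_threshold(labels, scores, candidates, cost_fn=100, cost_fp=1)
# With equal costs the search settles on a stricter threshold.
equal_cost_choice = best_threshold(labels, scores, candidates, cost_fn=1, cost_fp=1)
print(cost_sensitive_choice, equal_cost_choice)
```

On this toy data the asymmetric costs move the preferred threshold lower, so that no fraud is missed at the price of a few extra false alarms.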
|
183 |
Violence, security perception and mode choice on trips to and from a university campus / Violência, percepção de segurança e escolha modal em viagens a um campus universitário
Silva, Denise Capasso da. 04 August 2017
This dissertation tests the hypothesis that violence and security perception influence the use of sustainable travel modes. The research characterizes security perception among users of the University of São Paulo (Brazil) campus in São Carlos and identifies how the sense of personal security and occurrences of violence relate to travel mode choice. An online survey was conducted on users' security perception and on how they access the campus. The target relationships were explored with Decision Tree (DT) algorithms. An initial exploratory analysis revealed that occurrences of violence and reports of insecurity were strongly correlated on the street segments around the campus. The temporal analysis of violence showed incidents concentrated at night and on weekdays. The study also showed that security perception varies with gender, and that travel mode choice is less sensitive to security perception than to the occurrence of violence or to the type of affiliation with the university. Finally, DT algorithms explored the relation between spatially treated variables (route length to the campus, and the density of violence occurrences and insecurity reports along the route) and mode choice. The results showed that distance to the campus was relevant to mode choice only on routes without numerous reports of insecurity. The share of non-motorized modes was larger on routes with higher insecurity perception and on routes with a high incidence of violence. Since it is counterintuitive to assume that more walking trips are a consequence of violence, the opposite direction (more walking trips leading to more incidents) was considered the likely explanation. The study therefore reinforces the need for increased security in regions with a high share of non-motorized trips, to prevent these users from shifting to motorized modes.
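As a rough illustration of how a decision tree relates a variable such as route length to mode choice, the sketch below finds the single threshold that best separates two modes using Gini impurity, the criterion used by CART-style trees. The abstract does not say which DT algorithm or split criterion was used, and the route lengths and modes below are hypothetical.

```python
from collections import Counter

# Illustrative sketch only: data are hypothetical and Gini impurity is an
# assumed split criterion, not necessarily the one used in the dissertation.


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best split `value <= threshold`.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, parent_gini) if no split improves on the unsplit node.
    """
    best = (None, gini(labels))
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical route lengths (km) and reported travel modes.
route_km = [0.5, 0.8, 1.2, 1.5, 3.0, 4.5, 6.0, 8.0]
mode = ["walk", "walk", "walk", "walk", "car", "car", "car", "car"]

threshold, impurity = best_split(route_km, mode)
print(threshold, impurity)
```

A full tree repeats this search recursively on each resulting subset and across all predictor variables, which is how interactions such as "distance matters only on routes with few insecurity reports" can surface.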
|
185 |
Aplicação de classificadores para determinação de conformidade de biodiesel / Attesting compliance of biodiesel quality using classification methods
Lopes, Marcus Vinicius de Sousa. 26 July 2017
The growing demand for energy and the limitations of oil reserves have led to the search for renewable and sustainable energy sources to replace, even partially, fossil fuels. In recent decades biodiesel has become the main alternative to petroleum diesel. Its quality is evaluated against parameters and specifications that vary by country or region, for example EN 14214 in Europe, ASTM D6751 in the US and RANP 45/2014 in Brazil. Some of these parameters, such as viscosity, density, oxidative stability and iodine value, are intrinsically related to the composition of the fatty acid methyl esters (FAMEs) in the biodiesel, which makes it possible to relate the behavior of these properties to carbon chain length and the presence of unsaturation in the molecules. In the present work, four direct classification methods (support vector machine, K-nearest neighbors, decision tree classifier and artificial neural networks) were optimized and compared for classifying biodiesel samples by their compliance with viscosity, density, oxidative stability and iodine value limits, taking the FAME composition as input. The classifications were carried out under the specifications of standards EN 14214, ASTM D6751 and RANP 45/2014. A comparison between these direct classification methods and empirical equations (indirect classification) favored the direct classification methods for the problem addressed, especially when the biodiesel samples have property values very close to the limits of the specifications considered.
The growing demand for renewable energy sources as an alternative to fossil fuels makes biodiesel one of the main options for replacing petroleum derivatives. Quality control of biodiesel during production and distribution is extremely important to guarantee a fuel of reliable quality and satisfactory performance for the end user. Biodiesel is characterized by measuring certain properties according to international standards. Using machine learning methods to characterize biodiesel saves time and money. This work shows that, for determining the compliance of a biodiesel, the SVM, KNN and decision tree classifiers give better results than the prediction methods of previous work. For viscosity, density, iodine value and oxidative stability (RANP 45/2014, EN 14214:2014 and ASTM D6751-15), the KNN and decision tree classifiers proved to be the best options. These results show that the classifiers can be applied in practice to save time and financial and human resources.
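A minimal sketch of direct classification, assuming a plain K-nearest-neighbors vote on the FAME composition: the classifier predicts the compliance label directly instead of first estimating the property with an empirical equation. The three-component composition vectors, the labels and the choice of k are hypothetical and only illustrate the idea; they are not data or settings from the thesis.

```python
import math
from collections import Counter

# Illustrative sketch only: compositions and labels are hypothetical.
# Each sample is (saturated, monounsaturated, polyunsaturated) mass fractions.


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest samples."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set labeled for compliance with an iodine value limit.
train_x = [
    (0.45, 0.40, 0.15),
    (0.40, 0.45, 0.15),
    (0.50, 0.40, 0.10),
    (0.15, 0.25, 0.60),
    (0.10, 0.30, 0.60),
    (0.12, 0.23, 0.65),
]
train_y = [
    "compliant",
    "compliant",
    "compliant",
    "non-compliant",
    "non-compliant",
    "non-compliant",
]

low_unsaturation = knn_predict(train_x, train_y, (0.42, 0.43, 0.15))
high_unsaturation = knn_predict(train_x, train_y, (0.11, 0.27, 0.62))
print(low_unsaturation, high_unsaturation)
```

In practice the value of k and the distance metric would be tuned on held-out samples, which corresponds to the optimization step mentioned in the abstract.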
|
186 |
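Emprego de diferentes algoritmos de árvores de decisão na classificação da atividade celular in vitro para tratamentos de superfícies de titânio / Use of different decision tree algorithms to classify in vitro cellular activity for titanium surface treatments
Fernandes, Fabiano Rodrigues. January 2017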
Interest in the analysis and characterization of biomedical materials is growing because of the need to select an adequate material for each use. Depending on the conditions to which the material will be subjected, characterization may cover mechanical, electrical, electronic, magnetic, optical, chemical and thermal properties, as well as bioactivity and immunogenicity. The literature reports the use of decision trees, with the SimpleCart (CART) and J48 algorithms, to classify a dataset generated from the results of scientific articles. That study aimed to identify surface characteristics that optimize cellular activity; to that end, the effect of titanium surface treatment on in vitro cellular activity (MC3T3-E1 cells) was evaluated from published articles, and SimpleCart was found to give better results than J48. In this context, the present work applies the CHAID (Chi-squared Automatic Interaction Detection) and Exhaustive CHAID algorithms to the same study and compares them with the results obtained with SimpleCart. Validation showed that Exhaustive CHAID outperformed CHAID, with an estimated accuracy of 75.9% against 58.6% and a standard error of 7.9% against 9.1%, respectively, whereas SimpleCart (CART), the algorithm already tested in the literature, reached an estimated accuracy of 34.5% with a standard error of 8.8%. Execution times measured on 22,000 records showed that Exhaustive CHAID was the fastest, with a gain of 0.02 seconds over CHAID and 14.45 seconds over SimpleCart (CART).
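The relationship between an accuracy estimate and its standard error can be sketched with the usual formula for a proportion, SE = sqrt(p(1 - p) / n). The abstract does not state the validation sample size; n = 29 is an assumption used here only because it reproduces the reported accuracy and standard error pairs, and the counts of correct cases are likewise inferred, not taken from the source.

```python
import math

# Illustrative sketch only: the sample size n = 29 and the counts of correctly
# classified cases are assumptions, not figures stated in the abstract.


def accuracy_standard_error(accuracy, n):
    """Standard error of an accuracy estimated as a proportion over n cases."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


n_cases = 29  # assumed validation sample size
assumed_correct = {"Exhaustive CHAID": 22, "CHAID": 17, "SimpleCart (CART)": 10}

for name, correct in assumed_correct.items():
    accuracy = correct / n_cases
    se = accuracy_standard_error(accuracy, n_cases)
    print(f"{name}: accuracy {100 * accuracy:.1f}%, standard error {100 * se:.1f}%")
```

Under this assumption the accuracy values round to 75.9%, 58.6% and 34.5%, which supports reading the CHAID accuracy as 58.6%.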
|
187 |
Applications of Knowledge Discovery in Quality Registries - Predicting Recurrence of Breast Cancer and Analyzing Non-compliance with a Clinical Guideline
Razavi, Amir Reza. January 2007
In medicine, data are produced from different sources and continuously stored in data repositories. Examples of these growing databases are quality registries. In Sweden, there are many cancer registries where data on cancer patients are gathered and recorded, and they are used mainly for reporting survival analyses to high-level health authorities. In this thesis, a breast cancer quality registry operating in south-east Sweden is used as the data source for newer analytical techniques, i.e. data mining as part of the knowledge discovery in databases (KDD) methodology. Analyses are done to sift through these data in order to find interesting information and hidden knowledge. KDD consists of multiple steps, starting with gathering data from different sources and preparing them in data pre-processing stages prior to data mining. The data were cleaned of outliers and noise, and missing values were handled. Then a proper subset of the data was chosen by canonical correlation analysis (CCA) in a dimensionality reduction step. This technique was chosen because there were multiple outcomes and the variables had complex relationships to one another. After the data were prepared, they were analyzed with a data mining method: decision tree induction, chosen as a simple and efficient method. To show the benefits of proper data pre-processing, results from data mining with pre-processing were compared with results from data mining without it. The comparison showed that data pre-processing results in a more compact model with better performance in predicting the recurrence of cancer. An important part of knowledge discovery in medicine is to increase the involvement of medical experts in the process. This starts with enquiry about current problems in their field, which leads to finding areas where computer support can be helpful.
The experts can suggest potentially important variables and should then approve and validate new patterns or knowledge as predictive or descriptive models. If it can be shown that the performance of a model is comparable to domain experts, it is more probable that the model will be used to support physicians in their daily decision-making. In this thesis, we validated the model by comparing predictions done by data mining and those made by domain experts without finding any significant difference between them. Breast cancer patients who are treated with mastectomy are recommended to receive radiotherapy. This treatment is called postmastectomy radiotherapy (PMRT) and there is a guideline for prescribing it. A history of this treatment is stored in breast cancer registries. We analyzed these datasets using rules from a clinical guideline and identified cases that had not been treated according to the PMRT guideline. Data mining revealed some patterns of non-compliance with the PMRT guideline. Further analysis with data mining revealed some reasons for guideline non-compliance. These patterns were then compared with reasons acquired from manual inspection of patient records. The comparisons showed that patterns resulting from data mining were limited to the stored variables in the registry. A prerequisite for better results is availability of comprehensive datasets. Medicine can take advantage of KDD methodology in different ways. The main advantage is being able to reuse information and explore hidden knowledge that can be obtained using advanced analysis techniques. The results depend on good collaboration between medical informaticians and domain experts and the availability of high quality data.
|
188 |
Reservoir screening criteria for deep slurry injection
Nadeem, Muhammad, January 2005
Deep slurry injection is a process of solid waste disposal that involves grinding the solid waste to a relatively fine-grained consistency, mixing the ground waste with water and/or other liquids to form a slurry, and disposing of the slurry by pumping it down a well at a high enough pressure that fractures are created within the target formation.
This thesis describes the site assessment criteria involved in selecting a suitable target reservoir for deep slurry injection. The main goals of this study are as follows: (1) identify the geological parameters important for a prospective injection site; (2) recognize the role of each parameter; (3) determine the relationships among the different parameters; (4) design and develop a model that assembles all the parameters into a semi-quantitative evaluation process, allowing sites to be ranked and unsuitable sites to be eliminated; and (5) evaluate the model against several real slurry injection cases and several prospective cases where slurry injection may take place in the future. The quantitative and qualitative parameters that are recognized as important for making a decision regarding a target reservoir for deep slurry injection operations are permeability, porosity, depth, areal extent, thickness, mechanical strength, and compressibility of a reservoir; thickness and flow properties of the cap rock; geographical distance between an injection well and a waste source or collection centre; and the regional and detailed structural and tectonic setup of an area. Additional factors affecting the security level of a site include the details of the lithostratigraphic column overlying the target reservoir and the presence of overlying fracture-blunting horizons. Each parameter is discussed in detail to determine its role in site assessment and also its relationship with other parameters. A geological assessment model is developed and is divided into two components: a decision tree and a numerical calculation system. The decision tree deals with the most critical parameters, those that render a site unsuitable or suitable, but of unspecified quality. The numerical calculation gives a score to a prospective injection site based on the rank numbers and weighting factors for the various parameters.
The score for a particular site shows its favourability for the injection operation, and allows a direct comparison with other available sites. Three categories have been defined for this purpose, i.e. average, below average, and above average. A score of 85 to 99 out of 125 places a site in the "average" category; a site is unsuitable for injection if it falls in the "below average" category, i.e. if the total score is less than 85; and the best sites will generally fall in the "above average" category, with a score of 100 or higher. One may assume that sites in the "average" category will require more detailed tests and assessments. The geological assessment model is evaluated using original geological data from North America and Indonesia for sites that have already undergone deep slurry injection operations and also for some possible prospective sites. The results obtained from the model are satisfactory, as they are in agreement with the empirical observations. Areas for future work consist of the writing of a computer program for the geological model, and further evaluation of the model using original data from more areas representing more diverse geology from around the world.
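The numerical part of the assessment can be illustrated with a minimal Python sketch. The category thresholds (below 85, 85 to 99, 100 or higher, out of 125) come from the abstract; everything else is an assumption made for illustration: the abstract does not say how rank numbers and weighting factors are combined, so a plain weighted sum is used here, and the parameter names, ranks (1 to 5) and weights (summing to 25) are hypothetical.

```python
from typing import Dict

MAX_SCORE = 125  # maximum total score stated in the abstract


def site_score(ranks: Dict[str, int], weights: Dict[str, int]) -> int:
    """Combine rank numbers and weighting factors.

    Assumption: a weighted sum. The abstract only says the score is
    "based on" ranks and weights, not how they are combined.
    """
    return sum(ranks[name] * weights[name] for name in ranks)


def categorize(score: float) -> str:
    """Map a total score to the three categories given in the abstract."""
    if score >= 100:
        return "above average"
    if score >= 85:
        return "average"
    return "below average"  # unsuitable for injection


# Hypothetical weights (sum = 25, so rank 5 everywhere gives 125).
weights = {"permeability": 5, "porosity": 4, "depth": 3,
           "thickness": 4, "cap_rock": 5, "areal_extent": 4}
# Hypothetical ranks on a 1-5 scale for one candidate site.
ranks = {"permeability": 4, "porosity": 3, "depth": 5,
         "thickness": 4, "cap_rock": 4, "areal_extent": 3}

score = site_score(ranks, weights)
print(score, categorize(score))
```

With these made-up inputs the site lands in the "average" band, the case the abstract says would call for more detailed tests.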
|
189 |
A Programming Framework To Implement Rule-based Target Detection In Images
Sahin, Yavuz, 01 December 2008
An expert system is useful when conventional programming techniques fall short of capturing human expert knowledge and making decisions with it. In this study, we describe a framework that captures expert knowledge in the form of a decision tree and makes decisions based on that knowledge. The proposed framework is generic and can be used to create domain-specific expert systems for different problems. Features are created or processed by the nodes of the decision tree, and a final conclusion is reached for each feature. The framework supplies three types of nodes for constructing a decision tree. The first type is the decision node, which guides the search path with its answers. The second type is the operator node, which creates new features from its inputs. The last type is the end node, which corresponds to a conclusion about a feature. Once the nodes of the tree have been developed, the user can interactively create the decision tree and run the supplied inference engine to collect the result for a specific problem. The proposed framework was tested in two case studies: "Airport Runway Detection in High Resolution Satellite Images" and "Urban Area Detection in High Resolution Satellite Images". In these studies, linear features are used for structural decisions, and Scale Invariant Feature Transform (SIFT) features are used to test for the existence of man-made structures.
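The three node types and the inference step described above might be sketched as follows in Python. This is not the thesis's implementation: the class names, the `infer` function, the feature representation (a dictionary of numbers) and the aspect-ratio rule with its threshold of 20 are all hypothetical, chosen only to show how decision, operator and end nodes could fit together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Feature = Dict[str, float]  # hypothetical feature representation


@dataclass
class EndNode:
    """Holds a conclusion about a feature."""
    conclusion: str


@dataclass
class DecisionNode:
    """Guides the search path with a yes/no answer."""
    question: Callable[[Feature], bool]
    if_yes: "Node"
    if_no: "Node"


@dataclass
class OperatorNode:
    """Creates a new feature from its input, then passes it on."""
    transform: Callable[[Feature], Feature]
    next_node: "Node"


Node = Union[EndNode, DecisionNode, OperatorNode]


def infer(node: Node, feature: Feature) -> str:
    """Walk the tree from the given node until an end node is reached."""
    while not isinstance(node, EndNode):
        if isinstance(node, OperatorNode):
            feature = node.transform(feature)
            node = node.next_node
        else:
            node = node.if_yes if node.question(feature) else node.if_no
    return node.conclusion


# Hypothetical rule: long, thin line features are runway candidates.
tree = OperatorNode(
    transform=lambda f: {**f, "aspect": f["length"] / f["width"]},
    next_node=DecisionNode(
        question=lambda f: f["aspect"] > 20,
        if_yes=EndNode("runway candidate"),
        if_no=EndNode("rejected"),
    ),
)

print(infer(tree, {"length": 3000.0, "width": 45.0}))
print(infer(tree, {"length": 100.0, "width": 50.0}))
```

Keeping the operator node separate from the decision node lets derived features (here the aspect ratio) be computed once and reused by any decision further down the path.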
|
190 |
Classification Analysis Techniques for Skewed Class
Chyi, Yu-Meei, 12 February 2003
Abstract
Existing classification analysis techniques (e.g., decision tree induction, backpropagation neural network, k-nearest neighbor classification, etc.) generally exhibit satisfactory classification effectiveness when dealing with data with a non-skewed class distribution. However, real-world applications (e.g., churn prediction and fraud detection) often involve highly skewed data in decision outcomes (e.g., 2% churners and 98% non-churners). Such a highly skewed class distribution problem, if not properly addressed, would imperil the resulting learning effectiveness and might result in a "null" prediction system that simply predicts all instances as belonging to the majority decision class of the training instances (e.g., predicting all customers as non-churners). In this study, we extended the multi-classifier class-combiner approach and proposed a clustering-based multi-classifier class-combiner technique to address the highly skewed class distribution problem in classification analysis. In addition, we proposed four distance-based methods for selecting a subset of instances having the majority decision class in order to lower the degree of skewness in a data set. Using two real-world datasets (mortality prediction for burn patients and customer loyalty prediction), empirical results suggested that the proposed clustering-based multi-classifier class-combiner technique generally outperformed the traditional multi-classifier class-combiner approach and the four distance-based methods.
Keywords: Data Mining, Classification Analysis, Skewed Class Distribution Problem, Decision Tree Induction, Multi-classifier Class-combiner Approach, Clustering-based Multi-classifier Class-combiner Approach
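The "null" prediction problem from the abstract can be made concrete with a short Python sketch using its 2% / 98% churn example. The helper names and the data set size of 1000 are assumptions. The second function shows one simple way to build less skewed training subsets for several classifiers, by splitting the majority class and pairing each part with all minority instances; it is a generic illustration and does not reproduce the clustering-based or distance-based selection proposed in the thesis.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall of the positive class)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    positives = sum(t == positive for t in y_true)
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    recall = true_pos / positives if positives else 0.0
    return correct / len(y_true), recall


def balanced_subsets(majority, minority, n_parts):
    """Split the majority class into n_parts slices and pair each slice
    with every minority instance (assumed partitioning scheme)."""
    return [majority[i::n_parts] + minority for i in range(n_parts)]


# 2% churners (label 1) and 98% non-churners (label 0), as in the abstract.
labels = [1] * 20 + [0] * 980

# A "null" system predicts the majority class for every instance.
null_predictions = [0] * len(labels)
acc, rec = accuracy_and_recall(labels, null_predictions)
print(f"accuracy={acc:.2f}, churner recall={rec:.2f}")

# 49 subsets, each with 20 non-churners and all 20 churners.
subsets = balanced_subsets([0] * 980, [1] * 20, n_parts=49)
print(len(subsets), len(subsets[0]))
```

The null system looks accurate while identifying no churners at all, which is why accuracy alone is a misleading measure on skewed data.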
|