Global ETD Search

221	Predicting Software Defectiveness by Mining Software Repositories Kasianenko, Stanislav January 2018 (has links) One of the important aims of the continuous software development process is to localize and remove all existing program bugs as quickly as possible. This goal is closely related to software engineering and defectiveness estimation. Many large companies started to store source code in software repositories as the latter grew in popularity. These repositories usually include static source code as well as detailed data on defects in software units. This allows all the data to be analyzed without interrupting the programming process. The main problem with large, complex software is that it is impossible to control everything manually, while the price of an error can be very high. As a result, developers may miss defects at the testing stage, and maintenance costs increase. The general research goal is to find a way of predicting future software defectiveness with high precision. Reducing maintenance and development costs will help reduce time-to-market and increase software quality. To address the problem of estimating residual defects, an approach was developed to predict the residual defectiveness of software by means of machine learning. A regression decision tree was chosen as the primary machine learning algorithm because it is a simple and reliable solution. Data for this tree is extracted from a static source code repository and divided into two parts: software metrics and defect data. Software metrics are derived from static code, and defect data is extracted from reported issues in the repository. In addition to already reported bugs, the defect data is augmented with unreported bugs found in the “discussions” section of the repository and parsed by a natural language processor. Metrics were filtered with a correlation algorithm to remove those that were not related to the defect data. 
The remaining metrics were weighted so that the most correlated combination could be used as a training set for the decision tree. As a result, the decision tree model makes it possible to forecast defectiveness with an 89% chance for the particular product. This experiment was conducted on a Java project in a GitHub repository and predicted the number of possible bugs in a single file (Java class). The experiment resulted in a method for predicting possible defectiveness from the static code of a single large (more than 1000 files) software version. repository mining software metric correlation defect bug natural language processing Pearson coefficient Breiman’s decision tree machine learning Computer Sciences Datavetenskap (datalogi)
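The correlation-based filtering step described in entry 221 can be sketched as follows. This is a minimal illustration in plain Python, not the thesis implementation: the metric names, the sample values and the 0.5 threshold are all hypothetical, and the regression decision tree that would be trained on the surviving metrics is omitted.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def filter_metrics(metrics, defects, threshold=0.5):
    """Keep metrics whose |r| with the defect counts reaches the threshold.

    Returns a dict mapping metric name to its correlation, ordered from the
    most to the least correlated, so the values can double as weights.
    """
    scored = {name: pearson(values, defects) for name, values in metrics.items()}
    kept = {name: r for name, r in scored.items() if abs(r) >= threshold}
    return dict(sorted(kept.items(), key=lambda item: abs(item[1]), reverse=True))


# Hypothetical per-file metrics for five Java classes (illustrative values only).
metrics = {
    "lines_of_code": [120, 300, 80, 560, 210],
    "cyclomatic_complexity": [4, 11, 3, 25, 8],
    "comment_ratio": [0.3, 0.1, 0.3, 0.2, 0.1],
}
defects = [1, 3, 0, 7, 2]  # hypothetical defect counts per file

selected = filter_metrics(metrics, defects)
print(list(selected))
```

In this toy data the weakly correlated `comment_ratio` column is dropped, while the two size and complexity metrics are kept as candidate inputs for the tree.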
222	Emprego de diferentes algoritmos de árvores de decisão na classificação da atividade celular in vitro para tratamentos de superfícies de titânio Fernandes, Fabiano Rodrigues January 2017 (has links) O interesse pela área de análise e caracterização de materiais biomédicos cresce, devido a necessidade de selecionar de forma adequada, o material a ser utilizado. Dependendo das condições em que o material será submetido, a caracterização poderá abranger a avaliação de propriedades mecânicas, elétricas, bioatividade, imunogenicidade, eletrônicas, magnéticas, ópticas, químicas e térmicas. A literatura relata o emprego da técnica de árvores de decisão, utilizando os algoritmos SimpleCart(CART) e J48, para classificação de base de dados (dataset), gerada a partir de resultados de artigos científicos. Esse estudo foi realizado afim de identificar características superficiais que otimizassem a atividade celular. Para isso, avaliou-se, a partir de artigos publicados, o efeito de tratamento de superfície do titânio na atividade celular in vitro (células MC3TE-E1). Ficou constatado que, o emprego do algoritmo SimpleCart proporcionou uma melhor resposta em relação ao algoritmo J48. Nesse contexto, o presente trabalho tem como objetivo aplicar, para esse mesmo estudo, os algoritmos CHAID (Chi-square iteration automatic detection) e CHAID Exaustivo, comparando com os resultados obtidos com o emprego do algoritmo SimpleCart. A validação dos resultados, mostraram que o algoritmo CHAID Exaustivo obteve o melhor resultado em comparação ao algoritmo CHAID, obtendo uma estimativa de acerto de 75,9% contra 58,6% respectivamente, e um erro padrão de 7,9% contra 9,1% respectivamente, enquanto que, o algoritmo já testado na literatura SimpleCart(CART) teve como resultado 34,5% de estimativa de acerto com um erro padrão de 8,8%. 
Com relação aos tempos de execução apurados sobre 22 mil registros, evidenciaram que o algoritmo CHAID Exaustivo apresentou os melhores tempos, com ganho de 0,02 segundos sobre o algoritmo CHAID e 14,45 segundos sobre o algoritmo SimpleCart(CART). / Interest in the analysis and characterization of biomedical materials is growing, owing to the need to select the appropriate material for each use. Depending on the conditions to which the material will be subjected, characterization may cover the evaluation of mechanical, electrical, electronic, magnetic, optical, chemical and thermal properties, as well as bioactivity and immunogenicity. The literature reports the use of decision trees, with the SimpleCart(CART) and J48 algorithms, to classify a dataset generated from the results of scientific articles. That study was carried out to identify surface characteristics that optimize cellular activity. To this end, the effect of titanium surface treatment on in vitro cellular activity (MC3TE-E1 cells) was evaluated on the basis of published articles. It was found that the SimpleCart algorithm gave better results than J48. In this context, the present study aims to apply the CHAID (Chi-squared Automatic Interaction Detection) and Exhaustive CHAID algorithms to the same data and to compare the results with those obtained with the SimpleCart algorithm. Validation showed that Exhaustive CHAID outperformed CHAID, with an estimated accuracy of 75.9% against 58.6% and a standard error of 7.9% against 9.1%, respectively. The SimpleCart(CART) algorithm, already tested in the literature, reached an estimated accuracy of 34.5% with a standard error of 8.8%. 
Regarding the execution times measured over 22,000 records, Exhaustive CHAID was the fastest algorithm, with a gain of 0.02 seconds over CHAID and 14.45 seconds over SimpleCart(CART). Biomateriais Titânio Tratamento de superfícies Mineração de dados Algoritmos Algorithms MC3TE-E1 Titanium TiO2 Surface treatment Exhaustive CHAID CHAID CART SimpleCart Decision tree
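The standard errors reported in entry 222 are consistent with the usual standard error of a proportion, sqrt(p(1 - p) / n). The Python sketch below illustrates that formula. The validation sample size n = 29 is an assumption, chosen only because it reproduces the reported figures; the abstract does not state it, and the thesis may have computed the errors differently.

```python
from math import sqrt


def standard_error(p, n):
    """Standard error of a proportion p estimated from n independent cases."""
    return sqrt(p * (1 - p) / n)


# ASSUMPTION: n = 29 validation cases is not stated in the abstract; it is
# used here only because it is consistent with the reported percentages.
n = 29
reported_accuracy = {
    "Exhaustive CHAID": 0.759,
    "CHAID": 0.586,
    "SimpleCart(CART)": 0.345,
}

for name, p in reported_accuracy.items():
    print(f"{name}: accuracy {p:.1%}, standard error {standard_error(p, n):.1%}")
```

With errors of this size, the gap between Exhaustive CHAID and CHAID spans roughly two standard errors, which is worth keeping in mind when reading the comparison.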
223	Metodologia baseada em medidas dispersas de tensão e árvores de decisão para localização de faltas em sistemas de distribuição modernos / Methodology based on dispersed voltage measures and decision trees for fault location in modern distribution systems Marcel Ayres de Araújo 06 October 2017 (has links) Nos sistemas de distribuição, a grande ramificação, radialidade, heterogeneidade, dinâmica das cargas e demais particularidades, impõem dificuldades à localização de faltas, representando um desafio permanente na busca por melhores indicadores de continuidade e confiabilidade no fornecimento de energia elétrica. A regulação incisiva dos órgãos do setor, a penetração de geração distribuída e a tendência de modernização trazida pelas redes inteligentes, demandam detalhados estudos para readequação dos sistemas elétricos a conjuntura atual. Neste contexto, esta tese propõe o desenvolvimento de uma metodologia para localização de faltas em sistemas de distribuição empregando a capacidade dos medidores inteligentes de monitoramento e de aquisição de tensão em diferentes pontos da rede elétrica. A abordagem proposta baseia-se na estimação, por ferramentas de aprendizado de máquina, das impedâncias de sequência zero e positiva entre os pontos de alocação dos medidores inteligentes e de ocorrência de falta, e do estado de sensibilização destes medidores frente a correntes de falta. Assim, calculando-se as respectivas distâncias elétricas em função das impedâncias estimadas e definidas as direções das mesmas em relação a topologia da rede, busca-se identificar o ponto ou área com maior sobreposição de distâncias elétricas como o local ou a região de maior probabilidade da falta em relação aos medidores inteligentes. 
Para tanto, faz-se uso combinado de ferramentas convencionais e inteligentes pela aplicação dos conceitos de análise de sistemas elétricos, diagnóstico dos desvios de tensão, e classificação de padrões por meio da técnica de aprendizado de máquina denominada Árvore de Decisão. Os resultados obtidos pela aplicação desta metodologia demonstram que o uso de informações redundantes fornecidas pelos medidores inteligentes minimiza os erros de estimação. Além disso, para a maior parte dos casos testados o erro absoluto máximo de localização da falta se concentra entre 200 m e 1000 m, o que reduz a busca pelo local de ocorrência da falta pelas equipes de manutenção da rede elétrica. / In distribution systems, dense branching, radial topology, heterogeneity, load dynamics and other particularities make fault location difficult, posing a permanent challenge in the search for better continuity and reliability indicators for the electricity supply. Strict regulation by sector agencies, the growing penetration of distributed generation, and the trend towards modernization brought by smart grids demand detailed studies to adapt electrical systems to the current situation. In this context, this thesis proposes a methodology for fault location in distribution systems that exploits the ability of smart meters to monitor and acquire voltage at different points of the electrical network. The proposed approach is based on estimating, with machine learning tools, the zero- and positive-sequence impedances between the smart meter locations and the fault point, together with the sensitization state of these meters in the presence of fault currents. 
Then, by calculating the corresponding electrical distances from the estimated impedances and defining their directions in relation to the network topology, the point or area with the greatest overlap of electrical distances is identified as the location or region where the fault is most likely to have occurred in relation to the smart meters. For this purpose, conventional and intelligent tools are combined by applying concepts of electrical system analysis, diagnosis of voltage deviations, and pattern classification through the machine learning technique known as the decision tree. The results obtained by applying this methodology demonstrate that the use of redundant information provided by the smart meters minimizes estimation errors. In addition, for most of the cases tested, the maximum absolute fault location error lies between 200 m and 1000 m, which narrows the search for the fault location by the maintenance teams of the electrical network. Árvore de decisão Localização de faltas Medidores inteligentes Redes elétricas inteligentes Sistemas de distribuição Decision tree Distribution systems Fault location Smart grid Smart meters
224	Desenvolvimento de sistema de informação para monitoramento da esclerose múltipla Souza, Luciana Ferreira de 22 February 2017 (has links) Submitted by Viviane Lima da Cunha (viviane@biblioteca.ufpb.br) on 2017-07-06T11:26:20Z No. of bitstreams: 1 arquivototal.pdf: 2621152 bytes, checksum: e02a372dc50ca8879df71843cf79f718 (MD5) / Made available in DSpace on 2017-07-06T11:26:20Z (GMT). No. of bitstreams: 1 arquivototal.pdf: 2621152 bytes, checksum: e02a372dc50ca8879df71843cf79f718 (MD5) Previous issue date: 2017-02-22 / In recent decades, several countries have directed their efforts towards incorporating innovative technologies in the health field, with the purpose of assisting professionals and users in the promotion of care, strengthened by public policies. In this context, there is a shortage of technological resources aimed at comprehensive, multidisciplinary care for patients with Multiple Sclerosis, especially systems that support decision-making in the follow-up of these patients in Reference Centers belonging to the Unified Health System (SUS). This study therefore aims to develop an information system (software prototype) for monitoring clinical parameters indicative of impaired functionality in individuals with Multiple Sclerosis. It is an applied methodological study involving the production of technology, consisting of the software development stage followed by the application of the decision tree model. The development of the software prototype followed the steps of the generic software engineering process presented by Pressman: communication, planning, modeling, construction and delivery. Modeling and prototyping took place from January to September 2016, along with the construction of the prototype's workflow diagram and interfaces. The flowchart was built in the Unified Modeling Language (UML) with the aid of the JUDE tool. 
The system was developed in PHP (Hypertext Preprocessor), a widely used open-source scripting language that is especially suited to web development. A PHP framework (Laravel 5.2, open source) was used, with MySQL as the database technology and HTML5, CSS3 and jQuery for the development of the screens. For the decision tree model, the variables of the 50 patients registered in the software were analyzed in the Waikato Environment for Knowledge Analysis (WEKA) program, version 3.8, specifically with the J48 algorithm. The results showed that, although the software prototype still requires further studies leading to its validation, it performed satisfactorily in registering professionals, patients and research instruments. The generated decision tree model contributed to identifying the epidemiological and clinical variables associated with worsening disability and also allowed the differences in these associations to be analyzed in two distinct Multiple Sclerosis treatment groups. The study considered all phases and tests of the system; the possibility of generating an electronic record that speeds up the flow of information and contributes to planning integrated multiprofessional care; and the proposed application of the decision tree model to classify the epidemiological variables associated with worsening disability, using the Expanded Disability Status Scale (EDSS) score. It is expected that this study will highlight the need for further research using decision models that support health teams, especially those facing the complexity of assisting individuals with chronic diseases and progressive degeneration of organic functions. 
/ Nas últimas décadas, tem havido uma preocupação dos governos com a incorporação de tecnologias inovadoras, aplicadas no campo da saúde, com o propósito de auxiliar o desempenho de profissionais e usuários na promoção do cuidado, fortalecido por políticas públicas. Nesse contexto, evidencia-se uma escassez de recursos tecnológicos, voltados à assistência integral e multidisciplinar, direcionada a pacientes com Esclerose Múltipla, especialmente de sistemas que deem suporte a tomada de decisão, no acompanhamento dessa clientela, em Centros de Referência pertencentes ao Sistema Único de Saúde. Assim, o presente estudo tem como objetivo de desenvolver um sistema de informação (protótipo de software) para o monitoramento de parâmetros clínicos, indicativos de comprometimento da funcionalidade de indivíduos com Esclerose Múltipla. Trata-se de um estudo metodológico, do tipo aplicado, envolvendo produção de tecnologia, composto pela etapa do processo de desenvolvimento do software seguida da etapa de aplicação do modelo de árvore de decisão. O desenvolvimento do protótipo de software seguiu os passos do processo genérico de engenharia de software apresentado por Pressman. A modelagem e a prototipação ocorreram no período de janeiro a setembro de 2016, juntamente com a construção do fluxograma de funcionamento do protótipo e das interfaces. O fluxograma foi construído na linguagem unificada de modelagem com auxílio da ferramenta JUDE. O sistema foi desenvolvido em linguagem PHP (Hipertext Processor), que é uma linguagem de script open source (código aberto) de uso livre, muito utilizada, e especialmente adequada para o desenvolvimento web. Portanto, utilizou-se uma Framework PHP (Laravel 5.2 Open Source), o MySQL como tecnologia de banco de dados, e para desenvolvimento das telas usou-se o HTML5, CSS3 e JQUERY. 
Para a aplicação do modelo de árvore de decisão, recorreu-se as variáveis contidas no cadastro de 50 pacientes e o programa Waikato Environment Analyis, na Versão 3.8, especificamente o algoritmo J48. Os resultados apontaram que, o protótipo de software mostrou desempenho satisfatório para a funcionalidade da atividade de cadastro de profissionais, de pacientes e de instrumentos de pesquisa. Quanto ao modelo de árvore de decisão gerado, este contribuiu para a identificação das variáveis epidemiológicas e clínicas associadas à piora da incapacidade e ainda possibilitou a análise das diferenças destas associações em dois grupos distintos de tratamento da Esclerose Múltipla. Considerando todas as fases e testes do sistema, a possibilidade de gerar um registro eletrônico, que proporcione agilidade no processo da informação e que contribua para o planejamento das ações frente à assistência multiprofissional integrada, bem como, a proposta de aplicação do modelo de árvore de decisão capaz de classificar as variáveis epidemiológicas associadas a piora da incapacidade, utilizando-se o escore da Escala Expandida do Estado de Incapacidade (EDSS). Espera-se que o desenvolvimento desse estudo desperte a necessidade de outras pesquisas utilizando modelos de decisão que oportunizem às equipes de saúde, em especial aquelas que enfrentam a complexidade de assistir indivíduos com doenças crônicas e de degeneração progressiva das funções orgânicas. Tecnologia da informação Informática em saúde Esclerose Múltipla Tomada de decisão e árvore de decisão Information technology Health informatics Multiple sclerosis Decision-making and decision tree CIENCIAS DA SAUDE::SAUDE COLETIVA
225	Técnicas de Data Mining na aquisição de clientes para financiamento de Crédito Direto ao Consumidor - CDC / Data Mining Techniques to acquire new customers for financing of Consumer Credit Adriana Maria Marques da Silva 27 September 2012 (has links) O trabalho busca dissertar sobre as técnicas de data mining mais difundidas: regressão logística, árvore de decisão e rede neural, além de avaliar se tais técnicas oferecem ganhos financeiros para instituições privadas que contam com processos ativos de conquista de clientes. Uma empresa do setor financeiro será utilizada como objeto de estudo, especificamente nos seus processos de aquisição de novos clientes para adesão do Crédito Direto ao Consumidor (CDC). Serão mostrados os resultados da aplicação nas três técnicas mencionadas, para que seja possível verificar se o emprego de modelos estatísticos discriminam os clientes potenciais mais propensos dos menos propensos à adesão do CDC e, então, verificar se tal ação impulsiona na obtenção de ganhos financeiros. Esses ganhos poderão vir mediante redução dos custos de marketing abordando-se somente os clientes com maiores probabilidades de responderem positivamente à campanha. O trabalho apresentará o funcionamento de cada técnica teoricamente, e conforme os resultados indicam, data mining é uma grande oportunidade para ganhos financeiros em uma empresa. / The paper intends to discourse about most widespread data mining techniques: logistic regression, decision tree and neural network, and assess whether these techniques provide financial gains for private institutions that have active processes for business development. A company of the financial sector is used as object of study, specifically in the processes of acquiring new customers for adhesion to consumer credit (in Brazil CDC). 
This research shows the results of the three techniques mentioned above, in order to verify whether statistical models discriminate between prospects who are more likely and less likely to take up consumer credit, and then whether doing so leads to financial gains. These gains are expected to come from lower marketing costs, achieved by approaching only the customers most likely to respond positively to the campaign. The paper presents the theoretical operation of each technique and, as the results indicate, data mining is a great opportunity for a company to boost profits. Árvore de decisão Crédito direto ao consumidor Financiamento Mineração de dados Redes neurais Regressão logística CDC Data Mining Decision Tree Logistic Regression Neural Network
226	Mineração de dados climaticos para previsão local de geada e deficiencia hidrica / Data mining climatic for frost and deficit hidric forescast Bucene, Luciana Corpas, 1974- 12 August 2018 (has links) Orientadores: Luiz Henrique Antunes Rodrigues, Eduardo Delgado Assad / Tese (doutorado) - Universidade Estadual de Campinas, Faculdade de Engenharia Agricola / Made available in DSpace on 2018-08-12T21:35:45Z (GMT). No. of bitstreams: 1 Bucene_LucianaCorpas_D.pdf: 2595416 bytes, checksum: 86c930f5cf0a3ca7ba03de8acb811ea8 (MD5) Previous issue date: 2008 / Resumo: As perdas que ocorrem na agricultura são grandes, devido, principalmente, à ocorrência de sinistros climáticos que ocorrem nas plantações. Muitas vezes, os impactos social e econômico causados pelos danos são significativos, uma vez que envolvem fatores como a produção e o preço de alimentos. Como exemplos, têm-se a produção de café e a de cana-de-açúcar no Estado de São Paulo, que sofrem alternâncias motivadas por eventos climáticos adversos e, em especial, as geadas e as secas, que reduzem drasticamente as produções. Neste sentido, este estudo propõe identificar relações entre parâmetros climáticos, como temperatura máxima, temperatura mínima, precipitação, entre outros atributos, visando descobrir eventuais novos conhecimentos, a partir do comportamento conhecido dos atributos climáticos já ocorridos no passado, com o propósito de desenvolver a previsão local de geada e a previsão de deficiência hídrica. Para isso, foram aplicadas técnicas de descoberta de conhecimento em grandes bancos de dados climáticos. Utilizaram-se as ferramentas WEKA e o DISCOVER, que foram consideradas satisfatórias, uma vez que os objetivos propostos foram atingidos. As bases de dados disponíveis atenderam a necessidade para a realização do projeto, apresentando um volume de dados e atributos suficientes para que pudesse gerar resultados para a previsão local de geada e de deficiência hídrica. 
Referente aos resultados, com até 1 dia de antecedência à geada, o modelo gerado foi considerado confiável. A partir de 2 dias de antecedência à geada, os resultados encontrados apresentam uma diminuição no grau de acerto quanto mais distante estiver de acontecer o evento geada. Para o caso deficiência hídrica, os resultados encontrados foram diferenciados conforme a classe. Para a classe não, com 1dia até 15 dias de antecedência ao evento, o grau de acerto foi alto e aceitável. A classe forte, em seguida à classe não, é a que apresenta melhores resultados de acerto, decaindo para as outras classes. Até 3 dias de antecedência ao evento deficiência hídrica e, dependendo do mês, o grau de acerto é aceitável. De 4 dias em diante, os resultados mostram que o modelo gerado não é aceitável / Abstract: The losses that occur in agriculture are high, mainly due to the occurrence of crop damages due to climatic events. Many times, the social and economic impacts caused by the damages are significant, since they involve factors such as the production and the price of foods. For example, coffee and sugarcane production in São Paulo State suffer alternations motivated by adverse climatic events and, in special, frost and drought, that greatly reduce the production. The purpose of this study is to identify relationships between climatic parameters, such as maximum temperature, minimum temperature, precipitation, etc., in order to discover eventual new knowledge, from known behavior of the climatic attributes already occurred in the past, with the objective of developing local frost and deficit water forecast models. To achieve this, data mining techniques were applied to climatic data bases. WEKA and the DISCOVER tools had been used and considered satisfactory, since they reached the objectives. 
The available databases were suitable for carrying out the project, with enough data and attributes to generate results for the local frost and water deficit forecasts. Regarding the results, the generated model was considered reliable up to 1 day ahead of a frost. From 2 days ahead, accuracy decreases the further away the frost event is. For water deficit, the results differed by class. For the "no" class, accuracy was high and acceptable from 1 to 15 days ahead of the event. The "strong" class shows the next-best accuracy after the "no" class, and accuracy declines for the other classes. Up to 3 days ahead of the water deficit event, and depending on the month, accuracy is acceptable. From 4 days ahead onwards, the results show that the generated model is not acceptable / Doutorado / Doutor em Engenharia Agrícola Agricultura - Fatores climaticos Agricultura - Previsão Meteorologia agricola Inteligência artificial Aprendizado de máquina Árvores de decisão Artificial intelligence Intelligence sytems Decision tree Climatic alert
227	Ferramenta computacional para apoio ao gerenciamento e à classificação de sementes de soja submetidas ao teste de tetrazólio / Computing tool to support management and classification of soy seeds submitted to tetrazolium test Rocha, Davi Marcondes 07 December 2016 (has links) Submitted by Neusa Fagundes (neusa.fagundes@unioeste.br) on 2017-09-25T14:47:50Z No. of bitstreams: 1 Davi_Rocha2017.pdf: 3573661 bytes, checksum: 8912d0785316cee5fdd46712b6f23d78 (MD5) / Made available in DSpace on 2017-09-25T14:47:50Z (GMT). No. of bitstreams: 1 Davi_Rocha2017.pdf: 3573661 bytes, checksum: 8912d0785316cee5fdd46712b6f23d78 (MD5) Previous issue date: 2016-12-07 / Fundação Araucária de Apoio ao Desenvolvimento Científico e Tecnológico do Estado do Paraná (FA) / The production and use of high-quality seeds are important factors in soybean farming. The quality control system in the seed industry must therefore be reliable, accurate and fast. Seed technology research has been striving to develop or improve tests that enable seed quality evaluation. The tetrazolium test, besides evaluating the viability and vigor of seeds, provides information about the possible agents causing quality reduction. Although it does not use expensive instruments or reagents, the test requires a well-trained seed analyst, and its accuracy depends on the analyst's knowledge of all the techniques and procedures involved, as well as on the subjectivity of the observer. The objective of this research was therefore to develop a computational tool that minimizes the subjectivity implicit in the test, helping to increase the credibility of the information and ensure accurate results. From tetrazolium test images, this tool identifies seed damage, as well as its location and extent in the tissues, making the interpretation less subjective. 
Feature data extracted from digital images of the tetrazolium test were fed to supervised classification algorithms to segment the images, producing a classified image. The proposed system was tested by selecting samples to train the classifier model and then using this model to classify the tetrazolium test images and extract information about seed damage. In addition to making damage easier to identify in the tetrazolium test images, the system allowed more reliable information to be extracted about the damage present and made it possible to keep track of the analyzed samples. The classifier assigned the predetermined classes efficiently for data not present in the training set, with 96.6% of instances correctly classified and a Kappa index of 0.95, making the system a supplementary tool in decision making for the tetrazolium test. / A produção e a utilização de sementes de alta qualidade são fatores de importância para o cultivo da soja. Para isso, o sistema de controle de qualidade na indústria de sementes deve ser confiável, preciso e rápido. A pesquisa em tecnologia de sementes tem se esforçado em desenvolver ou aprimorar testes que possibilitem a avaliação da qualidade das sementes. O teste de tetrazólio, além de avaliar a viabilidade e o vigor de sementes, fornece informações sobre possíveis agentes causadores da redução de sua qualidade. Embora não se utilize de instrumentos e reagentes caros, o teste requer um analista de sementes bem treinado, sendo que a precisão do mesmo depende do conhecimento de todas as técnicas e procedimentos envolvidos, devendo-se considerar a subjetividade do observador. Sendo assim, o objetivo desta pesquisa foi desenvolver uma ferramenta computacional que minimizasse a subjetividade implícita na realização do teste, contribuindo para gerar maior credibilidade nas informações e garantindo precisão nos resultados. 
Esta ferramenta permite, a partir de imagens do teste de tetrazólio, realizar a identificação dos danos presentes nas sementes, bem como sua localização e sua extensão nos tecidos, tornando a interpretação menos subjetiva. A partir da extração de dados de características das imagens digitais do teste de tetrazólio, foram aplicados algoritmos de classificação supervisionada para realizar a segmentação destas imagens, produzindo uma imagem classificada. O sistema proposto foi testado utilizando a seleção de amostras para treino do modelo classificador e, a partir deste modelo, a classificação das imagens do teste de tetrazólio, para extração de informações sobre os danos verificados nas sementes. O sistema permitiu, além da identificação dos danos nas imagens do teste de tetrazólio de forma facilitada, a extração de informações mais seguras sobre os danos presentes e realizar o controle das amostras analisadas. O classificador realizou a atribuição das classes predeterminadas de forma eficiente para dados não presentes no conjunto de treinamento, com 96,6% de instâncias classificadas corretamente e Índice Kappa de 0,95%, tornando o sistema uma ferramenta suplementar na tomada de decisão para o teste de tetrazólio. Qualidade de sementes Vigor de sementes Viabilidade de sementes Reconhecimento de padrões Árvore de decisão Classificação supervisionada Seed quality Seed vigor Seed viability Pattern recognition Decision tree Supervised classification CIENCIAS AGRARIAS::ENGENHARIA AGRICOLA
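Entry 227 reports classifier quality as the share of correctly classified instances together with a Kappa index. As a hedged illustration of how the two relate, the Python sketch below computes both from a confusion matrix. The three-class matrix is hypothetical and is not data from the thesis, and Cohen's kappa is assumed to be the index meant.

```python
def accuracy_and_kappa(matrix):
    """Return (observed accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] counts instances of true class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance if predictions were independent of truth.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical confusion matrix for three pixel classes (illustrative only).
confusion = [
    [48, 1, 1],
    [2, 47, 1],
    [0, 1, 49],
]

acc, kappa = accuracy_and_kappa(confusion)
print(f"accuracy = {acc:.3f}, kappa = {kappa:.3f}")
```

Kappa is a chance-corrected agreement coefficient with a maximum of 1, not a percentage, so it is normally slightly lower than the raw accuracy for the same matrix.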
228	Análise de crédito com segmentação da carteira, modelos de análise discriminante, regressão logística e classification and regression trees (CART) Santos, Ernani Possato dos 14 August 2015 (has links) Made available in DSpace on 2016-03-15T19:32:56Z (GMT). No. of bitstreams: 1 Ernani Possato dos Santosprot.pdf: 2286270 bytes, checksum: 96bb14c147c5baa96f3ae6ca868056d6 (MD5) Previous issue date: 2015-08-14 / Credit is one of the most important tools for driving business and keeping the economy turning. Used well, it brings large-scale benefits to society; used without balance, it can bring equally large losses to banks, companies, governments and citizens. In this context, it is essential to evaluate credit models capable of predicting default with an adequate degree of accuracy, in order to avoid or at least reduce credit risk. This study aims to evaluate three credit risk models, two parametric (discriminant analysis and logistic regression) and one non-parametric (decision tree), assessing their accuracy before and after segmenting the sample by customer size. This research is an applied study on Industry BASE. / O crédito se configura em uma das mais importantes ferramentas para alavancar negócios e girar a roda da economia. Se bem utilizado, trará benefícios em larga escala à sociedade, porém, se utilizado sem equilíbrio, poderá trazer prejuízos, também em larga escala, a bancos, a empresas, aos governos e aos cidadãos. Em função deste contexto, é precípuo avaliar modelos de crédito capazes de prever, com grau adequado de acurácia, processos de default, a fim de se evitar ou, pelo menos, reduzir o risco de crédito. 
Este estudo tem como finalidade avaliar três modelos de análise do risco de crédito, sendo dois modelos paramétricos, análise discriminante e regressão logística, e um não-paramétrico, árvore de decisão, em que se avaliou a acurácia destes modelos, antes e após a segmentação da amostra desta pesquisa por meio do critério de porte dos clientes. Esta pesquisa se refere a um estudo aplicado sobre a Indústria BASE. crédito risco de crédito análise discriminante regressão logística árvore de decisão segmentação credit credit risk discriminating analysis logistic regression decision tree segmentation
229	Análise inteligente de dados em um banco de dados de procedimentos em cardiologia intervencionista / Intelligent data analysis in an interventional cardiology procedures database Cantídio de Moura Campos Neto 02 August 2016 (has links) This study spans two areas of knowledge: Medicine and Computer Science. It applies the Knowledge Discovery in Databases (KDD) process to the DESIRE Registry, a real medical database. The DESIRE Registry is the longest-running registry in interventional cardiology worldwide; it is single-center and has followed 5,614 patients, revascularized solely with pharmacological (drug-eluting) stent implants, for more than 13 years. The goal is to use this process to build a descriptive model that classifies patients by their risk of major adverse cardiac events (MACE), and to evaluate its performance objectively. The rules extracted from the model are then presented to users to assess their degree of novelty and their agreement with expert knowledge. Symbolic classification models were built with decision tree and classification rule techniques, using the C4.5, Ripper and CN2 algorithms in the data mining step, with the occurrence or non-occurrence of a MACE as the class attribute. Because the classification is binary, the models were evaluated objectively with metrics associated with the confusion matrix, such as accuracy, sensitivity and area under the ROC curve, among others. The mining algorithm automatically and exhaustively processes all attributes of each patient to identify those most strongly associated with the class attribute (MACE), which then make up the rules. The main rules were extracted from these models either indirectly, through the decision tree, or directly, through the classification rules, and showed the most influential and predictive variables according to the mining algorithm. The models gave a better understanding of the application domain by relating the influence of routine details to the situations associated with the medical procedure. The model made it possible to analyse the probability of events occurring or not occurring in a range of situations. The induced models followed a logic of interpreting the data and facts with the participation of the domain expert. Of the 32 rules generated, three were rejected, 20 were expected rules without novelty, and 9 were considered less expected but had a degree of agreement of 50% or higher, which makes them candidates for investigation to assess their possible importance. These models can be updated by reapplying the mining algorithm to the database with the most recent data. Interpretable symbolic models have great potential in Medicine when combined with professional experience, contributing to evidence-based medicine.
Artificial intelligence C4.5 Cardiology Coronary disease Data mining Database Decision tree KDD Stents
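The abstract for entry 229 evaluates binary classifiers with metrics associated with the confusion matrix. A minimal sketch of three such metrics is given here, assuming the usual definitions in terms of true/false positives and negatives. The function name and the example counts are hypothetical and do not come from the DESIRE Registry study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from binary confusion-matrix counts.

    tp/fn: patients with the event predicted as event / as no event
    tn/fp: patients without the event predicted as no event / as event
    Assumes both classes are present, so no denominator is zero.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of actual events that were detected
        "specificity": tn / (tn + fp),  # share of non-events correctly ruled out
    }


# Hypothetical counts, for illustration only
print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

Area under the ROC curve, also mentioned in the abstract, additionally needs the classifier's scores at several thresholds rather than a single confusion matrix, so it is left out of this sketch.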
230	Decision tree learning for intelligent mobile robot navigation Shah Hamzei, G. Hossein January 1998 (has links) The replication of human intelligence, learning and reasoning by means of computer algorithms is termed Artificial Intelligence (AI), and the interaction of such algorithms with the physical world can be achieved using robotics. The work described in this thesis investigates the application of concept learning (an approach which takes its inspiration from biological motivations, and from survival instincts in particular) to robot control and path planning. The methodology of concept learning has been applied using learning decision trees (DTs), which induce domain knowledge from a finite set of training vectors; these vectors systematically describe a physical entity and are used to train a robot to learn new concepts and to adapt its behaviour. To achieve behaviour learning, this work introduces the novel approach of hierarchical learning and knowledge decomposition to the frame of the reactive robot architecture. Following the analogy with survival instincts, the robot is first taught how to survive in very simple and homogeneous environments, namely a world without any disturbances or any kind of "hostility". Once this simple behaviour, named a primitive, has been established, the robot is trained to adapt new knowledge to cope with increasingly complex environments by adding further worlds to its existing knowledge. The repertoire of the robot's behaviours, in the form of symbolic knowledge, is retained in a hierarchy of clustered decision trees accommodating a number of primitives. To classify robot perceptions, control rules are synthesised using symbolic knowledge derived from searching the hierarchy of DTs. A second novel concept is introduced, namely that of multi-dimensional fuzzy associative memories (MDFAMs). These are clustered fuzzy decision trees (FDTs) which are trained locally and accommodate specific perceptual knowledge.
Fuzzy logic is incorporated to deal with inherent noise in sensory data and to merge conflicting behaviours of the DTs. The feasibility of the developed techniques is illustrated in robot applications, and their benefits and drawbacks are discussed. 629.8
