Global ETD Search

151	Mapeamento digital de solos e o mapa de solos como ferramenta para classificação de aptidão de uso das terras / Digital soil mapping and soil map as a tool for classification of land suitability Höfig, Pedro January 2014 In Brazil, soil mapping across the national territory is a standing demand of research institutions and planning agencies, because it is an important tool for planning rational land occupation. Digital Soil Mapping (DSM) is an alternative that makes soil surveys more feasible by using relief-related information to map soils. This study tests DSM methodologies with extrapolation to a physiographically similar area, and reclassifies the pedological map generated by DSM into an agricultural land suitability map, which is compared with the interpretive map derived from the conventional soil map. Given the scarcity of existing data for the Encosta do Sudeste region of Rio Grande do Sul, the work was carried out in Sentinela do Sul and Cerro Grande do Sul. The DSM used decision trees (DT) as predictive models, testing both a single general model for the whole area and the combined use of two prediction models. Since DSM normally maps soil classes and properties, and no use of the technique to generate land suitability maps is known, the working hypothesis is that such maps can be created by reclassifying soil maps generated by DSM. The combined DT models produced more correct predictions and reproduced the conventional soil map better. The extrapolation to the municipality of Cerro Grande do Sul proved efficient. When classifying agricultural land suitability, agreement between the conventional map and the predicted maps was higher than agreement between the soil maps, but the land suitability system needs adjustments to reflect local conditions. Agricultural suitability Digital mapping Land use Soil classification Sentinela do Sul (RS) Cerro Grande do Sul (RS) Decision trees Soil survey
152	Análise de dados de bases de honeypots: estatística descritiva e regras de IDS / Analysis of honeypot database data: descriptive statistics and IDS rules Ferreira, Pedro Henrique Matheus da Costa 04 March 2015 Fundação de Amparo a Pesquisa do Estado de São Paulo / A honeypot is a computer security system dedicated to being probed, attacked or compromised. The information it collects helps identify threats to computer network assets. When probed, attacked and compromised, the honeypot receives a sequence of commands whose main purpose is to exploit a vulnerability of the emulated systems. This work uses data collected by honeypots to create rules and signatures for intrusion detection systems (IDS). The rules are extracted from decision trees built from the data sets of real honeypots. The results of experiments performed with four databases, two public and two private, showed that rules for an intrusion detection system can be extracted using data mining techniques, particularly decision trees. The technique revealed similarities between the data sets, even though they were collected in different places and time periods. In addition to the rules obtained, the technique allows the analyst to identify existing problems quickly and visually, which facilitates the analysis process. honeypot dionaea data mining IDS decision trees CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
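As an illustration of the rule-extraction idea in entry 152, the sketch below turns every root-to-leaf path of a decision tree into one IF/THEN rule. The tree, its feature names (`connections_per_min`, `payload_bytes`), thresholds and labels are hypothetical and are not taken from the thesis; the thesis's actual tree learner and rule format are not described in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical hand-built tree. Internal nodes test "feature <= threshold";
# "left" is the branch where the test holds, "right" where it fails.
tree: Dict = {
    "feature": "connections_per_min", "threshold": 20,
    "left": {"label": "low-priority"},
    "right": {
        "feature": "payload_bytes", "threshold": 512,
        "left": {"label": "scan"},
        "right": {"label": "exploit-attempt"},
    },
}

def extract_rules(node: Dict, conditions: Tuple[str, ...] = ()) -> List[Tuple[List[str], str]]:
    """Return one (conditions, label) pair per leaf, in left-to-right order."""
    if "label" in node:  # leaf: the accumulated path is a complete rule
        return [(list(conditions), node["label"])]
    feature, threshold = node["feature"], node["threshold"]
    return (
        extract_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
        + extract_rules(node["right"], conditions + (f"{feature} > {threshold}",))
    )

rules = extract_rules(tree)
for conds, label in rules:
    print("IF " + " AND ".join(conds) + " THEN " + label)
```

Each printed rule is a conjunction of threshold tests, which is the general shape that would then have to be translated by hand into the syntax of a particular IDS.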
153	Desenvolvimento de um modelo de predição clínica para infecção-colonização por bactérias multidroga resistentes em um hospital geral / Development of a clinical prediction model for infection or colonization with multidrug-resistant bacteria in a general hospital Nascimento, Paulo Victor Fernandes Souza, 1964- 20 February 2013 Advisor: Paulo Roberto de Madureira / Thesis (doctorate) - Universidade Estadual de Campinas, Faculdade de Ciências Médicas / Abstract: Healthcare-associated infections are responsible for rising health care costs, increased morbidity and mortality, and longer hospital stays. A peculiar characteristic of such infections is the resistance of the microorganisms involved. The presence of infectious agents resistant to multiple classes of antimicrobials is increasing in such infections, so multidrug resistance is a real challenge in everyday clinical practice. Protocols for the treatment of severe infections such as pneumonia and sepsis indicate broad-spectrum antimicrobial associations as the initial therapy if the patient has a risk factor for resistance. Later, with the results of cultures, the initial therapeutic regimen is expected to be adjusted. However, this process, known as de-escalation, occurs infrequently. Thus, when choosing the initial antibiotics for serious infectious syndromes, physicians face a dilemma: either prescribe a broad-spectrum regimen that protects the patient but is rarely revised and contributes to increasing resistance in the hospital microbiota, or use a narrower spectrum of antimicrobials and put the patient's prognosis at risk. The aim of this study was to identify potential predictors for the harboring of multidrug-resistant bacteria and to build a clinical prediction model that could help physicians recognize patients with different risks of infection or colonization by these microorganisms. We conducted a case-cohort study between June 2009 and June 2011 in a 90-bed general hospital in São José dos Campos, São Paulo State, Brazil, including all patients who had at least one culture performed (753 patients). Cases were defined as patients with a clinical culture that isolated at least one multidrug-resistant microorganism (146 patients). Controls were all other patients who had at least one culture, none of which showed growth of a multidrug-resistant agent. Multidrug resistance was defined according to the consensus of the U.S. Centers for Disease Control and Prevention and the European Centre for Disease Prevention and Control. Fourteen demographic and clinical variables commonly identified as risk factors were evaluated as predictors. We constructed three clinical prediction models: logistic regression, classification tree, and random forest. In the logistic regression model, because of severe collinearity, we chose to eliminate variables by the backward method. For internal validation of this model we used bootstrap resampling, and the new model was calibrated with a shrinkage factor of 0.91. The classification tree and random forest models similarly identified the most important variables for prediction: hospital admission in the previous 180 days, length of hospital stay before the culture, Charlson comorbidity index, nasoenteric feeding tube, tracheostomy, and central venous catheter. A temporal external validation was performed with a new sample of 342 patients collected between July and December 2011. The accuracies of the logistic regression, classification tree and random forest models were evaluated by ROC (receiver operating characteristic) curves. The areas under the curve were 72.4%, 66.2% and 69.2%, respectively. The final logistic regression model, fitted to the overall study population (1092 patients), is described and shows an optimism-corrected area under the ROC curve of 77.1%. / Doctorate / Epidemiology / Doctor in Collective Health Cross infection Drug resistance, Microbial Forecasting Logistic models Decision trees
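Entry 153 compares its three models by area under the ROC curve (AUC). As a minimal sketch of what that number measures, the function below computes AUC as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The labels and scores are invented for illustration and have no relation to the thesis data; the function name `roc_auc` is likewise an assumption.

```python
from typing import Sequence

def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of cases (label 1) against controls (label 0)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # A case/control pair counts 1 if the case scores higher, 0.5 if tied.
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Invented example: 3 cases and 4 controls with predicted risks.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.4]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a model that ranks cases no better than chance, and 1.0 to a model that ranks every case above every control.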
154	Využití statistických metod při oceňování nemovitostí / Valuation of real estates using statistical methods Funiok, Ondřej January 2017 The thesis deals with the valuation of real estate in the Czech Republic using statistical methods. The work focuses on a complex task based on data from an advertising web portal. The aim of the thesis is to create a prototype of a statistical prediction model for the valuation of residential properties in Prague and to assess how it could be extended. The work is structured according to the CRISP-DM methodology. Regression trees and random forests are tested on the pre-processed data and used to predict real estate prices.
155	Continuous cast width prediction using a data mining approach De Beer, Petrus Gerhardus 02 November 2007 In modern times continuous casting is the preferred way to convert molten steel into solid forms for further processing. At Columbus Stainless the continuous casting machine casts slabs of constant thickness with varying width. One important aspect of the continuously cast strand that must be controlled is the strand width. The width of the strand exiting the casting machine has a direct influence on the product yield, which in turn influences the profitability of the company. In general, the strand width control achieved on the austenitic and ferritic steel types is excellent, with the exception of the 12% chrome non-stabilised ferritic steel. This steel type exhibited different strand width changes when a sequence of different heats was cast. The strand width changes corresponded to the different heats in the sequence. Each heat has a unique chemistry, and a relationship between the austenite and ferrite fraction at high temperature and the resulting strand width change was explained by Siyasiya[27]. The relationship between heat composition and width change led in the past to the development of a model that predicts the expected width change of a specific heat before it is cast, so that preventative action can be taken. This model has been implemented as an on-line prediction model in the production environment with very encouraging results. This study was initiated because it was uncertain whether the implemented model was the most accurate for this application. It is concerned with the development of further models based on different techniques in an attempt to implement a more accurate model. The data mining techniques used include statistical regression, decision trees and fuzzy logic. The results indicated that the existing model was the most accurate and could not be improved upon. 
/ Dissertation (MEng (Mechanical Engineering))--University of Pretoria, 2007. / Mechanical and Aeronautical Engineering / MEng / unrestricted Stainless steel Continuous casting Statistical regression Decision trees Fuzzy logic Rule based model Width change Strand width control UCTD
156	Modeling and Survival Analysis of Breast Cancer: A Statistical, Artificial Neural Network, and Decision Tree Approach Mudunuru, Venkateswara Rao 26 March 2016 Survival analysis is widely used today in the medical and biological sciences, social sciences, econometrics, and engineering. Its basic principle is a statistical approach that accounts for the amount of time covered by a study period, that is, the time between entry into observation and a subsequent event. Here the event of interest is death, and the analysis consists of following the subject until death. Events or outcomes are defined by a transition from one discrete state to another at an instantaneous moment in time. In recent years, research in survival analysis has increased greatly because of its wide use in the biosciences and pharmaceutical studies. After identifying the probability density function that best characterizes the tumors and survival times of women with breast cancer, one purpose of this research is to compare the efficiency of competing estimators of the survival function. Our study includes an evaluation of parametric, semi-parametric and nonparametric survival models. Artificial Neural Networks (ANNs), recently applied to clinical, business, forecasting, time series prediction, and other problems, are computational systems consisting of artificial neurons, called nodes, arranged in layers with interconnecting links. The main interest in neural networks comes from their ability to approximate complex nonlinear functions. Among the wide range of available neural networks, most research concentrates on feed-forward neural networks called multi-layer perceptrons (MLPs). One of the important components of an ANN is the activation function. 
This work discusses properties of activation functions in multilayer neural networks applied to breast cancer stage classification. A number of activation functions are in common use with ANNs. The main objective of this work is to compare and analyze the performance of MLPs trained with the back-propagation algorithm, using various activation functions for the neurons of the hidden and output layers, on the stage classification of breast cancer data. By establishing meaningful time intervals for a particular situation, survival analysis can be treated as a classification problem to which machine-learning methods apply. Survival analysis methods deal with waiting time, i.e. the time until an event occurs. A commonly used method for classifying this sort of data is logistic regression, but the underlying assumptions of the model do not always hold. In model building, the choice of an appropriate model depends on the complexity and characteristics of the data. Two strategies in frequent use today are artificial neural networks (ANNs) and decision trees (DTs), which need only minimal assumptions. DTs and ANNs are widely used methodological tools based on nonlinear models. They provide better prediction and classification results than traditional methodologies such as logistic regression. This study aimed to compare the predictions of ANN, DT and logistic models for breast cancer survival. Our goal is to design models using both artificial neural networks and logistic regression that can precisely predict the outcome (survival) of breast cancer patients. Finally, we compare the performance of these models using receiver operating characteristic (ROC) analysis. 
Statistical Modeling Survival Analysis Parametric Analysis Probability Distribution Decision Trees Artificial Neural Networks Classification. Biostatistics Computer Sciences Statistics and Probability
157	Faktory ovlivňující spokojenost doktorandů se zázemím pro studium / Factors influencing the satisfaction with facilities for PhD studies Paul, Miroslav January 2016 This diploma thesis deals with the satisfaction of PhD students with their study facilities, using data from the DOKTORANDI 2014 survey. The aim of the thesis is to identify factors that influence satisfaction with facilities for PhD studies and to find similarities among fields of study in terms of that satisfaction. The first part of the thesis describes higher education with a focus on PhD programs, the statistical methods subsequently used in the analytical part, and the DOKTORANDI 2014 survey. The analytical part uses logistic regression and decision trees to determine which factors affect PhD students' satisfaction with study facilities. It then uses cluster analysis to determine which PhD fields of study are similar in their satisfaction with study facilities.
158	Využití metod manažerského rozhodování při zakládání nového podniku na trhu / Use of Methods of Managerial Decision-Making in foundation of the new enterprise on the market Oberhel, Martin January 2014 This thesis focuses on methods and tools that support decision-making under uncertainty and risk. It uses decision-making methods for both discrete and continuous values of risk factors. For discrete values of risk factors and decision-making under risk, the thesis uses the expected value rule and the expected value and variance rule, and also calculates the value of perfect information. For decision-making under uncertainty, the thesis focuses on the maximin and maximax rules, Laplace's rule, Hurwicz's rule and Savage's rule. The following part of the thesis is devoted to decision-making with continuous values of risk factors. It uses the Monte Carlo simulation method and sensitivity analysis with the help of the Lumina Analytica software. The last part of the thesis applies decision trees to multistage decision-making, using the TreePlan software, which works as a plugin for MS Office Excel. All of these methods are applied in practice to a concrete case: the analysis and ex post evaluation of the business plans of a company operating in the Jindřichův Hradec market.
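The rules for decision-making under uncertainty named in entry 158 can be sketched in a few lines. The payoff table below is hypothetical (three invented alternatives, three scenarios with unknown probabilities) and is not taken from the thesis; the function names and the optimism coefficient `alpha` are likewise illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical payoffs: rows are alternatives, columns are scenarios.
payoffs: Dict[str, List[float]] = {
    "small shop": [50, 40, 30],
    "medium shop": [90, 50, 10],
    "large shop": [150, 40, -60],
}

def maximin(p: Dict[str, List[float]]) -> str:
    """Pessimistic rule: pick the best worst-case payoff."""
    return max(p, key=lambda a: min(p[a]))

def maximax(p: Dict[str, List[float]]) -> str:
    """Optimistic rule: pick the best best-case payoff."""
    return max(p, key=lambda a: max(p[a]))

def laplace(p: Dict[str, List[float]]) -> str:
    """Treat all scenarios as equally likely and pick the best mean payoff."""
    return max(p, key=lambda a: sum(p[a]) / len(p[a]))

def hurwicz(p: Dict[str, List[float]], alpha: float) -> str:
    """Weight best case by alpha (optimism) and worst case by 1 - alpha."""
    return max(p, key=lambda a: alpha * max(p[a]) + (1 - alpha) * min(p[a]))

def savage(p: Dict[str, List[float]]) -> str:
    """Minimax regret: regret = best payoff in a scenario minus own payoff."""
    n = len(next(iter(p.values())))
    best = [max(row[j] for row in p.values()) for j in range(n)]
    return min(p, key=lambda a: max(best[j] - p[a][j] for j in range(n)))

for name, choice in [
    ("maximin", maximin(payoffs)),
    ("maximax", maximax(payoffs)),
    ("Laplace", laplace(payoffs)),
    ("Hurwicz (alpha=0.5)", hurwicz(payoffs, 0.5)),
    ("Savage", savage(payoffs)),
]:
    print(f"{name}: {choice}")
```

On this invented table the rules disagree with one another, which is the usual reason for reporting several of them side by side rather than relying on a single criterion.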
159	Využití metod data miningu při analýze kreditních dat / Using data mining methods in the analysis of credit risk data Tvaroh, Tomáš January 2013 This thesis compares selected data mining methods for classification tasks with logistic regression. The first part briefly introduces data mining as a scientific discipline and places the classification task in the context of knowledge discovery in databases. The next part explains the principles of the selected methods: logistic regression, artificial neural networks, classification decision trees and the Support Vector Machine method. The mathematical background of each algorithm is presented together with a demonstration of how it classifies new examples. The analytical part tests the described methods on real-world data from the Lending Club company and compares them by classification accuracy. Finally, the thesis evaluates whether the dominant position of logistic regression is due to historical reasons or to its high classification accuracy compared with other methods.
160	Využití statistických metod v data miningu při predikci chování zákazníků internetového obchodu / The use of statistical methods in data mining in predicting consumer behaviour for Internet purchases Podzimková, Michaela January 2015 Data mining is a new discipline that has emerged with the growing amount of stored data and the growing need to obtain the information hidden in it. It focuses on mining potentially useful information from large data sets and lies at the intersection of statistics, machine learning, artificial intelligence, databases and other areas. The aim of this thesis is to present the data mining process with an emphasis on its connection with statistics, and to describe a selection of statistical methods that are widely used in this field and were also used in the applied data mining problem in this thesis. Real data on purchases in an online store show that different methods give different results and interesting information about purchasing behavior, and also show that not all methods are applicable to all types of tasks.
