251

Extraction de connaissances pour la modélisation tri-dimensionnelle de l'interactome structural / Knowledge-based approaches for modelling the 3D structural interactome

Ghoorah, Anisah W. 22 November 2012 (has links)
Understanding how the protein interactome works at a structural level could provide useful insights into the mechanisms of disease. Comparative homology modelling and ab initio protein docking are two computational methods for modelling the three-dimensional (3D) structures of protein-protein interactions (PPIs). Previous studies have shown that both methods give significantly better predictions when they incorporate experimental PPI information. However, PPI information is often not available in an easily accessible form and cannot be re-used by 3D PPI modelling algorithms, so a reliable framework to facilitate the reuse of PPI data is needed. This thesis presents a systematic knowledge-based approach for representing, describing and manipulating 3D interactions in order to study PPIs on a large scale and to facilitate knowledge-based modelling of protein-protein complexes. The main contributions of this thesis are: (1) it describes KBDOCK, an integrated database of non-redundant 3D hetero domain-domain interactions (DDIs); (2) it presents a novel method of describing and clustering DDIs according to the spatial orientations of the binding partners, introducing the notion of "domain family-level binding sites" (DFBSs); (3) it proposes a structural classification of DFBSs similar to the CATH classification of protein folds, together with a study of the secondary-structure propensities and interaction preferences of DFBSs; (4) it introduces a systematic case-based reasoning approach to model the 3D structures of protein complexes on a large scale from existing structural DDIs. All these contributions have been made publicly available through a web server (http://kbdock.loria.fr).
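The case-based fallback chain described in contribution (4) can be sketched in a few lines. This is a purely illustrative toy, not the KBDOCK implementation: the function name, the index layout, and the Pfam-style identifiers below are all invented for the example.

```python
# Hypothetical sketch of case-based template retrieval for docking by homology.
# The real KBDOCK pipeline is far more involved; all names and data here are
# illustrative only.

def retrieve_template(family_a, family_b, ddi_index, dfbs_index):
    """Return the best modelling case for a pair of protein domain families.

    Preference order (a typical case-based reasoning fallback chain):
    1. a known DDI between exactly these two families,
    2. a known binding site (DFBS) on either family, reused from other partners,
    3. no template, in which case ab initio docking is required.
    """
    if (family_a, family_b) in ddi_index:
        return ("full-homology", ddi_index[(family_a, family_b)])
    sites = dfbs_index.get(family_a, []) + dfbs_index.get(family_b, [])
    if sites:
        return ("semi-homology", sites[0])
    return ("ab-initio", None)

# Toy indexes (the Pfam-style identifiers are made up):
ddi_index = {("PF00069", "PF00017"): "template_1abc"}
dfbs_index = {"PF00069": ["site_on_PF00069_from_2xyz"]}

print(retrieve_template("PF00069", "PF00017", ddi_index, dfbs_index))
print(retrieve_template("PF00069", "PF99999", ddi_index, dfbs_index))
print(retrieve_template("PF11111", "PF99999", ddi_index, dfbs_index))
```

The three calls exercise the three branches: an exact family-pair case, a family-level binding-site fallback, and the ab initio default.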
252

Apport des images satellites à très haute résolution spatiale couplées à des données géographiques multi-sources pour l’analyse des espaces urbains / Contribution of very high spatial resolution satellite images combined with multi-sources geographic data to analyse urban spaces

Rougier, Simon 28 September 2016 (has links)
Climate change presents cities with significant environmental challenges. Urban planners need decision-making tools and better knowledge of their territory. A first objective is to better understand how the grey and green infrastructures are linked, in order to analyse and represent them. A second objective is to propose a methodology for mapping the urban structure at the scale of urban fabrics, taking both infrastructures into account. Because current databases do not map vegetation exhaustively, the first step is to extract tree and grass vegetation from Pléiades satellite images using object-based image analysis and an active-learning classification. Based on these classifications and on multi-source data, an approach grounded in knowledge discovery in databases is then proposed, focused on a set of indicators drawn mostly from urbanism and landscape ecology. The methodology is developed on Strasbourg and then applied to Rennes to validate it and check its reproducibility.
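The active-learning step above follows a query-then-label loop. A minimal uncertainty-sampling sketch is shown below on synthetic 2-D features with a nearest-centroid classifier standing in for the real model; the thesis works on object-based features from Pléiades imagery, so everything here (data, classifier, query budget) is an illustrative assumption.

```python
# Minimal active-learning loop (uncertainty sampling) with a nearest-centroid
# classifier. Illustrates only the query strategy, not the thesis pipeline.
import numpy as np

rng = np.random.default_rng(0)
# Toy 2-D "features" for two land-cover classes (e.g. vegetation vs built-up).
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

labelled = [0, 50]                      # one labelled sample per class to start
pool = [i for i in range(100) if i not in labelled]

for _ in range(10):                     # ten active-learning iterations
    centroids = np.array([X[[i for i in labelled if y[i] == c]].mean(axis=0)
                          for c in (0, 1)])
    d = np.linalg.norm(X[pool][:, None, :] - centroids[None, :, :], axis=2)
    # Most uncertain sample: smallest gap between distances to the centroids.
    margin = np.abs(d[:, 0] - d[:, 1])
    query = pool[int(margin.argmin())]  # the "analyst" labels this object
    labelled.append(query)
    pool.remove(query)

pred = np.argmin(np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2),
                 axis=1)
print("accuracy:", (pred == y).mean())
```

Each iteration retrains on the labelled set and queries the sample the current model is least sure about, which is why active learning can reach a usable classifier with few manual labels.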
253

Algorithmes pour la fouille de données et la bio-informatique / Algorithms for data mining and bio-informatics

Mondal, Kartick Chandra 12 July 2013 (has links)
Knowledge pattern extraction is one of the major topics in data mining and background knowledge integration. Among data mining techniques, association rule mining and bi-clustering are two major complementary tasks for these topics. They have gained much importance in many domains in recent years, yet no approach has been proposed to perform them in a single process. This poses two problems: the resources required (memory, execution time and data accesses) to perform independent extractions, and the unification of the different results. We propose an original approach for extracting different categories of knowledge patterns while using minimal resources. This approach is based on the frequent closed patterns framework and uses a novel suffix-tree-based data structure to extract conceptual minimal representations of association rules, bi-clusters and classification rules. These patterns extend the classical frameworks of association rules, classification rules and bi-clusters, since the lists of data objects supporting each pattern and the hierarchical relationships between patterns are also extracted. The approach was applied to the analysis of HIV-1/human protein-protein interaction data. Analysing such inter-species protein interactions is a recent major challenge in computational biology, and several databases integrating heterogeneous interaction information and biological background knowledge on proteins have been constructed. Experimental results show that the proposed approach can process these databases efficiently and that the extracted conceptual patterns can help in understanding and analysing the nature of the relationships between interacting proteins.
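The closure property behind the "conceptual minimal representations" mentioned above can be demonstrated on a toy dataset. The brute-force miner below is only a sketch of the definition (a frequent itemset is closed if no proper superset has the same support); the thesis uses a dedicated suffix-tree-based structure instead of enumeration.

```python
# Brute-force sketch of frequent *closed* itemset mining on a toy dataset.
from itertools import combinations

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
items = sorted(set().union(*transactions))
minsup = 2

def support(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions)

frequent = [frozenset(c) for n in range(1, len(items) + 1)
            for c in combinations(items, n) if support(set(c)) >= minsup]

# Closed = no proper superset with the same support; the closed sets form a
# condensed representation from which all frequent sets can be recovered.
closed = [f for f in frequent
          if not any(f < g and support(g) == support(f) for g in frequent)]
print(sorted(sorted(c) for c in closed))
# -> [['a'], ['a', 'b'], ['a', 'b', 'c'], ['a', 'c']]
```

Note that {b} and {c} are frequent but not closed: their supports equal those of {a, b} and {a, c}, so keeping only the closed sets loses no support information.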
254

A scalable evolutionary learning classifier system for knowledge discovery in stream data mining

Dam, Hai Huong, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW January 2008 (has links)
Data mining (DM) is the process of finding patterns and relationships in databases. The breakthrough in computer technologies triggered a massive growth in the data collected and maintained by organisations. In many applications, these data arrive continuously in large volumes as a sequence of instances known as a data stream, and mining them is known as stream data mining. Because of the large amount of data arriving in a stream, each record is normally expected to be processed only once. Moreover, this processing can be carried out simultaneously on different sites in the organisation, making the problem distributed in nature. Distributed stream data mining poses many challenges to the data mining community, including scalability and coping with changes in the underlying concept over time. In this thesis, the author hypothesizes that learning classifier systems (LCSs) - a class of classification algorithms - have the potential to work efficiently in distributed stream data mining. LCSs are incremental learners, and being evolutionary based they are inherently adaptive. However, they suffer from two main drawbacks that hinder their use as fast data mining algorithms. First, they require a large population size, which slows the processing of arriving instances. Second, they require many parameter settings, some of which are very sensitive to the nature of the learning problem, so it is difficult to choose the right setup for totally unknown problems. The aim of this thesis is to attack these two problems in LCSs, with a specific focus on UCS, a supervised evolutionary learning classifier system. UCS is chosen because it has been tested extensively on classification tasks and is the supervised version of XCS, a state-of-the-art LCS. The thesis first introduces an architectural design for a distributed stream data mining system.
The problems that UCS faces in a distributed data stream task are confirmed through a large number of experiments with UCS and the proposed architectural design. To overcome the problem of large population sizes, the idea of using a neural network to represent the action in UCS is proposed. This new system, called NLCS, was validated experimentally using a small fixed population size and showed a large reduction in the population size needed to learn the underlying concept in the data. An adaptive version of NLCS called ANCS is then introduced, which dynamically controls the population size of NLCS. A comprehensive analysis of the behaviour of ANCS revealed interesting patterns in the behaviour of the parameters, which motivated an ensemble version of the algorithm with nine nodes, each using a different parameter setting; together they cover all the patterns of behaviour observed in the system. A voting gate is used for the ensemble. The resulting ensemble requires no parameter setting and showed better performance on all datasets tested. The thesis concludes by testing the ANCS system in the architectural design for distributed environments proposed earlier. The contributions of the thesis are: (1) reducing the UCS population size by an order of magnitude using a neural representation; (2) introducing a mechanism for adapting the population size; (3) proposing an ensemble method that does not require parameter setting; and primarily (4) showing that the proposed LCS can work efficiently for distributed stream data mining tasks.
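The voting gate of the nine-node ensemble can be sketched as a simple majority vote. In the thesis each node is a neural learning classifier system with its own parameter setting; here each node's output is just a class label, and the tie-breaking rule is an assumption of this sketch.

```python
# Minimal majority-voting gate over parallel classifier nodes, in the spirit of
# the nine-node ANCS ensemble. The real nodes are evolutionary classifier
# systems; here their outputs are plain labels.
from collections import Counter

def voting_gate(predictions):
    """Return the class with the most votes; ties go to the smallest label
    (an arbitrary convention assumed for this sketch)."""
    counts = Counter(predictions)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)

# Nine hypothetical node outputs for one incoming stream instance:
votes = [1, 0, 1, 1, 0, 1, 0, 1, 1]
print(voting_gate(votes))   # -> 1
```

Because the gate only aggregates labels, nodes with different parameter settings can run fully in parallel, which fits the distributed stream setting described above.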
257

有關對調適與演化機制的再審思-在財務時間序列資料中應用的統計分析 / Rethinking the Appeal of Adaptation and Evolution: Statistical Analysis of Empirical Study in the Financial Time Series

林維垣 Unknown Date (has links)
The main purpose of this study is to draw the attention of scholars at home and abroad to evolutionary science in economics, combining computing, biotechnology, psychology and mathematics within economics, in the hope that practical economic problems which traditional economics cannot overcome because of its simplifying assumptions can be solved with computer simulation techniques, yielding new knowledge and skills. The study comprises six chapters plus a conclusion. Chapter 1 is the introduction, describing the background and research motivation. Chapter 2 reviews the shortcomings of traditional economics and then constructs financial markets using knowledge discovery in data and intelligent systems. Chapter 3 introduces various artificial-intelligence methods for simulating investment strategies in financial markets. Chapter 4 builds a time-series model without structural change for computer-simulated analysis of trading strategies, using genetic algorithms alone to simulate investment strategies, and evaluates strategy performance from the viewpoints of portfolios, transaction costs, adaptation, evolution and statistics. Chapter 5 builds a simple structural-change model and again uses genetic algorithms to assess the effectiveness of investment strategies from adaptive and statistical viewpoints. Chapter 6 combines knowledge discovery in data and intelligent systems with econometric methods to construct the steps by which genetic algorithms develop investment strategies, conducting an empirical study on data from the Taiwan stock market and analysing investment strategies, transaction costs, adaptation and evolution. The final chapter concludes. Directions for future research include: 1. cross-comparison of the performance of other artificial-intelligence methods, such as artificial neural networks and genetic programming; 2. using classifier systems and fuzzy logic to improve the efficiency of strategy encoding in the standard genetic algorithm, and constructing more complex strategies to match real-world decision processes; 3. comparative simulations on other artificial time-series data, such as ARCH (Autoregressive Conditional Heteroskedasticity) models, threshold models, deterministic models and more complex structural-change models; 4. further study of the full information used by the genetic algorithm (for example, the selection of different indicators); 5. this study uses an offline system, and further work on an online system is necessary in practice. / Historically, the study of economics has been advanced by a combination of empirical observation and theoretical development. The analysis of mathematical equilibrium in theoretical economic models has been the predominant mode of progress in recent decades. Such models provide powerful insights into economic processes, but usually make restrictive assumptions and appear to be oversimplifications of a complex economic system. However, the advent of cheap computing power and new intelligent technologies makes it possible to delve further into some of the complexities inherent in the real economy. It is now feasible to create a rudimentary form of "artificial economic life". First, we build the framework of artificial stock markets using data mining and intelligent systems. Second, in order to analyse competition among buyers and sellers in the artificial market, we introduce various methods of artificial intelligence to design trading rules, and investigate how machine-learning techniques might be applied to search for the optimal investment strategy. Third, we create a miniature economic laboratory, building the artificial stock market with genetic algorithms to analyse investment strategies, using both real and artificial data and considering both structural-change and non-structural-change cases. Finally, we use statistical analysis to examine the performance of the portfolio strategies generated by the genetic algorithms.
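The core loop of evolving trading rules with a genetic algorithm can be illustrated on a single evolvable parameter. The thesis evolves much richer rule sets on real market data; the moving-average rule, the synthetic price series, and the selection scheme below are all assumptions made for this sketch.

```python
# Toy genetic algorithm evolving one parameter of a trading rule (a moving-
# average window) against a synthetic upward-drifting price series.
import random

random.seed(1)
prices = [100 + i * 0.1 + random.gauss(0, 1) for i in range(200)]

def fitness(window):
    """Profit of the rule 'hold for one step when price > moving average'."""
    profit, w = 0.0, max(2, window)
    for t in range(w, len(prices) - 1):
        ma = sum(prices[t - w:t]) / w
        if prices[t] > ma:
            profit += prices[t + 1] - prices[t]
    return profit

pop = [random.randint(2, 50) for _ in range(20)]        # initial population
for _ in range(30):                                     # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                                  # truncation selection
    children = [max(2, (random.choice(parents) + random.choice(parents)) // 2
                    + random.randint(-2, 2))            # crossover + mutation
                for _ in range(10)]
    pop = parents + children

best = max(pop, key=fitness)
print("best window:", best, "profit:", round(fitness(best), 2))
```

Selection, crossover and mutation here are deliberately minimal; the point is only the evaluate-select-vary cycle that the dissertation applies, at much larger scale, to investment strategies.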
258

Análise dos indicadores de qualidade versus taxa de abandono utilizando método de regressão múltipla para serviço de banda larga / Analysis of quality indicators versus churn rate using multiple regression for broadband service

Fernandes Neto, André Pedro 20 June 2008 (has links)
Telecommunications is one of the most dynamic and strategic areas in the world. Many technological innovations have modified the way information is exchanged: information and knowledge are now shared in networks, and broadband Internet is the new way of sharing content and information. This dissertation deals with performance indicators related to maintenance services of telecommunications networks and uses multiple regression models to estimate churn, the loss of customers to other companies. In a competitive environment, telecommunications companies devise strategies to minimise the loss of customers, since losing a customer costs more than obtaining a new one. Corporations have plenty of data stored in a diversity of databases, but usually these data are not explored properly. This work uses Knowledge Discovery in Databases (KDD) to establish rules and new models explaining how churn, as the dependent variable, is related to a diversity of service indicators, such as time to deploy the service (in hours) and time to repair (in hours). Extracting meaningful knowledge is, in many cases, a challenge. Models were tested and statistically analysed. The work presents results that allow identifying which quality-of-service indicators influence churn, and proposes actions to solve, at least in part, this problem.
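The regression described above fits churn against several service indicators at once. A minimal ordinary-least-squares sketch is shown below; the indicator names and the toy numbers are invented for illustration, while the dissertation fits such models on real operator data.

```python
# Minimal multiple linear regression of churn rate on two service-quality
# indicators, via ordinary least squares. All data here are illustrative.
import numpy as np

# Toy monthly data: [time-to-deploy (h), time-to-repair (h)] -> churn rate (%)
X = np.array([[48, 10], [72, 24], [24, 8], [96, 36], [60, 12], [36, 20]], float)
y = np.array([2.1, 3.5, 1.4, 4.8, 2.6, 2.4])

A = np.column_stack([np.ones(len(X)), X])        # prepend an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)     # solve min ||A @ coef - y||
intercept, b_deploy, b_repair = coef

pred = A @ coef
r2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(f"churn = {intercept:.2f} + {b_deploy:.3f}*deploy + {b_repair:.3f}*repair")
print("R^2:", round(r2, 3))
```

The fitted coefficients quantify how much each indicator moves the churn rate, which is exactly the kind of evidence the dissertation uses to rank quality indicators by their influence on customer loss.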
259

[en] DATA MINING APPLIED TO DIRECT MARKETING AND MARKET SEGMENTATION / [es] MINERACIÓN DE DATOS PARA LA SOLUCIÓN DE PROBLEMAS DE MARKETING DIRECTO Y SEGMENTACIÓN DE MERCADO / [pt] MINERAÇÃO DE DADOS APLICADA NA SOLUÇÃO DE PROBLEMAS DE MARKETING DIRETO E SEGMENTAÇÃO DE MERCADO

HUGO LEONARDO COSTA DE AZEVEDO 28 August 2001 (has links)
[pt] Devido à quantidade cada vez maior de dados armazenada pelas instituições, a área de mineração de dados tem se tornado cada vez mais relevante e vários métodos e métodos têm sido propostos de maneira a aumentar sua aplicabilidade e desempenho. Esta dissertação investiga o uso de diversos métodos e técnicas de mineração de dados na modelagem e solução de problemas de Marketing. O objetivo do trabalho foi fazer um levantamento de alguns métodos e técnicas de mineração, avaliar seus desempenhos e procurar integrá-los na solução de problemas de marketing que envolvessem tarefas de agrupamento ou classificação. O trabalho consistiu de quatro etapas principais: estudo sobre o processo de descoberta de conhecimento em bancos de dados (KDD - Knowledge Discovery in Databases); estudo sobre Marketing e alguns problemas de Marketing de Banco de Dados (DBM - Database Marketing) que envolvessem tarefas de agrupamento e classificação; levantamento e estudo de métodos e técnicas de Inteligência Computacional e Estatística que pudessem ser empregados na solução de alguns desses problemas; e estudos de caso. A primeira etapa do trabalho envolveu um estudo detalhado das diversas fases do processo de KDD: limpeza dos dados; seleção; codificação e transformação; redução de dimensionalidade; mineração; e pós-processamento. Na segunda etapa foram estudados os principais conceitos de Marketing e de DBM e a relação entre eles e o processo de KDD. Pesquisaram-se alguns dos tipos de problemas comuns na área e escolheram- se para análise dois que fossem suficientemente complexos e tivessem a possibilidade de se ter acesso a alguma empresa que fornecesse os dados e validasse a solução posteriormente. Os casos selecionados foram um de marketing direto e outro de segmentação de mercado. Na terceira etapa, foram estudados os métodos de Inteligência Computacional e Estatística usualmente empregados em tarefas de agrupamento e classificação de dados. 
Foram estudados: Redes Perceptron Multi-Camadas, Mapas Auto- Organizáveis, Fuzzy C-Means, K-means, sistemas Neuro-Fuzzy, Árvores de Decisão, métodos Hierárquicos de agrupamento, Regressão Logística, Fuções Discriminantes de Fisher, entre outros. Por fim, na última etapa, procurou-se integrar todos os métodos e técnicas estudados na solução de dois estudos de caso, propostos inicialmente na segunda etapa do trabalho. Uma vez proposta a solução para os estudos de caso, elas foram levadas aos especialistas em Marketing das empresas para serem validadas no âmbito do negócio. Os estudos de caso mostraram a grande utilidade e aplicabilidade dos métodos e técnicas estudadas em problemas de marketing direto e segmentação de mercado. Sem o emprego dos mesmos, a solução para muitos desses problemas tornar-se-ia extremamente imprecisa ou até mesmo inviável. Mostraram também a grande importância das fases iniciais de pré-processamento dos dados no processo de KDD. Muitos desafios persistem ainda na área de mineração de dados, como a dificuldade de modelar dados não lineares e de manipular quantidades muito grande de dados, o que garante um vasto campo para pesquisa nos próximos anos. / [en] The Data Mining field has received great attention lately, due to the increasing amount of data stored by companies and institutions. A great number of Data Mining methods have been proposed so far, which is good but sometimes leads to confusion. This dissertation investigates the performance of many different methods and techniques of Data Mining used to model and solve Marketing problems. The goal of this research was to look for and study some data mining methods, compare them, and try to integrate them to solve Marketing problems involving clustering and classification tasks. 
This research can be divided in four stages: a study of the process of Knowledge Discovery in Databases (KDD); a study about Marketing problems involving clustering and classification; a study of some methods and techniques of Statistics and Computational Intelligence that could be used to solve some of those problems; and case studies. On the first stage of the research, the different tasks (clustering, classification, modeling, etc) and phases (data cleansing, data selection, data transformation, Data Mining, etc) of a KDD process were studied in detail. The second stage involved a study of the main concepts of Marketing and Database Marketing and their relation to the KDD process. The most common types of problems in the field were studied and, among them, two were selected to be furthered analyzed as case studies. One case was related to Direct Marketing and the other to Market Segmentation. These two cases were chosen because they were complex enough and it was possible to find a company to provide data to the problem and access to their marketing department. On the third stage, many different methods for clustering and classification were studied and compared. Among those methods, there were: Multilayer Perceptrons, Self Organizing Maps, Fuzzy C-Means, K-Means, Neuro-Fuzzy systems, Decision Trees, Hierarquical Clustering Methods, Logistic Regression, Fisher`s Linear Discriminants, etc Finally, on the last stage, all the methods and techniques studied were put together to solve the two case studies proposed earlier. Once they were solved, their solutions were submitted to the Marketing Department of the company who provided the data, so that they could validate the results in the context of their business. The case studies were able to show the large potential of applicability of the methods and techniques studied on problems of Market Segmentation and Direct Marketing. 
Without employing those methods, it would be very hard or even impossible to solve those problems. The case studies also helped verify the very important role of the data pre-processing phase in the KDD process. Many challenges persist in the data mining field. One could mention, for example, the difficulty of modelling non-linear data and of manipulating large amounts of data. These and many other challenges provide a vast field of research for the years to come.
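The market-segmentation case study above relies on clustering methods such as K-means. As a minimal illustration of that technique, here is a pure-Python sketch of Lloyd's k-means algorithm; the customer data, feature meanings, and two-segment setup are invented for illustration and are not taken from the dissertation:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    # Lloyd's algorithm: assign each point to its nearest centroid,
    # then move each centroid to the mean of its assigned points.
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Invented two-feature customer data (say, spend and purchase frequency):
# two visually obvious segments, low spenders and high spenders.
customers = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
             (8.0, 8.2), (7.9, 8.0), (8.2, 7.8)]
centroids, clusters = kmeans(customers, k=2)
# Each segment ends up with its own centroid, one near (1, 1), one near (8, 8).
```

In a real segmentation task the feature vectors would come from the pre-processing phases of the KDD process described above, and the choice of k would itself be part of the analysis.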
260

Spatial analysis of invasive alien plant distribution patterns and processes using Bayesian network-based data mining techniques

Dlamini, Wisdom Mdumiseni Dabulizwe 03 1900
Invasive alien plants have widespread ecological and socioeconomic impacts throughout many parts of the world, including Swaziland, where the government declared them a national disaster. Control of these species requires knowledge of the invasion ecology of each species, including how they interact with the invaded environment. Species distribution models are vital for providing solutions to such problems, including the prediction of their niche and distribution. Various modelling approaches are used for species distribution modelling, albeit with limitations arising from statistical assumptions, implementation and interpretation of outputs. This study explores the usefulness of Bayesian networks (BNs) due to their ability to model stochastic, nonlinear inter-causal relationships and uncertainty. Data-driven BNs were used to explore patterns and processes influencing the spatial distribution of 16 priority invasive alien plants in Swaziland. Various BN structure learning algorithms were applied within the Weka software to build models from a set of 170 variables incorporating climatic, anthropogenic, topo-edaphic and landscape factors. While all the BN models produced accurate predictions of alien plant invasion, the globally scored networks, particularly those learned with hill-climbing algorithms, performed relatively well. However, when considering the probabilistic outputs, the constraint-based Inferred Causation algorithm, which attempts to generate a causal BN structure, performed better. The learned BNs reveal that the main pathways of alien plants into new areas are ruderal areas such as road verges and riverbanks, whilst humans and human activity are key driving factors and the main dispersal mechanism. However, the distribution of most of the species is constrained by climate, particularly tolerance to very low temperatures and precipitation seasonality. Biotic interactions and/or associations among the species are also prevalent. 
The findings suggest that most of the species will proliferate by extending their range, resulting in the whole country being at risk of further invasion. The ability of BNs to express uncertain, rather complex conditional and probabilistic dependencies and to combine multisource data makes them an attractive technique for species distribution modelling, especially as joint invasive species distribution models (JiSDM). Suggestions for further research are provided, including the need for rigorous invasive species monitoring, data stewardship and testing more BN learning algorithms. / Environmental Sciences / D. Phil. (Environmental Science)
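The score-based (hill-climbing) structure learning this abstract refers to can be sketched in a few dozen lines. The following is a toy, addition-only BIC hill climber over three invented binary variables — a sketch of the general idea, not Weka's implementation, and the variable names and data are fabricated for illustration:

```python
import itertools
import math

def family_loglik(data, child, parents):
    # Maximum-likelihood log-likelihood of `child` given its parent set:
    # sum over parent configurations of N(x, pa) * log(N(x, pa) / N(pa)).
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        counts.setdefault(key, {}).setdefault(row[child], 0)
        counts[key][row[child]] += 1
    ll = 0.0
    for dist in counts.values():
        total = sum(dist.values())
        for c in dist.values():
            ll += c * math.log(c / total)
    return ll

def bic(data, parents):
    # BIC = log-likelihood - 0.5 * log(n) * (free parameters);
    # a binary child with k parents contributes 2**k free parameters.
    n = len(data)
    return sum(
        family_loglik(data, v, ps) - 0.5 * math.log(n) * (2 ** len(ps))
        for v, ps in parents.items()
    )

def acyclic(parents):
    # Reject candidate graphs that contain a directed cycle.
    def visit(v, stack):
        if v in stack:
            return False
        return all(visit(p, stack | {v}) for p in parents[v])
    return all(visit(v, set()) for v in parents)

def hill_climb(data, nodes):
    # Greedy score-based search: repeatedly accept a single-edge addition
    # that improves the BIC score. (Full hill climbers, including Weka's,
    # also try edge deletions and reversals; additions only, for brevity.)
    parents = {v: () for v in nodes}
    best = bic(data, parents)
    improved = True
    while improved:
        improved = False
        for u, v in itertools.permutations(nodes, 2):
            if u in parents[v]:
                continue
            cand = dict(parents)
            cand[v] = parents[v] + (u,)
            if not acyclic(cand):
                continue
            s = bic(data, cand)
            if s > best + 1e-9:
                parents, best, improved = cand, s, True
    return parents, best

# Invented data: Y copies X 90% of the time; Z is exactly independent.
data = []
for x in (0, 1):
    for z in (0, 1):
        data += [{"X": x, "Y": x, "Z": z}] * 45
        data += [{"X": x, "Y": 1 - x, "Z": z}] * 5

learned, score = hill_climb(data, ["X", "Y", "Z"])
# The search links X and Y and leaves the independent Z unconnected:
# the BIC penalty blocks edges that add parameters without adding fit.
```

The constraint-based alternative mentioned in the abstract works differently: instead of scoring whole graphs, it runs conditional-independence tests on variable pairs and orients edges from the resulting pattern.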
