191 |
Node Centric Community Detection and Evolutional Prediction in Dynamic Networks. Oluwafolake A. Ayano (13161288), 27 July 2022
<p>Advances in technology have led to the availability of data from different platforms such as the web and social media. Much of this data can be represented as a network consisting of a set of nodes connected by edges: the nodes represent the items in the network, while the edges represent the interactions between them. Community detection methods have been used extensively to analyze such networks. However, community detection in evolving networks remains a significant challenge because of the frequent changes to the network and the need for real-time analysis. Static community detection methods are not appropriate for analyzing dynamic networks because they do not retain a network’s history and cannot provide real-time information about the communities in the network.</p>
<p>Existing incremental methods treat changes to the network as a sequence of edge additions and/or removals; however, in many real-world networks, changes occur when a node is added with all its edges connecting simultaneously. </p>
<p>To process such large networks in a timely manner, an adaptive analytical method is needed: one that can update its results as the network evolves without recomputing the entire network, and that treats all the edges arriving with a node equally.</p>
<p>We proposed a node-centric community detection method that incrementally updates the community structure using the already known structure of the network, thereby avoiding recomputing the entire network from scratch while achieving a high-quality community structure. The results of our experiments suggest that our approach is efficient for incremental community detection in node-centric evolving networks.</p>
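<p>As a rough illustration of the node-centric idea (a hypothetical sketch, not the thesis’s actual algorithm): when a node arrives with all its edges at once, it can be placed in the neighboring community it connects to most strongly, so only the affected part of the structure is updated rather than the whole network recomputed.</p>

```python
from collections import Counter

def add_node(community, adjacency, new_node, new_edges):
    """Attach new_node (arriving with all its edges simultaneously) and
    assign it to the neighboring community it connects to most often;
    an isolated node starts its own community."""
    adjacency[new_node] = set(new_edges)
    for neighbor in new_edges:
        adjacency.setdefault(neighbor, set()).add(new_node)
    votes = Counter(community[n] for n in new_edges if n in community)
    if votes:
        community[new_node] = votes.most_common(1)[0][0]
    else:
        community[new_node] = max(community.values(), default=-1) + 1
    return community[new_node]

# Two existing communities: {0, 1} and {2, 3}.
community = {0: 0, 1: 0, 2: 1, 3: 1}
adjacency = {0: {1}, 1: {0}, 2: {3}, 3: {2}}
add_node(community, adjacency, 4, [0, 1, 2])  # joins community 0 (2 of 3 edges)
```

<p>A production method would also re-examine the neighbors’ assignments, since a new node can make a better partition available; the sketch only shows the incremental, node-at-a-time update.</p>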
|
192 |
Indexing and Search Algorithms for Web Shops / Indexering och sökalgoritmer för webshoppar. Reimers, Axel; Gustafsson, Isak. January 2016
Web shops today need to be more and more responsive, and one part of this responsiveness is fast product search. One way to get faster searches is to search against an index instead of directly against a database. Network Expertise Sweden AB (Net Exp) wants to explore different methods of implementing an index in their future web shop, built upon the open-source web shop platform SmartStore.NET. Since SmartStore.NET performs all of its searches directly against its database, it will not scale well and will put more wear on the database. The aim was therefore to find solutions that offload the database by using an index instead. A prototype that retrieved products from a database and made them searchable through an index was developed, evaluated and implemented. The prototype indexed the data with an inverted-index algorithm and made it searchable with a search algorithm that mixed boolean-type queries with normal queries. / Webbutiker idag behöver vara mer och mer responsiva, en del av denna responsivitet är snabba produktsökningar. Ett sätt att få snabbare sökningar är att söka mot ett index istället för att söka direkt mot en databas. Network Expertise Sweden AB vill utforska olika metoder för att implementera ett index i deras framtida webbutik, byggd ovanpå SmartStore.NET som är öppen källkod. Då SmartStore.NET gör alla sina sökningar direkt mot sin databas kommer den inte att skala bra och kommer att slita mer på databasen. Målsättningen var därför att hitta olika lösningar som avlastar databasen genom att använda ett index istället. En prototyp som hämtade produkter från en databas och gjorde dem sökbara genom ett index utvecklades, utvärderades och implementerades. Prototypen indexerade datan med en inverterad indexeringsalgoritm, och gjordes sökbar med en sökalgoritm som blandar booleska frågor med normala frågor.
|
193 |
A new model for worm detection and response: development and evaluation of a new model based on knowledge discovery and data mining techniques to detect and respond to worm infection by integrating incident response, security metrics and apoptosis. Mohd Saudi, Madihah. January 2011
Worms have improved and a range of sophisticated techniques has been integrated into them, which makes the detection and response processes much harder and slower than in the past. Therefore, in this thesis, the STAKCERT (Starter Kit for Computer Emergency Response Team) model is built to detect worm attacks in order to respond to worms more efficiently. The novelty and strengths of the STAKCERT model lie in the method implemented, which consists of the STAKCERT KDD processes and the development of the STAKCERT worm classification, the STAKCERT relational model and the STAKCERT worm apoptosis algorithm. The new concept introduced in this model, named apoptosis, is borrowed from the human immune system and has been mapped into a security perspective. Furthermore, the encouraging results achieved by this research are validated by applying security metrics to assign the weight and severity values that trigger apoptosis. To optimise performance, standard operating procedures (SOP) for worm incident response involving static and dynamic analyses, knowledge discovery in databases (KDD) techniques for modelling, and data mining algorithms were used. The STAKCERT model has produced encouraging results and outperformed comparable existing work on worm detection: an overall accuracy rate of 98.75%, with a 0.2% false positive rate and a 1.45% false negative rate. Worm response achieved an accuracy rate of 98.08%, which other researchers can later use as a point of comparison for their own work.
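The apoptosis mechanism, i.e. a host "self-destructing" (being isolated) once weighted evidence of infection crosses a severity threshold, can be caricatured in a few lines; the indicator names, weights and threshold below are illustrative assumptions, not the STAKCERT security metrics:

```python
# Illustrative indicator weights (not the STAKCERT values).
WEIGHTS = {"suspicious_process": 0.4, "registry_change": 0.2,
           "mass_outbound_scan": 0.3, "payload_signature": 0.1}
APOPTOSIS_THRESHOLD = 0.5

def severity(indicators):
    """Weighted severity score from the set of observed infection indicators."""
    return sum(WEIGHTS[name] for name in indicators if name in WEIGHTS)

def respond(indicators):
    """Trigger apoptosis (isolate the host) once severity crosses the threshold."""
    if severity(indicators) >= APOPTOSIS_THRESHOLD:
        return "apoptosis: isolate host"
    return "monitor"

respond({"suspicious_process", "mass_outbound_scan"})  # severity 0.7 -> isolate
```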
|
194 |
Textual data mining applications for industrial knowledge management solutions. Ur-Rahman, Nadeem. January 2010
In recent years knowledge has become an important resource for enhancing business, and many activities are required to manage knowledge resources well and to help companies remain competitive within industrial environments. The data available in most industrial setups is complex in nature, and multiple data formats may be generated to track the progress of different projects, whether related to developing new products or to providing better services to customers. Knowledge discovery from databases requires considerable effort, and data mining techniques serve this purpose for structured data formats. If, however, the data is semi-structured or unstructured, the combined efforts of data and text mining technologies may be needed to produce useful results. This thesis focuses on issues related to the discovery of knowledge from semi-structured or unstructured data through the application of textual data mining techniques that automate the classification of textual information into two categories or classes, which can then be used to help manage the knowledge available in multiple data formats. Applications of different data mining techniques to discover valuable information and knowledge in the manufacturing and construction industries are explored as part of a literature review. The application of text mining techniques to handle semi-structured or unstructured data is discussed in detail. A novel integration of different data and text mining tools is proposed in the form of a framework in which knowledge discovery and its refinement are performed through the application of clustering and the Apriori association rule mining algorithm. Finally, the hypothesis of achieving better classification accuracy is examined by applying the methodology to case study data available in the form of Post Project Review (PPR) reports.
The process of discovering useful knowledge, its interpretation and utilisation has been automated to classify the textual data into two classes.
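The Apriori step of such a framework can be sketched on toy data (the terms below are invented, not drawn from the PPR corpus): frequent term sets are grown level by level, keeping only candidates whose support clears a minimum threshold.

```python
def apriori(transactions, min_support=2):
    """Classic Apriori: return frequent itemsets (frozensets) with their
    support counts, growing candidates one level at a time."""
    frequent = {}
    level = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        kept = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(kept)
        k += 1
        # Candidates for the next level: unions of kept sets of size k.
        level = {a | b for a in kept for b in kept if len(a | b) == k}
    return frequent

docs = [{"delay", "supplier"}, {"delay", "cost"}, {"delay", "supplier", "cost"}]
freq = apriori(docs, min_support=2)
# Frequent: {delay}:3, {supplier}:2, {cost}:2, {delay,supplier}:2, {delay,cost}:2
```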
|
195 |
Definition of a human-machine learning process from timed observations: application to the modelling of human behaviour for the detection of abnormal behaviour of old people at home / Définition d'un processus d'apprentissage par l'homme et la machine à partir d'observations datées : application à la modélisation du comportement humain pour la détection des comportements anormaux de personnes âgées maintenues dans leur domicile. Pomponio, Laura. 26 June 2012
L'acquisition et la modélisation de connaissances ont été abordés jusqu'à présent selon deux approches principales : les êtres humains (experts) à l'aide des méthodologies de l'Ingénierie des Connaissances et le Knowledge Management, et les données à l'aide des techniques relevant de la découverte de connaissances à partir du contenu de bases de données (fouille de données). Cette thèse porte sur la conception d'un processus d'apprentissage conjoint par l'être humain et la machine combinant une approche de modélisation des connaissances de type Ingénierie des Connaissances (TOM4D, Timed Observation Modelling for Diagnosis) et une approche d'apprentissage automatique fondée sur un processus de découverte de connaissances à partir de données datées (TOM4L, Timed Observation Mining for Learning). Ces deux approches étant fondées sur la Théorie des Observations Datées, les modèles produits sont représentés dans le même formalisme ce qui permet leur comparaison et leur combinaison. Le mémoire propose également une méthode d'abstraction, inspiée des travaux de Newell sur le "Knowledge Level'' et fondée sur le paradigme d'observation datée, qui a pour but de traiter le problème de la différence de niveau d'abstraction inhérent entre le discours d'un expert et les données mesurées sur un système par un processus d'abstractions successives. Les travaux présentés dans ce mémoire ayant été menés en collaboration avec le CSTB de Sophia Antipolis (Centre Scientifique et Technique du Bâtiment), ils sont appliqués à la modélisation de l'activité humaine dans le cadre de l'aide aux personnes âgées maintenues à domicile. / Knowledge acquisition has been traditionally approached from a primarily people-driven perspective, through Knowledge Engineering and Management, or from a primarily data-driven approach, through Knowledge Discovery in Databases, rather than from an integral standpoint. 
This thesis therefore proposes a human-machine learning approach that combines a Knowledge Engineering modelling approach called TOM4D (Timed Observation Modelling For Diagnosis) with a process of Knowledge Discovery in Databases based on an automatic data mining technique called TOM4L (Timed Observation Mining For Learning). The combination and comparison of models obtained through TOM4D with those obtained through TOM4L is possible because both are based on the Theory of Timed Observations and share the same representation formalism. Consequently, a learning process nourished by experts' knowledge and by knowledge discovered in data is defined in the present work. In addition, this dissertation puts forward a theoretical framework of abstraction levels, in line with the mentioned theory and inspired by Newell's Knowledge Level work, in order to reduce the broad semantic gap that exists between the data in a database, relative to an observed process, and what can be inferred at a higher level, that is, at the experts' discursive level. The human-machine learning approach, along with the notion of abstraction levels, is then applied to the modelling of human behaviour in smart environments; in particular, to the modelling of elderly people's behaviour at home in the GerHome Project of the CSTB (Centre Scientifique et Technique du Bâtiment) of Sophia Antipolis, France.
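Both TOM4D and TOM4L rest on timed observations, triples (time, variable, value). As a crude sketch of the data-driven side (a simplification of the real TOM4L mining, with invented sensor names), one can count how often one observation class follows another within a time window:

```python
def successor_counts(observations, window):
    """observations: list of (t, variable, value) sorted by time.
    Count pairs where observation class b follows class a within `window`."""
    counts = {}
    for i, (ti, vi, xi) in enumerate(observations):
        for tj, vj, xj in observations[i + 1:]:
            if tj - ti > window:
                break  # observations are time-ordered, so stop early
            pair = ((vi, xi), (vj, xj))
            counts[pair] = counts.get(pair, 0) + 1
    return counts

# Hypothetical home-monitoring observations: (seconds, sensor, state).
obs = [(0, "kitchen", "on"), (5, "stove", "on"),
       (60, "kitchen", "off"), (62, "bedroom", "on")]
successor_counts(obs, window=10)
```

Frequently observed successor pairs suggest normal behaviour patterns; sequences absent from the learned model are candidates for abnormal-behaviour alerts.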
|
196 |
Etude comportementale des mesures d'intérêt d'extraction de connaissances / Behavioral study of interestingness measures of knowledge extraction. Grissa, Dhouha. 02 December 2013
La recherche de règles d’association intéressantes est un domaine important et actif en fouille de données. Puisque les algorithmes utilisés en extraction de connaissances à partir de données (ECD) ont tendance à générer un nombre important de règles, il est difficile à l’utilisateur de sélectionner par lui-même les connaissances réellement intéressantes. Pour répondre à ce problème, un post-filtrage automatique des règles s’avère essentiel pour réduire fortement leur nombre. D’où la proposition de nombreuses mesures d’intérêt dans la littérature, parmi lesquelles l’utilisateur est supposé choisir celle qui est la plus appropriée à ses objectifs. Comme l’intérêt dépend à la fois des préférences de l’utilisateur et des données, les mesures ont été répertoriées en deux catégories : les mesures subjectives (orientées utilisateur) et les mesures objectives (orientées données). Nous nous focalisons sur l’étude des mesures objectives. Néanmoins, il existe une pléthore de mesures objectives dans la littérature, ce qui ne facilite pas le ou les choix de l’utilisateur. Ainsi, notre objectif est d’aider l’utilisateur, dans sa problématique de sélection de mesures objectives, par une approche par catégorisation. La thèse développe deux approches pour assister l’utilisateur dans sa problématique de choix de mesures objectives : (1) une étude formelle suite à la définition d’un ensemble de propriétés de mesures qui conduisent à une bonne évaluation de celles-ci ; (2) une étude expérimentale du comportement des différentes mesures d’intérêt du point de vue de l’analyse de données. Pour ce qui concerne la première approche, nous réalisons une étude théorique approfondie d’un grand nombre de mesures selon plusieurs propriétés formelles. Pour ce faire, nous proposons tout d’abord une formalisation de ces propriétés afin de lever toute ambiguïté sur celles-ci.
Ensuite, nous étudions, pour différentes mesures d’intérêt objectives, la présence ou l’absence de propriétés caractéristiques appropriées. L’évaluation des mesures est alors un point de départ pour une catégorisation de celles-ci. Différentes méthodes de classification ont été appliquées : (i) des méthodes sans recouvrement (CAH et k-moyennes) qui permettent l’obtention de groupes de mesures disjoints, (ii) une méthode avec recouvrement (analyse factorielle booléenne) qui permet d’obtenir des groupes de mesures qui se chevauchent. Pour ce qui concerne la seconde approche, nous proposons une étude empirique du comportement d’une soixantaine de mesures sur des jeux de données de nature différente. Ainsi, nous proposons une méthodologie expérimentale où nous cherchons à identifier les groupes de mesures qui possèdent, empiriquement, un comportement semblable. Nous effectuons par la suite une confrontation entre les deux résultats de classification, formel et empirique, dans le but de valider et de mettre en valeur notre première approche. Les deux approches sont complémentaires, dans l’optique d’aider l’utilisateur à effectuer le bon choix de la mesure d’intérêt adaptée à son application. / The search for interesting association rules is an important and active field in data mining. Since the algorithms used in knowledge discovery in databases (KDD) tend to generate a large number of rules, it is difficult for the user to select the really interesting knowledge by himself. To address this problem, automatic post-filtering of the rules is essential to reduce their number significantly. Hence, many interestingness measures have been proposed in the literature in order to filter and/or sort discovered rules. As interestingness depends on both user preferences and data, interestingness measures have been classified into two categories: subjective measures (user-driven) and objective measures (data-driven). We focus on the study of objective measures.
Nevertheless, there is a plethora of objective measures in the literature, which increases the user's difficulty in choosing the appropriate measure. Thus, our goal is to reduce this difficulty by proposing groups of similar measures by means of categorization approaches. The thesis presents two approaches to assist the user in the problem of choosing objective measures: (1) a formal study based on the definition of a set of measure properties that lead to a good evaluation of the measures; (2) an experimental study of the behavior of various interestingness measures from a data analysis point of view. Regarding the first approach, we perform a thorough theoretical study of a large number of measures according to several formal properties. To do this, we first offer a formalization of these properties in order to remove any ambiguity about them. We then study, for various objective interestingness measures, the presence or absence of appropriate characteristic properties. The evaluation of interestingness measures is therefore a starting point for their categorization. Different clustering methods have been applied: (i) non-overlapping methods (CAH and k-means), which yield disjoint groups of measures, and (ii) an overlapping method (Boolean factor analysis), which provides overlapping groups of measures. Regarding the second approach, we propose an empirical study of the behavior of about sixty measures on datasets of different natures. We thus propose an experimental methodology with which we seek to identify groups of measures that behave similarly in practice. We then compare the two classification results, formal and empirical, in order to validate and enhance our first approach. The two approaches are complementary, with the aim of helping the user make the right choice of the interestingness measure appropriate to his application.
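Concretely, an objective measure is a function of the rule's contingency counts alone. A small sketch (standard textbook formulas, not code from the thesis) computes three classic measures for a rule X → Y and checks one formal property, the symmetry of lift in X and Y:

```python
def measures(n_xy, n_x, n_y, n):
    """Objective interestingness measures for a rule X -> Y from contingency
    counts: n_xy = |X and Y|, n_x = |X|, n_y = |Y|, n = total transactions."""
    support = n_xy / n
    confidence = n_xy / n_x
    lift = confidence / (n_y / n)
    return {"support": support, "confidence": confidence, "lift": lift}

m = measures(n_xy=40, n_x=50, n_y=60, n=100)
# support = 0.4, confidence = 0.8, lift = 0.8 / 0.6 (about 1.33)

# Lift is symmetric in X and Y -- one of the formal properties
# that can be used to categorize measures.
assert abs(measures(40, 50, 60, 100)["lift"]
           - measures(40, 60, 50, 100)["lift"]) < 1e-9
```

Computing such property profiles for each measure yields the vectors on which the clustering methods (CAH, k-means, Boolean factor analysis) can then operate.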
|
197 |
An analysis of semantic data quality deficiencies in a national data warehouse: a data mining approach. Barth, Kirstin. 07 1900
This research determines whether data quality mining can be used to describe, monitor and evaluate the scope and impact of semantic data quality problems in the learner enrolment data on the National Learners’ Records Database. Previous data quality mining work has focused on anomaly detection and has assumed that the data quality aspect being measured exists as a data value in the data set being mined. The method for this research is quantitative: the data mining techniques and model best suited to semantic data quality deficiencies are identified and then applied to the data. The research determines that unsupervised data mining techniques that allow for a weighted analysis of the data are most suitable for mining semantic data quality deficiencies. Further, the academic Knowledge Discovery in Databases model needs to be amended when applied to the mining of semantic data quality deficiencies. / School of Computing / M. Tech. (Information Technology)
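An unsupervised, weighted technique of the kind the study identifies can be sketched as weighted outlier scoring, where per-field weights let the semantically important fields dominate the distance (the fields and weights below are invented for illustration, not NLRD attributes):

```python
def weighted_distance(a, b, weights):
    """Euclidean distance with a per-field weight on each squared difference."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)) ** 0.5

def outlier_scores(records, weights):
    """Unsupervised: score each record by weighted distance to the mean record."""
    n = len(records)
    mean = [sum(r[i] for r in records) / n for i in range(len(records[0]))]
    return [weighted_distance(r, mean, weights) for r in records]

# Hypothetical numerically encoded enrolment records: (age, years_enrolled).
records = [(20, 1), (21, 1), (22, 2), (55, 1)]
scores = outlier_scores(records, weights=[1.0, 0.1])
# The last record stands out once age is weighted more heavily.
```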
|
198 |
[en] INTELLIGENT ASSISTANCE FOR KDD-PROCESS ORIENTATION / [pt] ASSISTÊNCIA INTELIGENTE À ORIENTAÇÃO DO PROCESSO DE DESCOBERTA DE CONHECIMENTO EM BASES DE DADOS. Goldschmidt, Ronaldo Ribeiro. 15 December 2003
[pt] A notória complexidade inerente ao processo de KDD - Descoberta de Conhecimento em Bases de Dados - decorre essencialmente de aspectos relacionados ao controle e à condução deste processo (Fayyad et al., 1996b; Hellerstein et al., 1999). De uma maneira geral, estes aspectos envolvem dificuldades em perceber inúmeros fatos cuja origem e os níveis de detalhe são os mais diversos e difusos, em interpretar adequadamente estes fatos, em conjugar dinamicamente tais interpretações e em decidir que ações devem ser realizadas de forma a procurar obter bons resultados. Como identificar precisamente os objetivos do processo, como escolher dentre os inúmeros algoritmos de mineração e de pré-processamento de dados existentes e, sobretudo, como utilizar adequadamente os algoritmos escolhidos em cada situação são alguns exemplos das complexas e recorrentes questões na condução de processos de KDD. Cabe ao analista humano a árdua tarefa de orientar a execução de processos de KDD. Para tanto, diante de cada cenário, o homem utiliza sua experiência anterior, seus conhecimentos e sua intuição para interpretar e combinar os fatos de forma a decidir qual a estratégia a ser adotada (Fayyad et al., 1996a, b; Wirth et al., 1998). Embora reconhecidamente úteis e desejáveis, são poucas as alternativas computacionais existentes voltadas a auxiliar o homem na condução do processo de KDD (Engels, 1996; Amant e Cohen, 1997; Livingston, 2001; Bernstein et al., 2002; Brazdil et al., 2003). Aliado ao exposto acima, a demanda por aplicações de KDD em diversas áreas vem crescendo de forma muito acentuada nos últimos anos (Buchanan, 2000). É muito comum não existirem profissionais com experiência em KDD disponíveis para atender a esta crescente demanda (Piatetsky-Shapiro, 1999). Neste contexto, a criação de ferramentas inteligentes que auxiliem o homem no controle do processo de KDD se mostra ainda mais oportuna (Brachman e Anand, 1996; Mitchell, 1997). Assim sendo, esta tese teve como objetivos pesquisar, propor, desenvolver e avaliar uma Máquina de Assistência Inteligente à Orientação do Processo de KDD que possa ser utilizada, fundamentalmente, como instrumento didático voltado à formação de profissionais especializados na área da Descoberta de Conhecimento em Bases de Dados. A máquina proposta foi formalizada com base na Teoria do Planejamento para Resolução de Problemas (Russell e Norvig, 1995) da Inteligência Artificial e implementada a partir da integração de funções de assistência utilizadas em diferentes níveis de controle do processo de KDD: Definição de Objetivos, Planejamento de Ações de KDD, Execução dos Planos de Ações de KDD e Aquisição e Formalização do Conhecimento.
A Assistência à Definição de Objetivos tem como meta auxiliar o homem na identificação de tarefas de KDD cuja execução seja potencialmente viável em aplicações de KDD. Esta assistência foi inspirada na percepção de um certo tipo de semelhança no nível intensional apresentado entre determinados bancos de dados. Tal percepção auxilia na prospecção do tipo de conhecimento a ser procurado, uma vez que conjuntos de dados com estruturas similares tendem a despertar interesses similares mesmo em aplicações de KDD distintas. Conceitos da Teoria da Equivalência entre Atributos de Bancos de Dados (Larson et al., 1989) viabilizam a utilização de uma estrutura comum na qual qualquer base de dados pode ser representada. Desta forma, bases de dados, ao serem representadas na nova estrutura, podem ser mapeadas em tarefas de KDD, compatíveis com tal estrutura. Conceitos de Espaços Topológicos (Lipschutz, 1979) e recursos de Redes Neurais Artificiais (Haykin, 1999) são utilizados para viabilizar os mapeamentos entre padrões heterogêneos.
Uma vez definidos os objetivos em uma aplicação de KDD, decisões sobre como tais objetivos podem ser alcançados se tornam necessárias. O primeiro passo envolve a escolha de qual algoritmo de mineração de dados é o mais apropriado para o problema em questão. A Assistência ao Planejamento de Ações de KDD auxilia o homem nesta escolha. Utiliza, para tanto, uma metodologia de ordenação dos algoritmos de mineração baseada no desempenho prévio destes algoritmos em problemas similares (Soares et al., 2001; Brazdil et al., 2003). Critérios de ordenação de algoritmos baseados em similaridade entre bases de dados nos níveis intensional e extensional foram propostos, descritos e avaliados. A partir da escolha de um ou mais algoritmos de mineração de dados, o passo seguinte requer a escolha de como deverá ser realizado o pré-processamento dos dados. Devido à diversidade de algoritmos de pré-processamento, são muitas as alternativas de combinação entre eles (Bernstein et al., 2002). A Assistência ao Planejamento de Ações de KDD também auxilia o homem na formulação e na escolha do plano ou dos planos de ações de KDD a serem adotados. Utiliza, para tanto, conceitos da Teoria do Planejamento para Resolução de Problemas.
Uma vez escolhido um plano de ações de KDD, surge a necessidade de executá-lo. A execução de um plano de ações de KDD compreende a execução, de forma ordenada, dos algoritmos de KDD previstos no plano. A execução de um algoritmo de KDD requer conhecimento sobre ele. A Assistência à Execução dos Planos de Ações de KDD provê orientações específicas sobre algoritmos de KDD. Adicionalmente, esta assistência dispõe de mecanismos que auxiliam, de forma especializada, no processo de execução de algoritmos de KDD e na análise dos resultados obtidos. Alguns destes mecanismos foram descritos e avaliados.
A execução da Assistência à Aquisição e Formalização do Conhecimento constitui-se em um requisito operacional ao funcionamento da máquina proposta. Tal assistência tem por objetivo adquirir e disponibilizar os conhecimentos sobre KDD em uma representação e uma organização que viabilizem o processamento das funções de assistência mencionadas anteriormente. Diversos recursos e técnicas de aquisição de conhecimento foram utilizados na concepção desta
assistência. / [en] Generally speaking, such aspects involve difficulties in perceiving innumerable facts whose origin and levels of detail are highly diverse and diffused, in adequately interpreting these facts, in dynamically conjugating such interpretations, and in deciding which actions must be performed in order to obtain good results. How are the objectives of the process to be identified in a precise manner? How is one among the countless existing data mining and preprocessing algorithms to be selected? And most importantly, how can the selected algorithms be put to suitable use in each different situation? These are but a few examples of the complex and recurrent questions that are posed when KDD processes are performed. Human analysts must cope with the arduous task of orienting the execution of KDD processes. To this end, in face of each different scenario, humans resort to their previous experiences, their knowledge, and their intuition in order to interpret and combine the facts and therefore be able to decide on the strategy to be adopted (Fayyad et al., 1996a, b; Wirth et al., 1998). Although the existing computational alternatives have proved to be useful and desirable, few of them are designed to help humans to perform KDD processes (Engels, 1996; Amant and Cohen, 1997; Livingston, 2001; Bernstein et al., 2002; Brazdil et al., 2003). In association with the above-mentioned fact, the demand for KDD applications in several different areas has increased dramatically in the past few years (Buchanan, 2000). Quite commonly, the number of available practitioners with experience in KDD is not sufficient to satisfy this growing demand (Piatetsky-Shapiro, 1999). Within such a context, the creation of intelligent tools that aim to assist humans in controlling KDD processes proves to be even more opportune (Brachman and Anand, 1996; Mitchell, 1997).
Such being the case, the objectives of this thesis were to investigate, propose, develop, and evaluate an Intelligent Machine for KDD-Process Orientation that is basically intended to serve as a teaching tool to be used in professional specialization courses in the area of Knowledge Discovery in Databases. The basis for formalization of the proposed machine was the Planning Theory for Problem-Solving (Russell and Norvig, 1995) in Artificial Intelligence. Its implementation was based on the integration of assistance functions that are used at different KDD process control levels: Goal Definition, KDD Action-Planning, KDD Action Plan Execution, and Knowledge Acquisition and Formalization.
The Goal Definition Assistant aims to assist humans in identifying KDD tasks that are potentially executable in KDD applications. This assistant was inspired by the detection of a certain type of similarity between the intensional levels presented by certain databases. The observation of this fact helps humans to mine the type of knowledge that must be discovered since data sets with similar structures tend to arouse similar interests even in distinct KDD applications. Concepts from the Theory of Attribute Equivalence in Databases (Larson et al., 1989) make it possible to use a common structure in which any database may be represented. In this manner, when databases are represented in the new structure, it is possible to map them into KDD tasks that are compatible with such a structure. Topological space concepts and ANN resources as described in Topological Spaces (Lipschutz, 1979) and Artificial Neural Nets (Haykin, 1999) have been employed so as to allow mapping
between heterogeneous patterns.
After the goals have been defined in a KDD application, it is necessary to
decide how such goals are to be achieved. The first step involves selecting the
most appropriate data mining algorithm for the problem at hand. The KDD
Action-Planning Assistant helps humans to make this choice. To this end, it makes
use of a methodology for ordering the mining algorithms that is based on the
previous performance of these algorithms in similar problems (Soares et al., 2001;
Brazdil et al., 2003). Algorithm ordering criteria based on database similarity at
the intensional and extensional levels were proposed, described and evaluated.
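The ordering-by-past-performance idea can be sketched concretely. The following is a hypothetical illustration, not the thesis's actual method: candidate mining algorithms are ranked by their average accuracy on the k past datasets whose meta-features are most similar to the new one (all dataset names, meta-features, and accuracy scores below are invented).

```python
# Hypothetical sketch: rank data mining algorithms by their average past
# performance on the k most similar datasets. All dataset names,
# meta-features and accuracy scores are invented for illustration.
import math

# Past KDD applications: meta-features (rows, attributes, minority-class
# ratio) and the accuracy each candidate algorithm achieved on them.
HISTORY = {
    "loans": {"features": [3000, 12, 0.40],
              "scores": {"tree": 0.81, "rules": 0.78, "knn": 0.74}},
    "churn": {"features": [5000, 20, 0.25],
              "scores": {"tree": 0.77, "rules": 0.82, "knn": 0.71}},
    "fraud": {"features": [800, 10, 0.05],
              "scores": {"tree": 0.69, "rules": 0.72, "knn": 0.79}},
}

def distance(a, b):
    """Euclidean distance between two meta-feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_algorithms(new_features, k=2):
    """Order algorithms by mean accuracy on the k nearest past datasets."""
    nearest = sorted(HISTORY.values(),
                     key=lambda d: distance(d["features"], new_features))[:k]
    algos = nearest[0]["scores"]
    mean = {a: sum(d["scores"][a] for d in nearest) / k for a in algos}
    return sorted(mean, key=mean.get, reverse=True)

print(rank_algorithms([4000, 15, 0.30]))
```

In practice the meta-features would be normalized before computing distances, since raw row counts otherwise dominate the comparison.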
Once the data mining algorithm or algorithms have been selected, the next step
involves deciding how data preprocessing is to be performed. Since
there is a large variety of preprocessing algorithms, there are many alternatives for
combining them (Bernstein et al., 2002). The KDD Action-Planning Assistant also
helps humans to formulate and to select the KDD action plan or plans to be
adopted. To this end, it makes use of concepts contained in the Planning Theory
for Problem-Solving.
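The plan-formulation step can be illustrated with a minimal STRIPS-style forward search, in the spirit of the Planning Theory for Problem-Solving cited above. The operators, state facts, and goal below are invented for illustration and are not the thesis's actual operator set.

```python
# Minimal STRIPS-style planning sketch for composing a KDD action plan:
# breadth-first search over dataset states, where each operator has
# preconditions, added facts and deleted facts. All facts and operators
# here are invented for illustration.
from collections import deque

# Operators: (name, preconditions, added facts, deleted facts).
OPERATORS = [
    ("impute_missing", {"raw"},                  {"complete"}, {"raw"}),
    ("discretize",     {"complete"},             {"discrete"}, set()),
    ("rule_mining",    {"complete", "discrete"}, {"rules"},    set()),
]

def plan(initial, goal):
    """Breadth-first search for the shortest operator sequence whose
    resulting state satisfies every fact in `goal`."""
    start = frozenset(initial)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for name, pre, add, delete in OPERATORS:
            if pre <= state:
                new = frozenset((state - delete) | add)
                if new not in seen:
                    seen.add(new)
                    queue.append((new, steps + [name]))
    return None  # goal unreachable with these operators

print(plan({"raw"}, {"rules"}))
```

Because the search is breadth-first, the first plan found is also a shortest one, which matches the intuition that a KDD action plan should not contain superfluous preprocessing steps.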
Once a KDD action plan has been chosen, it is necessary to execute it.
Executing a KDD action plan involves the ordered execution of the KDD
algorithms specified in the plan. Executing a KDD algorithm
requires knowledge about it. The KDD Action Plan Execution Assistant provides
specific guidance on KDD algorithms. In addition, this assistant is equipped with
mechanisms that provide specialized assistance for performing the KDD
algorithm execution process and for analyzing the results obtained. Some of these
mechanisms have been described and evaluated.
The execution of the Knowledge Acquisition and Formalization Assistant
is an operational requirement for running the proposed machine. The objective of
this assistant is to acquire knowledge about KDD and to make such knowledge
available by representing and organizing it in a way that makes it possible to process
the above-mentioned assistance functions. A variety of knowledge acquisition
resources and techniques were employed in the conception of this assistant.
|
199 |
Contribution de la découverte de motifs à l’analyse de collections de traces unitaires / Contribution to unitary traces analysis with pattern discovery
Cavadenti, Olivier 27 September 2016 (has links)
In the manufacturing context, a set of products is routed between different sites before being sold to final customers. Each site has different functions: creation, storage, retailing, etc. Traceability data richly describe (time, position, type of action, ...) the creation, routing, decoration, etc. of the products. However, many anomalies can occur, such as the diversion of products or the counterfeiting of articles. Discovering the contexts in which these anomalies arise is a central objective for the industrial sectors concerned. In this thesis, we propose a methodological framework for exploiting unitary traces through knowledge extraction methods. We show how data mining applied to traces transformed into suitable data structures makes it possible to extract interesting patterns that characterize frequent behaviors. We demonstrate that a priori knowledge, namely the product flows anticipated by the experts and structured as a supply-chain model, is useful and effective for classifying unitary traces as deviant or not, and for extracting the contexts (time window, product type, suspect sites, ...) in which these abnormal behaviors occur. We further propose an original method for detecting actors in the supply chain (distributors, for example) who may have usurped an identity (a false name). To do so, we use the confusion matrix from the trace-classification step to analyze the classifier's errors. Formal concept analysis (FCA) then makes it possible to determine whether sets of traces in fact belong to the same actor. / In a manufacturing context, a product is moved through different placements or sites before it reaches the final customer. 
Each of these sites has different functions, e.g. creation, storage, retailing, etc. In this scenario, traceability data describe in a rich way the events a product undergoes in the whole supply chain (from factory to consumer), recording temporal and spatial information as well as other important elements of description. Thus, traceability is an important mechanism for discovering anomalies in a supply chain, such as the diversion of computer equipment or counterfeits of luxury items. In this thesis, we propose a methodological framework for mining unitary traces using knowledge discovery methods. We show how data mining applied to unitary traces encoded in specific data structures allows extracting interesting patterns that characterize frequent behaviors. We demonstrate that domain knowledge, that is, the flow of products provided by experts and compiled in the industry model, is useful and efficient for classifying unitary traces as deviant or not. Moreover, we show how data mining techniques can be used to characterize abnormal behaviors (when and how did they occur?). We also propose an original method for detecting identity usurpations in the supply chain based on behavioral data, e.g. distributors using fake identities or concealing them: the confusion matrix of the trace-classification step is analyzed, and formal concept analysis (FCA) then determines whether sets of traces actually belong to the same actor. Finally, we detail the achievements of this thesis with the development of a trace-analysis platform in the form of a prototype.
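The confusion-matrix side of this idea can be sketched as follows. This is a hypothetical illustration (actor names, counts, and the threshold are invented): pairs of actors whose behavior traces the classifier confuses in both directions are flagged as possibly being one real actor operating under two identities.

```python
# Hypothetical sketch of identity-usurpation detection from classifier
# errors: actors whose behaviour traces are systematically confused with
# each other are candidates for being a single real actor under two
# names. Actor labels, counts and the threshold are invented.
from collections import Counter

# (true_actor, predicted_actor) pairs from classifying behaviour traces.
predictions = (
    [("A", "A")] * 40 + [("A", "C")] * 2 +
    [("B", "B")] * 45 +
    [("C", "C")] * 18 + [("C", "D")] * 22 +  # C often predicted as D...
    [("D", "D")] * 20 + [("D", "C")] * 25    # ...and D as C: suspicious.
)

def suspicious_pairs(pairs, threshold=0.3):
    """Return actor pairs confused in both directions above `threshold`
    (fraction of one actor's traces attributed to the other)."""
    confusion = Counter(pairs)
    totals = Counter(true for true, _ in pairs)
    actors = sorted(totals)
    suspects = set()
    for a in actors:
        for b in actors:
            if a >= b:
                continue  # examine each unordered pair once
            a_as_b = confusion[(a, b)] / totals[a]
            b_as_a = confusion[(b, a)] / totals[b]
            if a_as_b > threshold and b_as_a > threshold:
                suspects.add((a, b))
    return suspects

print(suspicious_pairs(predictions))
```

In the thesis the candidate pairs are then examined with formal concept analysis; here the sketch stops at the flagging step.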
|
200 |
Descoberta de regras de conhecimento utilizando computação evolutiva multiobjetivo / Discovering knowledge rules with multiobjective evolutionary computing
Giusti, Rafael 22 June 2010 (has links)
In the field of artificial intelligence there are learning algorithms, notably those belonging to the area of machine learning (ML), capable of automating the extraction of the knowledge implicit in a data set. Among these, symbolic ML algorithms are those that extract an intelligible knowledge model, that is, one that can be easily interpreted by the user. The use of symbolic ML is common in the context of classification, in which the extracted knowledge model describes a correlation between a set of attributes called premises and a particular attribute called the class. Classification algorithms are, in general, used mainly to maximize the measures of coverage and precision, focusing on the construction of a generic and precise classifier. Although this is a good approach for automating decision-making processes, it may fall short when the user wishes to extract a knowledge model that can be studied and that may be useful for a better understanding of the domain. In view of this scenario, the main objective of this work is to investigate multi-objective evolutionary computing methods for building individual knowledge rules based on user-defined criteria. To this end, the class library and knowledge-rule construction environment ECLE, developed in earlier projects, is used. Another objective of this work is to compare the evolutionary computing methods investigated with the ranking-composition methods previously available in ECLE. It is shown that the multi-objective evolutionary computing methods achieve better results than the ranking-composition methods, both in terms of dominance and proximity of the constructed solutions to the Pareto-optimal front and in terms of diversity along the Pareto front. 
In multi-objective optimization, both criteria are important, since its purpose is to provide not just one but a range of efficient solutions to the problem, from which the user can choose one or more that present the best trade-offs among the objectives. / Machine Learning algorithms are notable examples of Artificial Intelligence algorithms capable of automating the extraction of implicit knowledge from datasets. In particular, Symbolic Learning algorithms are those which yield an intelligible knowledge model, i.e., one which a user may easily read. The usage of Symbolic Learning is particularly common within the context of classification, which involves the extraction of knowledge such that the associated model describes a correlation among a set of attributes named the premises and one specific attribute named the class. Classification algorithms usually aim to create knowledge models which maximize the measures of coverage and precision, leading to classifiers that tend to be generic and precise. Although this constitutes a good approach to creating models that automate the decision-making process, it may not yield equally good results when the user wishes to extract a knowledge model which could assist them in gaining a better understanding of the domain. Having that in mind, the main goal of this Master's thesis is the research of multi-objective evolutionary computing methods to create individual knowledge rules maximizing sets of arbitrary user-defined criteria. This is achieved by employing the class library and knowledge rule construction environment ECLE, which had been developed during previous research work. A second goal of this Master's thesis is the comparison of the researched evolutionary computing methods against previously existing ranking composition methods in ECLE. 
It is shown in this Master's thesis that the multi-objective evolutionary computing methods produce better results than the ranking composition-based methods. This improvement is verified both in terms of solution dominance and proximity of the solution set to the Pareto-optimal front, and in terms of Pareto-front diversity. Both criteria are important for evaluating the efficiency of multi-objective optimization algorithms, for the goal of multi-objective optimization is to provide a broad range of efficient solutions, so the user may pick one or more solutions which present the best trade-off among all objectives.
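The Pareto-dominance criterion used in this comparison can be made concrete with a minimal sketch. The candidate rules and their scores below are invented for illustration; two maximization objectives (coverage and precision) are assumed.

```python
# Minimal sketch of Pareto dominance for rule evaluation, assuming two
# maximization objectives (coverage, precision). The candidate rules
# and their scores are invented for illustration.

def dominates(a, b):
    """True if `a` Pareto-dominates `b`: at least as good in every
    objective and strictly better in at least one (all maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(solutions):
    """Keep only the non-dominated solutions."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

# (coverage, precision) of candidate knowledge rules.
rules = [(0.9, 0.6), (0.7, 0.8), (0.5, 0.95), (0.6, 0.7), (0.4, 0.5)]
print(pareto_front(rules))
```

The surviving rules are exactly the trade-off set from which the user would pick: no remaining rule can be improved in one objective without losing in the other.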
|