51 |
Indexing and Search Algorithms for Web shops / Indexering och sök algoritmer för webshoppar
Reimers, Axel, Gustafsson, Isak January 2016
Web shops today need to be increasingly responsive, and one part of this responsiveness is fast product searches. One way of getting faster searches is to search against an index instead of directly against a database. Network Expertise Sweden AB (Net Exp) wants to explore different methods of implementing an index in their future web shop, building upon the open-source web shop platform SmartStore.NET. Since SmartStore.NET does all of its searches directly against its database, it will not scale well and will put more load on the database. The aim was therefore to find different solutions to offload the database by using an index instead. A prototype that retrieved products from a database and made them searchable through an index was developed, evaluated and implemented. The prototype indexed the data with an inverted index algorithm, and was made searchable with a search algorithm that mixed boolean-type queries with normal queries. / Webbutiker idag behöver vara mer och mer responsiva, en del av denna responsivitet är snabb produkt sökningar. Ett sätt att skaffa snabbare sökningar är genom att söka mot ett index istället för att söka direkt mot en databas. Network Expertise Sweden AB vill utforska olika metoder för att implementera ett index i deras framtida webbutik, byggt ovanpå SmartStore.NET som är öppen källkod. Då SmartStore.NET gör alla av sina sökningar direkt mot sin databas, kommer den inte att skala bra och kommer slita mer på databasen. Målsättningen var därför att hitta olika lösningar som avlastar databasen genom att använda ett index istället. En prototyp som hämtade produkter från en databas och gjorde dom sökbara genom ett index var utvecklad, utvärderad och implementerad. Prototypen indexerade datan med en inverterad indexerings algoritm, och gjordes sökbara med en sök algoritm som blandar booleska frågor med normala frågor.
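The abstract does not give the prototype's data structures or query syntax, so the following is only a minimal sketch of the general idea: an inverted index over product names, queried with a mix of boolean and plain terms. The `+term` (required) and `-term` (excluded) prefixes, the function names and the sample products are all assumptions made for illustration, not details taken from the thesis or from SmartStore.NET.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into terms (ASCII-only, for brevity)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(products):
    """Inverted index: map each term to the set of product ids containing it."""
    index = defaultdict(set)
    for product_id, name in products.items():
        for term in tokenize(name):
            index[term].add(product_id)
    return index


def search(index, query):
    """Hypothetical mixed query: '+term' must match, '-term' must not match,
    plain terms are optional and only used to rank the hits."""
    required, excluded, optional = [], [], []
    for raw in query.split():
        if raw.startswith("+"):
            required.extend(tokenize(raw))
        elif raw.startswith("-"):
            excluded.extend(tokenize(raw))
        else:
            optional.extend(tokenize(raw))

    if required:
        # Boolean AND over the posting sets of all required terms.
        hits = set.intersection(*(index.get(t, set()) for t in required))
    else:
        # No required terms: any product matching a plain term is a hit.
        hits = set().union(*(index.get(t, set()) for t in optional))
    for term in excluded:
        hits -= index.get(term, set())

    def score(product_id):
        # Number of plain (optional) terms the product matches.
        return sum(product_id in index.get(t, set()) for t in optional)

    return sorted(hits, key=lambda pid: (-score(pid), pid))


# Made-up sample data, standing in for rows read from the shop database.
products = {
    1: "Red cotton shirt",
    2: "Blue cotton shirt",
    3: "Red wool sweater",
    4: "Red cotton socks",
}
index = build_index(products)
```

With this sketch, `search(index, "+red -wool shirt")` keeps only red, non-wool products and ranks the shirt first. Every lookup touches only the in-memory posting sets, which is the sense in which an index can offload the database.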
|
52 |
Definition of a human-machine learning process from timed observations : application to the modelling of human behaviour for the detection of abnormal behaviour of old people at home / Définition d'un processus d'apprentissage par l'homme et la machine à partir d'observations datées : application à la modélisation du comportement humain pour la détection des comportements anormaux de personnes âgées maintenues dans leur domicile
Pomponio, Laura 26 June 2012
L'acquisition et la modélisation de connaissances ont été abordées jusqu'à présent selon deux approches principales : les êtres humains (experts) à l'aide des méthodologies de l'Ingénierie des Connaissances et le Knowledge Management, et les données à l'aide des techniques relevant de la découverte de connaissances à partir du contenu de bases de données (fouille de données). Cette thèse porte sur la conception d'un processus d'apprentissage conjoint par l'être humain et la machine combinant une approche de modélisation des connaissances de type Ingénierie des Connaissances (TOM4D, Timed Observation Modelling for Diagnosis) et une approche d'apprentissage automatique fondée sur un processus de découverte de connaissances à partir de données datées (TOM4L, Timed Observation Mining for Learning). Ces deux approches étant fondées sur la Théorie des Observations Datées, les modèles produits sont représentés dans le même formalisme ce qui permet leur comparaison et leur combinaison. Le mémoire propose également une méthode d'abstraction, inspirée des travaux de Newell sur le "Knowledge Level" et fondée sur le paradigme d'observation datée, qui a pour but de traiter le problème de la différence de niveau d'abstraction inhérent entre le discours d'un expert et les données mesurées sur un système par un processus d'abstractions successives. Les travaux présentés dans ce mémoire ayant été menés en collaboration avec le CSTB de Sophia Antipolis (Centre Scientifique et Technique du Bâtiment), ils sont appliqués à la modélisation de l'activité humaine dans le cadre de l'aide aux personnes âgées maintenues à domicile. / Knowledge acquisition has been traditionally approached from a primarily people-driven perspective, through Knowledge Engineering and Management, or from a primarily data-driven approach, through Knowledge Discovery in Databases, rather than from an integral standpoint. 
This thesis therefore proposes a human-machine learning approach that combines a Knowledge Engineering modelling approach called TOM4D (Timed Observation Modelling For Diagnosis) with a process of Knowledge Discovery in Databases based on an automatic data mining technique called TOM4L (Timed Observation Mining For Learning). Models obtained through TOM4D can be combined and compared with those obtained through TOM4L because both are based on the Theory of Timed Observations and share the same representation formalism. Consequently, a learning process nourished with experts' knowledge and knowledge discovered in data is defined in the present work. In addition, this dissertation puts forward a theoretical framework of abstraction levels, in line with this theory and inspired by Newell's Knowledge Level work, in order to reduce the broad gap in semantic content between the data about an observed process stored in a database and what can be inferred at a higher level, that is, at the experts' discursive level. The human-machine learning approach, along with the notion of abstraction levels, is then applied to the modelling of human behaviour in smart environments, in particular to the modelling of elderly people's behaviour at home in the GerHome Project of the CSTB (Centre Scientifique et Technique du Bâtiment) of Sophia Antipolis, France.
|
53 |
Etude comportementale des mesures d'intérêt d'extraction de connaissances / Behavioral study of interestingness measures of knowledge extraction
Grissa, Dhouha 02 December 2013
La recherche de règles d’association intéressantes est un domaine important et actif en fouille de données. Puisque les algorithmes utilisés en extraction de connaissances à partir de données (ECD), ont tendance à générer un nombre important de règles, il est difficile à l’utilisateur de sélectionner par lui même les connaissances réellement intéressantes. Pour répondre à ce problème, un post-filtrage automatique des règles s’avère essentiel pour réduire fortement leur nombre. D’où la proposition de nombreuses mesures d’intérêt dans la littérature, parmi lesquelles l’utilisateur est supposé choisir celle qui est la plus appropriée à ses objectifs. Comme l’intérêt dépend à la fois des préférences de l’utilisateur et des données, les mesures ont été répertoriées en deux catégories : les mesures subjectives (orientées utilisateur ) et les mesures objectives (orientées données). Nous nous focalisons sur l’étude des mesures objectives. Néanmoins, il existe une pléthore de mesures objectives dans la littérature, ce qui ne facilite pas le ou les choix de l’utilisateur. Ainsi, notre objectif est d’aider l’utilisateur, dans sa problématique de sélection de mesures objectives, par une approche par catégorisation. La thèse développe deux approches pour assister l’utilisateur dans sa problématique de choix de mesures objectives : (1) étude formelle suite à la définition d’un ensemble de propriétés de mesures qui conduisent à une bonne évaluation de celles-ci ; (2) étude expérimentale du comportement des différentes mesures d’intérêt à partir du point de vue d’analyse de données. Pour ce qui concerne la première approche, nous réalisons une étude théorique approfondie d’un grand nombre de mesures selon plusieurs propriétés formelles. Pour ce faire, nous proposons tout d’abord une formalisation de ces propriétés afin de lever toute ambiguïté sur celles-ci. 
Ensuite, nous étudions, pour différentes mesures d’intérêt objectives, la présence ou l’absence de propriétés caractéristiques appropriées. L’évaluation des mesures est alors un point de départ pour une catégorisation de celles-ci. Différentes méthodes de classification ont été appliquées : (i) méthodes sans recouvrement (CAH et k-moyennes) qui permettent l’obtention de groupes de mesures disjoints, (ii) méthode avec recouvrement (analyse factorielle booléenne) qui permet d’obtenir des groupes de mesures qui se chevauchent. Pour ce qui concerne la seconde approche, nous proposons une étude empirique du comportement d’une soixantaine de mesures sur des jeux de données de nature différente. Ainsi, nous proposons une méthodologie expérimentale, où nous cherchons à identifier les groupes de mesures qui possèdent, empiriquement, un comportement semblable. Nous effectuons par la suite une confrontation avec les deux résultats de classification, formel et empirique dans le but de valider et mettre en valeur notre première approche. Les deux approches sont complémentaires, dans l’optique d’aider l’utilisateur à effectuer le bon choix de la mesure d’intérêt adaptée à son application. / The search for interesting association rules is an important and active field in data mining. Since the algorithms used in knowledge discovery from databases (KDD) tend to generate a large number of rules, it is difficult for the user to select the really interesting knowledge unaided. To address this problem, automatic post-filtering of the rules is essential to significantly reduce their number. Hence, many interestingness measures have been proposed in the literature in order to filter and/or sort discovered rules. As interestingness depends on both user preferences and data, interestingness measures have been classified into two categories: subjective measures (user-driven) and objective measures (data-driven). We focus on the study of objective measures. 
Nevertheless, there is a plethora of objective measures in the literature, which makes it harder for the user to choose the appropriate one. Thus, our goal is to reduce this difficulty by proposing groups of similar measures obtained through categorization approaches. The thesis presents two approaches to assist the user with the problem of choosing objective measures: (1) a formal study based on the definition of a set of measure properties that lead to a good evaluation of the measures; (2) an experimental study of the behavior of various interestingness measures from a data analysis point of view. Regarding the first approach, we perform a thorough theoretical study of a large number of measures with respect to several formal properties. To do this, we first propose a formalization of these properties in order to remove any ambiguity about them. We then study, for various objective interestingness measures, the presence or absence of appropriate characteristic properties. The evaluation of the measures is then a starting point for categorizing them. Different clustering methods have been applied: (i) non-overlapping methods (agglomerative hierarchical clustering, CAH, and k-means), which yield disjoint groups of measures, and (ii) an overlapping method (Boolean factor analysis), which yields overlapping groups of measures. Regarding the second approach, we propose an empirical study of the behavior of about sixty measures on datasets of different natures. Thus, we propose an experimental methodology with which we seek to identify groups of measures that empirically behave similarly. We then compare the two classification results, formal and empirical, in order to validate and enhance our first approach. Both approaches are complementary, with the aim of helping the user choose the interestingness measure best suited to the application.
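To make "objective (data-driven) measure" concrete, the sketch below computes three widely used measures (support, confidence and lift) for a rule A -> B from raw counts. The abstract does not list which of its roughly sixty measures were studied, so the choice of these three, the function name and the sample counts are assumptions for illustration only.

```python
def rule_measures(n, n_a, n_b, n_ab):
    """Objective measures for a rule A -> B.

    n    : total number of transactions
    n_a  : transactions containing A
    n_b  : transactions containing B
    n_ab : transactions containing both A and B
    """
    support = n_ab / n              # P(A and B)
    confidence = n_ab / n_a         # P(B | A)
    lift = confidence / (n_b / n)   # P(B | A) / P(B); 1.0 means independence
    return {"support": support, "confidence": confidence, "lift": lift}


# Made-up counts: 1000 transactions, 200 with A, 400 with B, 100 with both.
measures = rule_measures(n=1000, n_a=200, n_b=400, n_ab=100)
```

Each measure depends only on the counts, never on the user's preferences, which is what makes it "objective". Different measures can rank the same rules differently, which is why grouping measures by shared properties or similar behavior can help a user choose among them.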
|
54 |
An analysis of semantic data quality deficiencies in a national data warehouse: a data mining approach
Barth, Kirstin 07 1900
This research determines whether data quality mining can be used to describe, monitor and evaluate the scope and impact of semantic data quality problems in the learner enrolment data on the National Learners’ Records Database. Previous data quality mining work has focused on anomaly detection and has assumed that the data quality aspect being measured exists as a data value in the data set being mined. The method for this research is quantitative in that the data mining techniques and model that are best suited for semantic data quality deficiencies are identified and then applied to the data. The research determines that unsupervised data mining techniques that allow for weighted analysis of the data would be most suitable for the data mining of semantic data deficiencies. Further, the academic Knowledge Discovery in Databases model needs to be amended when applied to data mining semantic data quality deficiencies. / School of Computing / M. Tech. (Information Technology)
|
55 |
[en] INTELLIGENT ASSISTANCE FOR KDD-PROCESS ORIENTATION / [pt] ASSISTÊNCIA INTELIGENTE À ORIENTAÇÃO DO PROCESSO DE DESCOBERTA DE CONHECIMENTO EM BASES DE DADOS
RONALDO RIBEIRO GOLDSCHMIDT 15 December 2003
[pt] A notória complexidade inerente ao processo de KDD -
Descoberta de Conhecimento em Bases de Dados - decorre
essencialmente de aspectos relacionados ao controle e à
condução deste processo (Fayyad et al., 1996b; Hellerstein
et al., 1999). De uma maneira geral, estes aspectos envolvem
dificuldades em perceber inúmeros fatos cuja origem e os
níveis de detalhe são os mais diversos e difusos, em
interpretar adequadamente estes fatos, em conjugar
dinamicamente tais interpretações e em decidir que ações
devem ser realizadas de forma a procurar obter bons
resultados. Como identificar precisamente os objetivos do
processo, como escolher dentre os inúmeros algoritmos de
mineração e de pré-processamento de dados existentes e,
sobretudo, como utilizar adequadamente os algoritmos
escolhidos em cada situação são alguns exemplos
das complexas e recorrentes questões na condução de
processos de KDD. Cabe ao analista humano a árdua tarefa de
orientar a execução de processos de KDD. Para tanto, diante
de cada cenário, o homem utiliza sua experiência anterior,
seus conhecimentos e sua intuição para interpretar e
combinar os fatos de forma a decidir qual a estratégia a
ser adotada (Fayyad et al., 1996a, b; Wirth et al., 1998).
Embora reconhecidamente úteis e desejáveis, são poucas as
alternativas computacionais existentes voltadas a auxiliar
o homem na condução do processo de KDD (Engels, 1996; Amant
e Cohen, 1997; Livingston, 2001; Bernstein et al., 2002;
Brazdil et al., 2003). Aliado ao exposto acima, a demanda
por aplicações de KDD em diversas áreas vem crescendo de
forma muito acentuada nos últimos anos (Buchanan, 2000). É
muito comum não existirem profissionais com experiência em
KDD disponíveis para atender a esta crescente demanda
(Piatetsky-Shapiro, 1999). Neste contexto, a criação de
ferramentas inteligentes que auxiliem o homem no controle
do processo de KDD se mostra ainda mais oportuna (Brachman
e Anand, 1996; Mitchell, 1997). Assim sendo, esta tese teve
como objetivos pesquisar, propor, desenvolver e avaliar uma
Máquina de Assistência Inteligente à Orientação do Processo
de KDD que possa ser utilizada, fundamentalmente, como
instrumento didático voltado à formação de profissionais
especializados na área da Descoberta de Conhecimento em
Bases de Dados. A máquina proposta foi formalizada com base
na Teoria do Planejamento para Resolução de Problemas
(Russell e Norvig, 1995) da Inteligência Artificial
e implementada a partir da integração de funções de
assistência utilizadas em diferentes níveis de controle do
processo de KDD: Definição de Objetivos, Planejamento de
Ações de KDD, Execução dos Planos de Ações de KDD e
Aquisição e Formalização do Conhecimento. A Assistência à
Definição de Objetivos tem como meta auxiliar o homem
na identificação de tarefas de KDD cuja execução seja
potencialmente viável em aplicações de KDD. Esta
assistência foi inspirada na percepção de um certo tipo
de semelhança no nível intensional apresentado entre
determinados bancos de dados. Tal percepção auxilia na
prospecção do tipo de conhecimento a ser procurado, uma vez
que conjuntos de dados com estruturas similares tendem a
despertar interesses similares mesmo em aplicações de KDD
distintas. Conceitos da Teoria da Equivalência entre
Atributos de Bancos de Dados (Larson et al., 1989)
viabilizam a utilização de uma estrutura comum na qual
qualquer base de dados pode ser representada. Desta forma,
bases de dados, ao serem representadas na nova estrutura,
podem ser mapeadas em tarefas de KDD, compatíveis com tal
estrutura. Conceitos de Espaços Topológicos (Lipschutz,
1979) e recursos de Redes Neurais Artificiais (Haykin,
1999) são utilizados para viabilizar os mapeamentos entre
padrões heterogêneos. Uma vez definidos os objetivos em uma
aplicação de KDD, decisões sobre como tais objetivos podem
ser alcançados se tornam necessárias. O primeiro
passo envolve a escolha de qual algoritmo de mineração de dados é o mais
apropriado para o problema em questão. A Assistência ao Planejamento de Ações
de KDD auxilia o homem nesta escolha. Utiliza, para tanto, uma metodologia de
ordenação dos algoritmos de mineração baseada no desempenho prévio destes
algoritmos em problemas similares (Soares et al., 2001; Brazdil et al., 2003).
Critérios de ordenação de algoritmos baseados em similaridade entre bases de
dados nos níveis intensional e extensional foram propostos, descritos e avaliados.
A partir da escolha de um ou mais algoritmos de mineração de dados, o passo
seguinte requer a escolha de como deverá ser realizado o pré-processamento dos
dados. Devido à diversidade de algoritmos de pré-processamento, são muitas as
alternativas de combinação entre eles (Bernstein et al., 2002). A Assistência ao
Planejamento de Ações de KDD também auxilia o homem na formulação e na
escolha do plano ou dos planos de ações de KDD a serem adotados. Utiliza, para
tanto, conceitos da Teoria do Planejamento para Resolução de Problemas.
Uma vez escolhido um plano de ações de KDD, surge a necessidade de
executá-lo. A execução de um plano de ações de KDD compreende a execução, de
forma ordenada, dos algoritmos de KDD previstos no plano. A execução de um
algoritmo de KDD requer conhecimento sobre ele. A Assistência à Execução dos
Planos de Ações de KDD provê orientações específicas sobre algoritmos de KDD.
Adicionalmente, esta assistência dispõe de mecanismos que auxiliam, de forma
especializada, no processo de execução de algoritmos de KDD e na análise dos
resultados obtidos. Alguns destes mecanismos foram descritos e avaliados.
A execução da Assistência à Aquisição e Formalização do Conhecimento
constitui-se em um requisito operacional ao funcionamento da máquina proposta.
Tal assistência tem por objetivo adquirir e disponibilizar os conhecimentos sobre
KDD em uma representação e uma organização que viabilizem o processamento
das funções de assistência mencionadas anteriormente. Diversos recursos e
técnicas de aquisição de conhecimento foram utilizados na concepção desta
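assistência. / [en] The notorious complexity inherent in the KDD (Knowledge Discovery in Databases) process stems essentially from aspects related to the control and conduct of this process (Fayyad et al., 1996b; Hellerstein et al., 1999). Generally speaking, such aspects involve difficulties in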
perceiving innumerable facts whose origin and levels of
detail are highly diverse and diffused, in adequately
interpreting these facts, in dynamically conjugating such
interpretations, and in deciding which actions must be
performed in order to obtain good results. How are the
objectives of the process to be identified in a precise
manner? How is one among the countless existing data mining
and preprocessing algorithms to be selected? And most
importantly, how can the selected algorithms be put to
suitable use in each different situation? These are but
a few examples of the complex and recurrent questions that
are posed when KDD processes are performed. Human analysts
must cope with the arduous task of orienting the execution
of KDD processes. To this end, in face of each different
scenario, humans resort to their previous experiences,
their knowledge, and their intuition in order to interpret
and combine the facts and therefore be able to decide on
the strategy to be adopted (Fayyad et al., 1996a, b; Wirth
et al., 1998). Although acknowledged to be useful and
desirable, few computational alternatives exist that are
designed to help humans to perform KDD processes
(Engels, 1996; Amant and Cohen, 1997; Livingston, 2001;
Bernstein et al., 2002; Brazdil et al., 2003). In
association with the above-mentioned fact, the demand for
KDD applications in several different areas has increased
dramatically in the past few years (Buchanan, 2000). Quite
commonly, the number of available practitioners with
experience in KDD is not sufficient to satisfy this growing
demand (Piatetsky-Shapiro, 1999). Within such a context,
the creation of intelligent tools that aim to assist humans
in controlling KDD processes proves to be even more
opportune (Brachman and Anand, 1996; Mitchell, 1997).
Such being the case, the objectives of this thesis were to
investigate, propose, develop, and evaluate an Intelligent
Machine for KDD-Process Orientation that is basically
intended to serve as a teaching tool to be used in
professional specialization courses in the area of
Knowledge Discovery in Databases. The basis for
formalization of the proposed machine was the Planning
Theory for Problem-Solving (Russell and Norvig, 1995) in
Artificial Intelligence. Its implementation was based on
the integration of assistance functions that are used at
different KDD process control levels: Goal Definition, KDD
Action-Planning, KDD Action Plan Execution, and Knowledge
Acquisition and Formalization. The Goal Definition
Assistant aims to assist humans in identifying KDD
tasks that are potentially executable in KDD applications.
This assistant was inspired by the detection of a certain
type of similarity between the intensional levels presented
by certain databases. Noticing this similarity helps
humans to identify the type of knowledge to be
sought, since data sets with similar structures tend to
arouse similar interests even in distinct KDD applications.
Concepts from the Theory of Attribute Equivalence in
Databases (Larson et al., 1989) make it possible to use a
common structure in which any database may be represented.
In this manner, when databases are represented in the new
structure, it is possible to map them into KDD tasks that
are compatible with such a structure. Topological space
concepts and ANN resources as described in Topological
Spaces (Lipschutz, 1979) and Artificial Neural Nets
(Haykin, 1999) have been employed so as to allow mapping
between heterogeneous patterns. After the goals have been
defined in a KDD application, it is necessary to decide how
such goals are to be achieved. The first step involves
selecting the most appropriate data mining algorithm for
the problem at hand. The KDD Action-Planning Assistant
helps humans to make this choice. To this end, it makes
use of a methodology for ordering the mining algorithms
that is based on the
previous performance of these algorithms in similar problems (Soares et al., 2001;
Brazdil et al., 2003). Algorithm ordering criteria based on database similarity at
the intensional and extensional levels were proposed, described and evaluated.
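The idea of ordering mining algorithms by their previous performance on similar problems can be sketched as a nearest-neighbour ranking. This is only an illustrative assumption of how such a ranking might look: the meta-features, the Euclidean distance, the value of `k` and all names and figures below are hypothetical, and are not the ordering criteria proposed in the thesis.

```python
import math


def rank_algorithms(new_features, past_runs, k=2):
    """Rank algorithms by mean past accuracy on the k most similar datasets.

    new_features : tuple of meta-features describing the new dataset
    past_runs    : list of (meta_features, {algorithm: accuracy}) records
    """
    # Pick the k previously seen datasets closest to the new one.
    nearest = sorted(
        past_runs, key=lambda run: math.dist(new_features, run[0])
    )[:k]
    algorithms = nearest[0][1].keys()
    mean_accuracy = {
        name: sum(run[1][name] for run in nearest) / len(nearest)
        for name in algorithms
    }
    # Best mean accuracy first; ties broken alphabetically.
    return sorted(mean_accuracy, key=lambda name: (-mean_accuracy[name], name))


# Made-up history: (number of rows, number of attributes) and accuracies.
past_runs = [
    ((100, 5), {"tree": 0.80, "knn": 0.70, "nb": 0.75}),
    ((120, 6), {"tree": 0.78, "knn": 0.74, "nb": 0.70}),
    ((10000, 50), {"tree": 0.60, "knn": 0.90, "nb": 0.65}),
]
ranking = rank_algorithms((110, 5), past_runs)
```

Here the new dataset resembles the two small ones, so the ranking follows their results rather than those of the large dataset. A real system would normalize the meta-features and would likely combine several similarity criteria.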
The data mining algorithm or algorithms having been selected, the next step
involves selecting the way in which data preprocessing is to be performed. Since
there is a large variety of preprocessing algorithms, there are many ways of
combining them (Bernstein et al., 2002). The KDD Action-Planning Assistant also
helps humans to formulate and to select the KDD action plan or plans to be
adopted. To this end, it makes use of concepts contained in the Planning Theory
for Problem-Solving.
Once a KDD action plan has been chosen, it is necessary to execute it.
Executing a KDD action plan involves the ordered execution of the KDD
algorithms that have been anticipated in the plan. Executing a KDD algorithm
requires knowledge about it. The KDD Action Plan Execution Assistant provides
specific guidance on KDD algorithms. In addition, this assistant is equipped with
mechanisms that provide specialized assistance for performing the KDD
algorithm execution process and for analyzing the results obtained. Some of these
mechanisms have been described and evaluated.
The execution of the Knowledge Acquisition and Formalization Assistant
is an operational requirement for running the proposed machine. The objective of
this assistant is to acquire knowledge about KDD and to make such knowledge
available by representing and organizing it in a way that makes it possible to process
the above-mentioned assistance functions. A variety of knowledge acquisition
resources and techniques were employed in the conception of this assistant.
|
56 |
Descoberta de regras de conhecimento utilizando computação evolutiva multiobjetivo / Discovering knowledge rules with multiobjective evolutionary computing
Giusti, Rafael 22 June 2010
Na área de inteligência artificial existem algoritmos de aprendizado, notavelmente aqueles pertencentes à área de aprendizado de máquina AM , capazes de automatizar a extração do conhecimento implícito de um conjunto de dados. Dentre estes, os algoritmos de AM simbólico são aqueles que extraem um modelo de conhecimento inteligível, isto é, que pode ser facilmente interpretado pelo usuário. A utilização de AM simbólico é comum no contexto de classificação, no qual o modelo de conhecimento extraído é tal que descreve uma correlação entre um conjunto de atributos denominados premissas e um atributo particular denominado classe. Uma característica dos algoritmos de classificação é que, em geral, estes são utilizados visando principalmente a maximização das medidas de cobertura e precisão, focando a construção de um classificador genérico e preciso. Embora essa seja uma boa abordagem para automatizar processos de tomada de decisão, pode deixar a desejar quando o usuário tem o desejo de extrair um modelo de conhecimento que possa ser estudado e que possa ser útil para uma melhor compreensão do domínio. Tendo-se em vista esse cenário, o principal objetivo deste trabalho é pesquisar métodos de computação evolutiva multiobjetivo para a construção de regras de conhecimento individuais com base em critérios definidos pelo usuário. Para isso utiliza-se a biblioteca de classes e ambiente de construção de regras de conhecimento ECLE, cujo desenvolvimento remete a projetos anteriores. Outro objetivo deste trabalho consiste comparar os métodos de computação evolutiva pesquisados com métodos baseado em composição de rankings previamente existentes na ECLE. É mostrado que os métodos de computação evolutiva multiobjetivo apresentam melhores resultados que os métodos baseados em composição de rankings, tanto em termos de dominância e proximidade das soluções construídas com aquelas da fronteira Pareto-ótima quanto em termos de diversidade na fronteira de Pareto. 
Em otimização multiobjetivo, ambos os critérios são importantes, uma vez que o propósito da otimização multiobjetivo é fornecer não apenas uma, mas uma gama de soluções eficientes para o problema, das quais o usuário pode escolher uma ou mais soluções que apresentem os melhores compromissos entre os objetivos / Machine Learning algorithms are notable examples of Artificial Intelligence algorithms capable of automating the extraction of implicit knowledge from datasets. In particular, Symbolic Learning algorithms are those which yield an intelligible knowledge model, i.e., one which a user may easily interpret. The use of Symbolic Learning is particularly common within the context of classification, where the extracted knowledge model describes a correlation between a set of attributes named the premises and one specific attribute named the class. Classification algorithms usually aim to create knowledge models which maximize the measures of coverage and precision, leading to classifiers that tend to be generic and precise. Although this is a good approach to creating models that automate the decision-making process, it may fall short when the user wishes to extract a knowledge model that can be studied and that helps them gain a better understanding of the domain. With that in mind, the main goal of this Master's thesis is to investigate multi-objective evolutionary computing methods for constructing individual knowledge rules that maximize sets of arbitrary user-defined criteria. This is achieved by employing the class library and knowledge rule construction environment ECLE, which was developed during previous research work. A second goal of this Master's thesis is to compare the investigated evolutionary computing methods against the ranking composition methods previously available in ECLE. 
It is shown in this Masters thesis that the employment of multi-objective evolutionary computing methods produces better results than those produced by the employment of ranking composition-based methods. This improvement is verified both in terms of solution dominance and proximity of the solution set to the Pareto-optimal front and in terms of Pareto-front diversity. Both criteria are important for evaluating the efficiency of multi-objective optimization algorithms, for the goal of multi-objective optimization is to provide a broad range of efficient solutions, so the user may pick one or more solutions which present the best trade-off among all objectives
|
57 |
Zpracování asociačních pravidel metodou vícekriteriálního shlukování / Post-processing of association rules by multicriterial clustering method
Kejkula, Martin January 2002
Association rule mining is one of several approaches to knowledge discovery in databases. Paradoxically, data mining itself can produce so many association rules that a new knowledge management problem arises: thousands of association rules, or even more, can easily hold in a data set. The goal of this work is to design a new method for post-processing association rules. The method should be independent of software and domain. Its output should be a structured description of the whole set of discovered association rules, and it should help the user work with the discovered rules. The approach taken to reach this goal is to split the association rules into clusters. Each cluster should contain rules that are more similar to each other than to rules from another cluster. The output of the method is the definition and description of these clusters. The main contribution of this Ph.D. thesis is the new Multicriterial clustering association rules method described here. A secondary contribution is the discussion of previously published association rule post-processing methods. The output of the new method is clusters of rules that cannot be obtained by any of the earlier post-processing methods. According to user expectations, the clusters are more relevant and more effective than any earlier association rule clustering results. The method is based on two orthogonal clusterings of the same set of association rules. One clustering is based on interestingness measures (confidence, support, interest, etc.). The second clustering is inspired by document clustering in information retrieval. The representation of rules as vectors, in the same way as documents, is fundamental to this thesis. The thesis is organized as follows. Chapter 2 identifies the role of association rules in the KDD (knowledge discovery in databases) process, using KDD methodologies (CRISP-DM, SEMMA, GUHA, RAMSYS). Chapter 3 defines association rules and introduces their characteristics (including interestingness measures). Chapter 4 introduces current association rule post-processing methods. Chapter 5 is an introduction to cluster analysis. Chapter 6 describes the new Multicriterial clustering association rules method. Chapter 7 consists of several experiments. Chapter 8 discusses possible uses and further development of the new method.
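The two rule representations named in the abstract can be illustrated with a small sketch. This is not the thesis's implementation. It assumes the standard textbook definitions of support, confidence, and lift (one common reading of "interest"), and a simple binary item-occurrence encoding as a stand-in for the document-like vectors. The function names `rule_measures` and `rule_vector` and the toy transactions are hypothetical.

```python
from itertools import chain


def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent.

    Textbook definitions; the thesis may use different or additional measures.
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if antecedent <= t)
    count_c = sum(1 for t in transactions if consequent <= t)
    count_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = count_both / n
    confidence = count_both / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


def rule_vector(antecedent, consequent, vocabulary):
    """Encode a rule as a binary vector over the item vocabulary (assumed encoding)."""
    items = set(antecedent) | set(consequent)
    return [1 if item in items else 0 for item in vocabulary]


# Hypothetical toy data, for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
support, confidence, lift = rule_measures(transactions, {"bread"}, {"milk"})
vocabulary = sorted(set(chain.from_iterable(transactions)))
vector = rule_vector({"bread"}, {"milk"}, vocabulary)
```

Under these assumptions, one clustering would operate on the measure tuples and the other on the item vectors, which is what makes the two views of the same rule set independent of each other.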
|
58 |
[en] HYBRID INTELLIGENT SYSTEM FOR CLASSIFICATION OF NON-RESIDENTIAL ELECTRICITY CUSTOMERS PAYMENT PROFILES / [pt] SISTEMA INTELIGENTE HÍBRIDO PARA CLASSIFICAÇÃO DO PERFIL DE PAGAMENTO DOS CONSUMIDORES NÃO-RESIDENCIAIS DE ENERGIA ELÉTRICA
NORMA ALICE DA SILVA CARVALHO 26 March 2018
[en] The objective of this research is to classify the payment profiles of non-residential electricity customers using knowledge stored in the databases of electricity distribution utilities. The work was motivated by the utilities' need for a model that supports the formulation of strategies for reducing non-payment and late payment. The proposed methodology consists of a hybrid intelligent system made up of intercommunicating modules that use knowledge stored in databases to segment customers and then achieve the proposed objective. The system begins with the neural module, which allocates the consumer units to groups according to similarities (bill amount, consumption, measured demand/contracted demand, energy intensity, and share of the electricity bill in the customer's income). Next, the Bayesian module establishes a score between 0 and 1 that allows the payment profile of the units to be predicted, considering the generated groups and the categorical attributes (business activity, tariff type, mesoregion, legal form, and business size) that characterize these units. The results showed that the proposed system achieves a reasonable success rate when classifying customer profiles and thus constitutes an important tool for supporting the formulation of strategies to tackle non-payment and late payment. In conclusion, the proposed hybrid system is general in character and could be adapted and implemented in other markets.
|
59 |
Modelação e análise da vida útil (metrológica) de medidores tipo indução de energia elétrica ativa [Modeling and analysis of the (metrological) service life of induction-type active electrical energy meters]
Silva, Marcelo Rubia da [UNESP] 27 August 2010
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Studying the operational reliability of equipment has become fundamental for companies to keep proper control over their assets, for both financial and safety reasons. Studying the failure rate of equipment makes it possible to foresee when failures will occur and to act preventively, but such a study must be carried out under established, fixed operating conditions. Electricity meters, which are part of the financial assets of power utilities, are used under widely varying operating conditions, both in terms of the energy flow (presence of harmonics, undervoltages, overvoltages, and distinct consumption patterns) and in terms of the physical installation site (sea air, temperature, humidity, etc.). Failures in electromechanical electricity meters are difficult to detect because most metering errors, which are caused mainly by component aging, neither change the quality of the energy supplied nor interrupt its supply. In this context, this work proposes a new methodology for determining failures in electromechanical active-energy (Wh) meters. It uses a database from an electric utility and the process of knowledge discovery in databases to select the most significant variables for determining failures in electromechanical Wh-meters, including in the failure set operation with metering errors above those permitted by national regulations (2010). Two data mining techniques were used: stepwise regression and decision trees. The selected variables were used to build a model that clusters similar equipment, and the probability of failure of each cluster was determined. As final results, an application on a user-friendly platform was developed to apply the methodology, and a case study was carried out to demonstrate its feasibility.
|
60 |
Descoberta de regras de conhecimento utilizando computação evolutiva multiobjetivo / Discovering knowledge rules with multiobjective evolutionary computing
Rafael Giusti 22 June 2010
Machine Learning algorithms are notable examples of Artificial Intelligence algorithms capable of automating the extraction of implicit knowledge from datasets. In particular, Symbolic Learning algorithms are those that yield an intelligible knowledge model, i.e., one that a user can easily interpret. Symbolic Learning is particularly common in the context of classification, where the extracted knowledge model describes a correlation between a set of attributes, called the premises, and one specific attribute, called the class. Classification algorithms usually aim to create knowledge models that maximize the measures of coverage and precision, leading to classifiers that tend to be generic and precise. Although this is a good approach for creating models that automate the decision-making process, it may fall short when the user wishes to extract a knowledge model that can be studied and that helps them better understand the domain. With this in mind, the main goal of this Master's thesis is to investigate multi-objective evolutionary computing methods for constructing individual knowledge rules based on user-defined criteria. This is achieved by employing ECLE, a class library and knowledge rule construction environment developed during previous research work. A second goal of this Master's thesis is to compare the investigated evolutionary computing methods against the ranking composition methods that already existed in ECLE.
It is shown that the multi-objective evolutionary computing methods produce better results than the ranking composition-based methods, both in terms of dominance and proximity of the solution set to the Pareto-optimal front and in terms of diversity along the Pareto front. Both criteria are important for evaluating multi-objective optimization algorithms, because the goal of multi-objective optimization is to provide not just one solution but a broad range of efficient solutions, from which the user may pick one or more that present the best trade-offs among the objectives.
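The notion of dominance that the abstract relies on can be made concrete with a minimal sketch. It is not taken from ECLE or from the thesis. It assumes every objective is to be maximized, and the candidate rules and their two scores are hypothetical values standing in for any pair of user-defined criteria.

```python
def dominates(a, b):
    """True if a dominates b under maximization:
    a is at least as good in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (simple quadratic-time filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical scores of five candidate rules on two user-defined criteria.
rules = [(0.9, 0.2), (0.7, 0.6), (0.4, 0.8), (0.6, 0.5), (0.3, 0.3)]
front = pareto_front(rules)
```

Here `(0.6, 0.5)` and `(0.3, 0.3)` are both dominated by `(0.7, 0.6)`, so three rules remain. None of the three is better than the others on both criteria at once, which is why the user is offered the whole set rather than a single winner.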
|