81 |
An Incremental Approach to Discovering Regional Network Access Patterns
Tzeng, Yung-Shuen 18 July 2001
This thesis proposes an incremental algorithm for discovering regional network access patterns from the traffic data of a regional network. Because the network traffic database is very large, a fast association rule algorithm is needed to generate user access patterns efficiently. An attributed relational graph represents user access patterns on the network, and a change in this graph indicates that the access pattern of the regional network has changed. To keep the network access patterns up to date without incurring high computation costs, we propose an incremental procedure that updates them from time to time. The results help network administrators keep track of network usage patterns and manage regional networks better.
|
82 |
Mining Associations Using Directed Hypergraphs
Simha, Ramanuja N. 01 January 2011
This thesis proposes a novel directed hypergraph-based model for any database. We introduce the notion of association rules for multi-valued attributes, an adaptation of the definition of quantitative association rules known in the literature. These association rules are integrated into the construction of the directed hypergraph model, which captures attribute-level associations and their strength. Based on this model, we provide association-based similarity notions between any two attributes and present a method for finding clusters of similar attributes. We then propose algorithms to identify a subset of attributes, known as a leading indicator, that influences the values of almost all other attributes. Finally, we present an association-based classifier that can be used to predict attribute values. We demonstrate the effectiveness of the proposed model, notions, algorithms, and classifier through experiments on a financial time-series data set (S&P 500).
|
83 |
Data Mining For Rule Discovery In Relational Databases
Toprak, Serkan 01 September 2004
Data is mostly stored in relational databases today. However, most data mining algorithms cannot work directly on data stored in relational databases. Instead, they require a preprocessing step that transforms relational data into an algorithm-specific form. Moreover, several data mining algorithms handle single relations only, so valuable hidden knowledge involving multiple relations remains undiscovered. In this thesis, an implementation is developed for discovering multi-relational association rules in relational databases. The implementation is based on a framework that provides a representation of patterns in relational databases, refinement methods for patterns, and primitives for obtaining from the database the record counts needed to calculate pattern measures. The framework exploits the metadata of relational databases to prune the pattern search space. The implementation extends the framework by employing the Apriori algorithm to prune the search space further and to discover relational recursive patterns. Apriori is used to find the large itemsets of tables, which are then used to refine patterns, and it is modified by changing how support is calculated for itemsets. A method for determining recursive relations is described, and a solution is provided for handling recursive patterns using aliases. Additionally, continuous attributes of tables are discretized using equal-depth partitioning. The implementation is tested on the gene localization prediction task of KDD Cup 2001, and the results are compared with those of the winning approach.
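The equal-depth partitioning mentioned in this abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names `equal_depth_bins` and `assign_bin` and the sample values are hypothetical, and the sketch only assumes the usual meaning of equal-depth (equal-frequency) binning, where each interval receives roughly the same number of values.

```python
def equal_depth_bins(values, n_bins):
    """Split values into n_bins intervals holding roughly equal counts.

    Returns a list of (low, high) boundaries, one per non-empty bin.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    ordered = sorted(values)
    n = len(ordered)
    bins = []
    for i in range(n_bins):
        start = i * n // n_bins
        end = (i + 1) * n // n_bins
        if start < end:  # skip empty bins when there are fewer values than bins
            bins.append((ordered[start], ordered[end - 1]))
    return bins


def assign_bin(value, bins):
    """Return the index of the first bin whose upper bound covers value."""
    for index, (_, high) in enumerate(bins):
        if value <= high:
            return index
    return len(bins) - 1  # values above the last bound fall in the last bin


# Hypothetical continuous attribute with nine observations, three bins.
bins = equal_depth_bins([5, 1, 9, 3, 7, 2, 8, 4, 6], 3)
```

Once a continuous attribute is mapped to bin indices this way, each index can be treated as an ordinary item by an itemset miner such as Apriori.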
|
84 |
Mineração de padrões sequenciais e geração de regras de associação envolvendo temporalidade / Mining sequential patterns and generating association rules involving temporality
João, Rafael Stoffalette 07 May 2015
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Data mining aims to extract useful information from a database (DB). The mining process also makes it possible to analyze the data (e.g., to identify correlations, predictions, and chronological relationships). This dissertation proposes an approach to extracting temporal knowledge from a DB and describes its implementation as a computational system called S_MEMISP+AR. Put simply, the system searches a DB for frequent temporal patterns and generates temporal association rules from the elements of the patterns it identifies. At the end of the process, it analyzes the temporal relationships between the time intervals associated with the elements of each pattern, using the binary relations of Allen's Interval Algebra. Both S_MEMISP+AR and the algorithm it implements build on the Apriori, MEMISP, and ARMADA approaches. Three experiments, covering two different ways of using S_MEMISP+AR, were conducted on a DB of supermarket product sales records. They were presented to show that each approach, besides inferring new knowledge about the data domain and corroborating results that reinforce existing implicit knowledge about the data, also refines and extends the overall knowledge about the data.
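To make the reference to Allen's Interval Algebra concrete, the following Python sketch names the relation that holds between two time intervals. It is an illustration only, not the S_MEMISP+AR implementation: the function name `allen_relation` and the representation of an interval as a `(start, end)` pair with `start < end` are assumptions.

```python
def allen_relation(a, b):
    """Name the Allen relation that holds from interval a to interval b.

    Each interval is a (start, end) pair with start < end.
    """
    a_start, a_end = a
    b_start, b_end = b
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    if a_start < b_start < a_end < b_end:
        return "overlaps"
    # The remaining cases are the inverses of before, meets and overlaps.
    inverse = {"before": "after", "meets": "met-by", "overlaps": "overlapped-by"}
    return inverse[allen_relation(b, a)]
```

The function covers all thirteen basic relations; for instance, a purchase interval that ends exactly when another begins is reported as `meets`.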
|
85 |
Integrating network analysis and data mining techniques into effective framework for Web mining and recommendation : a framework for Web mining and recommendation
Nagi, Mohamad January 2015
The main motivation for the study described in this dissertation is to benefit from advances in technology and from the huge amount of available data, which can easily be captured, stored, and maintained electronically. We concentrate on Web usage (i.e., log) mining and Web structure mining. Analysing Web log data reveals valuable feedback on how effective the current structure of a web site is and helps the site's owner understand the behaviour of its visitors. We developed a framework that integrates statistical analysis, frequent pattern mining, clustering, classification, and network construction and analysis. We concentrated on statistical data about the visitors and on how they surf through the various pages of a given web site to land at some target pages. Frequent pattern mining was then used to study the relationships between the pages that make up a given web site. Clustering is used to study the similarity of users and pages. Classification suggests a target class for a new entity by comparing its characteristics with those of the known classes. Network construction and analysis is also employed to identify and investigate the links between the pages of a web site: a network is built from the frequency of access to the pages, and two pages are linked if the frequent pattern mining process identifies them as frequently accessed together. The knowledge discovered by analysing a web site and its related data should be valuable to online shoppers and to owners of commercial web sites. Building on the outcome of the study, a recommendation system was developed that suggests pages to visitors by comparing their profiles with similar profiles of other visitors.
Experiments on popular datasets demonstrate the applicability and effectiveness of the proposed framework for Web mining and recommendation. As a by-product of the proposed method, we demonstrate its effectiveness in another domain, feature reduction, using gene expression data analysis as an application; some interesting results are reported in Chapter 5.
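The network construction step described in this abstract, where pages are linked when they are frequently accessed together, can be sketched in Python as follows. This is a simplified illustration rather than the dissertation's framework: it counts only page pairs per session instead of running a full frequent pattern miner, and the function name `co_access_edges`, the `min_support` threshold, and the session data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_access_edges(sessions, min_support):
    """Return page pairs visited together in at least min_support sessions."""
    pair_counts = Counter()
    for pages in sessions:
        # Count each unordered pair of distinct pages once per session.
        for pair in combinations(sorted(set(pages)), 2):
            pair_counts[pair] += 1
    return {pair: count for pair, count in pair_counts.items() if count >= min_support}


# Hypothetical sessions reconstructed from a Web log.
sessions = [
    ["home", "catalog", "cart"],
    ["home", "catalog"],
    ["catalog", "cart", "checkout"],
    ["home", "about"],
]
edges = co_access_edges(sessions, min_support=2)
```

Each key of `edges` is an undirected link between two pages and each value is the number of sessions supporting it, which could serve as an edge weight in the resulting network.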
|
86 |
[en] CLASSIFICATION OF DATABASE REGISTERS THROUGH EVOLUTION OF ASSOCIATION RULES USING GENETIC ALGORITHMS / [pt] CLASSIFICAÇÃO DE REGISTROS EM BANCO DE DADOS POR EVOLUÇÃO DE REGRAS DE ASSOCIAÇÃO UTILIZANDO ALGORITMOS GENÉTICOS
CARLOS HENRIQUE PEREIRA LOPES 19 October 2005
[en] This dissertation investigates the use of Genetic Algorithms (GAs) in the process of discovering implicit knowledge in databases (KDD, Knowledge Discovery in Databases). The objective of the work was to assess the performance of Genetic Algorithms in classifying database records. In the context of Genetic Algorithms, classification consists of evolving association rules that best characterise, through their accuracy and coverage (range), a particular group of database records. The work comprised four main steps: a study of the KDD field; the definition of a GA model applied to Data Mining; the implementation of a Data Mining tool (Rule-Evolver); and the case studies.
The study of the KDD field covered the whole process of discovering useful knowledge in databases: problem definition; data selection; data cleaning; data pre-processing; data encoding; data enrichment; data mining; and interpretation of the results. In particular, the study emphasised the Data Mining phase and the algorithms and techniques it employs (Neural Networks, Rule Induction, Statistical Models, and Genetic Algorithms). This study resulted in a survey of the main research projects in the area.
The modelling of the Genetic Algorithm consisted fundamentally of defining the chromosome representation, the evaluation function, and the genetic operators. Data mining with association rules must take both quantitative and categorical attributes into account. Quantitative attributes represent continuous variables (ranges of values), whereas categorical attributes represent discrete variables. In the representation defined here, each chromosome represents a rule and each gene corresponds to a database attribute, which can be quantitative or categorical depending on the application. The evaluation function assigns a numerical value to the discovered rule, reflecting a measure of the quality of that solution. Data Mining by GA is an optimisation problem in which the evaluation function should point toward the best association rules. Accuracy and coverage are performance measures and, in some cases, remain zero during part of the evolution. The evaluation function should therefore be a measure that favours chromosomes containing rules likely to reach non-zero accuracy and coverage. Ten evaluation functions were implemented. The genetic operators used (crossover and mutation) seek to recombine the clauses of the rules so as to obtain new rules with higher accuracy and coverage than those already found. Four crossover operators and two mutation operators were implemented and tested.
The implementation of a GA modelling tool applied to Data Mining, named Rule-Evolver, evaluated the proposed model on the record classification problem. Rule-Evolver analyses a database and extracts the association rules that best differentiate a group of records from all the records in the database. Its main features are: selection of database attributes; statistical information about the attributes; choice of one evaluation function among the ten implemented; choice of genetic operators; graphical visualisation of system performance; and rule interpretation. A genetic operator is chosen at each reproduction step according to a rate preset by the user. This rate may stay fixed or vary during the evolutionary process. The evaluation functions may also be modified (by adding a reward) according to the coverage and accuracy of the rule. Rule-Evolver has an interface between the database and the GA, which gives the KDD process and the Data Mining phase flexibility. To optimise the search for rules and to obtain better-quality rules, some evolutionary techniques were implemented (linear rank and elitism), and different random initialisation methods were used as well; global averag
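As a rough illustration of the rule representation and of the accuracy and coverage (range) measures this abstract refers to, here is a minimal Python sketch. It is not the Rule-Evolver implementation. The abstract does not give formulas, so the sketch assumes common definitions: accuracy is the share of records matched by the rule that belong to the target group, and coverage is the share of the target group matched by the rule. The function names, the dictionary encoding of a rule, and the records are all hypothetical.

```python
def matches(record, rule):
    """Check whether a record satisfies every clause of a rule.

    A rule maps attribute names to either a (low, high) range for
    quantitative attributes or a single value for categorical ones.
    """
    for attribute, condition in rule.items():
        value = record[attribute]
        if isinstance(condition, tuple):
            low, high = condition
            if not low <= value <= high:
                return False
        elif value != condition:
            return False
    return True


def accuracy_and_coverage(records, rule, target_field, target_value):
    """Return (accuracy, coverage) of a rule for one target group (assumed definitions)."""
    covered = [r for r in records if matches(r, rule)]
    in_target = [r for r in records if r[target_field] == target_value]
    hits = [r for r in covered if r[target_field] == target_value]
    accuracy = len(hits) / len(covered) if covered else 0.0
    coverage = len(hits) / len(in_target) if in_target else 0.0
    return accuracy, coverage


# Invented records; "churn" plays the role of the group to characterise.
records = [
    {"age": 25, "plan": "basic", "churn": "yes"},
    {"age": 31, "plan": "basic", "churn": "yes"},
    {"age": 28, "plan": "basic", "churn": "no"},
    {"age": 52, "plan": "premium", "churn": "no"},
    {"age": 45, "plan": "basic", "churn": "yes"},
    {"age": 60, "plan": "basic", "churn": "yes"},
]
rule = {"age": (20, 35), "plan": "basic"}  # one quantitative and one categorical gene
accuracy, coverage = accuracy_and_coverage(records, rule, "churn", "yes")
```

A rule that matches no record of the target group yields `(0.0, 0.0)` under these definitions, which is the situation the abstract describes when both measures stay at zero early in the evolution.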
|
87 |
Algorithmes automatiques pour la fouille visuelle de données et la visualisation de règles d’association : application aux données aéronautiques / Automatic algorithms for visual data mining and association rules visualization : application to aeronautical data
Bothorel, Gwenael 18 November 2014
In the past few years, data production has exploded in many areas, such as social networks and e-commerce. This recent phenomenon is reinforced by the widespread adoption of connected devices, which are now in almost permanent use. The aeronautical field is no exception. Its growing need for data, driven by the evolution of air traffic management systems and by events, has raised awareness of the importance of data and of new ways to handle it, in terms of storage, availability, and exploitation. Hosting capacity has been adapted and is not a major difficulty; the challenge lies rather in processing the information and extracting knowledge from it. In Visual Analytics, an emerging field that grew out of the aftermath of the September 2001 attacks, this extraction combines algorithmic and visual approaches in order to benefit simultaneously from human flexibility, creativity, and knowledge and from the computing capacity of machines. This PhD thesis focused on achieving this combination while giving the human operator a central, decision-making role. On the one hand, the user's visual exploration of the data drives the generation of association rules, which establish relationships within the data. On the other hand, these rules are exploited by automatically configuring the visualization of the data they concern, in order to highlight it. To this end, this bidirectional process between data and rules was formalized and then illustrated with recent air traffic recordings on the Videam platform, which we developed. The platform integrates several HMI and algorithmic components in a modular, extensible environment, allowing interactive exploration of both data and association rules while leaving the user in overall control of the process, in particular by configuring and driving the algorithms.
|
88 |
Análise associativa: identificação de padrões de associação entre o perfil socioeconômico dos alunos do ensino básico e os resultados nas provas de matemática / Association analysis: identification of patterns related to the socioeconomic profiles
Lyvia Aloquio 20 February 2014
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Nowadays, most of the operations carried out by companies and organizations are stored in databases that researchers can explore to obtain useful information to support decision making. Because of the large volumes involved, extracting and analysing the data is not a simple task. The general process of converting raw data into useful information is called Knowledge Discovery in Databases (KDD). One step of this process is Data Mining, which applies algorithms and statistical techniques to exploit information contained implicitly in large databases. Many fields use the KDD process to make it easier to recognize patterns or models in their data. This work presents a practical application of the KDD process to the database of 9th-grade students in elementary education in the State of Rio de Janeiro, made available on the INEP website, with the aim of discovering interesting patterns between a student's socioeconomic profile and his or her Mathematics performance in the Prova Brasil 2011. Using the tool Weka (Waikato Environment for Knowledge Analysis), we applied the data mining task known as association, extracting rules with the Apriori algorithm. The study found, for example, that students who have already failed a grade once tend to score lower on the mathematics test, while students who have never failed a grade performed better. Other factors, such as the student's future aspirations, the parents' schooling, a liking for mathematics, the student's ethnic group, and whether the student frequently reads websites, also influence learning positively or negatively. An analysis by the infrastructure of the student's school was also carried out; it showed that the discovered patterns occur regardless of whether students attend schools with good or poor infrastructure. The results can be used to profile students who perform better or worse in mathematics and to design public policies in education aimed at elementary education.
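Association rules such as "students who have failed a grade tend to score lower" are usually ranked by support and confidence, the two measures Apriori relies on. The Python sketch below computes both for a single rule. It is an illustration under stated assumptions, not the Weka workflow used in the study: the function names, the `attribute=value` item encoding, and the five student records are invented for the example and do not come from the Prova Brasil data.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate how often the consequent holds when the antecedent holds."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Invented records, one set of attribute=value items per student.
students = [
    {"failed_before=yes", "score=low"},
    {"failed_before=yes", "score=low"},
    {"failed_before=yes", "score=high"},
    {"failed_before=no", "score=high"},
    {"failed_before=no", "score=high"},
]
rule_support = support(students, {"failed_before=yes", "score=low"})
rule_confidence = confidence(students, {"failed_before=yes"}, {"score=low"})
```

In this toy data the rule appears in two of five records and holds for two of the three students who failed before; a miner such as Apriori keeps only rules whose support and confidence exceed user-chosen thresholds.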
|
90 |
Association rules analysis for objects hierarchy
Pietruszewski, Przemyslaw January 2006
Association rules are one of the most popular data mining methods. The technique makes it possible to discover interesting dependencies between objects. This thesis concerns association rules for a hierarchy of objects. The DBLP database, which contains bibliographic descriptions of scientific papers, conferences, and journals in computer science, is used as the multi-level structure. The main goal of the thesis is to investigate interesting patterns of co-authorship at different levels of the hierarchy. To reach this goal, the author proposes his own extraction method.
|