  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

An Incremental Approach to Discovering Regional Network Access Patterns

Tzeng, Yung-Shuen 18 July 2001
This thesis proposes an incremental algorithm for discovering regional network access patterns from the traffic data of a regional network. Because the network traffic database is very large, a fast association rule algorithm is needed to generate user access patterns efficiently. An attributed relational graph is used to represent user access patterns on the network; a change in the relational graph indicates that the access pattern of the regional network has changed. To keep the network access patterns up to date without incurring high computation costs, we propose an incremental procedure that generalizes network access patterns from time to time. The results can help network administrators keep track of network usage patterns and better manage regional networks.
82

Mining Associations Using Directed Hypergraphs

Simha, Ramanuja N. 01 January 2011
This thesis proposes a novel directed-hypergraph-based model for any database. We introduce the notion of association rules for multi-valued attributes, an adaptation of the definition of quantitative association rules known in the literature. These association rules are used to build the directed hypergraph model, which captures attribute-level associations and their strength. Based on this model, we define association-based similarity notions between any two attributes and present a method for finding clusters of similar attributes. We then propose algorithms to identify a subset of attributes, known as a leading indicator, that influences the values of almost all other attributes. Finally, we present an association-based classifier that can be used to predict attribute values. We demonstrate the effectiveness of the proposed model, notions, algorithms, and classifier through experiments on a financial time-series data set (S&P 500).
83

Data Mining For Rule Discovery In Relational Databases

Toprak, Serkan 01 September 2004
Most data today is stored in relational databases. However, most data mining algorithms cannot work directly on data stored in relational databases; they require a preprocessing step that transforms the relational data into an algorithm-specific form. Moreover, several data mining algorithms provide solutions for single relations only, so valuable hidden knowledge involving multiple relations remains undiscovered. In this thesis, an implementation is developed for discovering multi-relational association rules in relational databases. The implementation is based on a framework that provides a representation of patterns in relational databases, refinement methods for patterns, and primitives for obtaining the record counts needed to calculate measures for patterns. The framework exploits the metadata of relational databases to prune the search space of patterns. The implementation extends the framework by employing the Apriori algorithm to prune the search space further and by discovering recursive relational patterns. Apriori is used to find the large itemsets of tables, which are then used to refine patterns, and it is modified by changing the support calculation method for itemsets. A method for determining recursive relations is described, and a solution is provided for handling recursive patterns using aliases. Additionally, continuous attributes of tables are discretized using equal-depth partitioning. The implementation is tested on the gene localization prediction task of KDD Cup 2001, and the results are compared with those of the winning approach.
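The equal-depth partitioning mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function names `equal_depth_bins` and `assign_bin` are hypothetical, and the sketch assumes the common definition in which each bin receives roughly the same number of values.

```python
from typing import List, Sequence, Tuple


def equal_depth_bins(values: Sequence[float], n_bins: int) -> List[Tuple[float, float]]:
    """Return (lowest, highest) value per bin, each bin holding ~len(values)/n_bins items."""
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    ordered = sorted(values)
    n = len(ordered)
    bins = []
    for i in range(n_bins):
        lo = i * n // n_bins          # index of the first value in bin i
        hi = (i + 1) * n // n_bins    # index one past the last value in bin i
        if lo < hi:                   # skip empty bins when n < n_bins
            bins.append((ordered[lo], ordered[hi - 1]))
    return bins


def assign_bin(value: float, bins: Sequence[Tuple[float, float]]) -> int:
    """Map a value to the index of the first bin whose upper bound covers it."""
    for idx, (_, upper) in enumerate(bins):
        if value <= upper:
            return idx
    return len(bins) - 1              # values above every bound go to the last bin


ages = [9, 1, 8, 2, 7, 3, 6, 4, 5]    # made-up sample values
bins = equal_depth_bins(ages, 3)
labels = [assign_bin(a, bins) for a in ages]
```

Each discretized value can then be treated as a categorical item (for example `age_bin=1`) when itemsets are counted.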
84

Mineração de padrões sequenciais e geração de regras de associação envolvendo temporalidade / Sequential pattern mining and generation of association rules involving temporality

João, Rafael Stoffalette 07 May 2015
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Data mining aims to extract useful information from a database (DB). The mining process also supports analysis of the data (e.g., identifying correlations, predictions, and chronological relationships). The work described in this dissertation proposes an approach to extracting temporal knowledge from a DB and describes its implementation as a computational system called S_MEMISP+AR. The system searches for frequent temporal patterns in a DB and generates temporal association rules from the elements of the frequent patterns it identifies. At the end of the process, it analyzes the temporal relationships between the time intervals associated with the elements of each pattern, using the binary relations described by Allen's Interval Algebra. Both S_MEMISP+AR and the algorithm it implements build on the Apriori, MEMISP, and ARMADA approaches. Three experiments, following two different approaches to using S_MEMISP+AR, were conducted on a DB of sales records of products available in a supermarket. The experiments show that each approach, besides inferring new knowledge about the data domain and corroborating results that reinforce the existing implicit knowledge about the data, also promotes, overall, the refinement and extension of the knowledge about the data.
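As an illustration of the binary relations of Allen's Interval Algebra referred to above, the following sketch classifies the relation between two time intervals. It is an assumption-laden example rather than code from S_MEMISP+AR: the function name `allen_relation` is hypothetical, and each interval is assumed to be a `(start, end)` pair with `start < end`.

```python
from typing import Tuple

Interval = Tuple[float, float]


def allen_relation(a: Interval, b: Interval) -> str:
    """Name the Allen relation that holds from interval a to interval b."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:                      # same start, different end
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:                      # same end, different start
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Remaining case: the intervals partially overlap.
    return "overlaps" if a1 < b1 else "overlapped-by"
```

The function covers the thirteen relations of the algebra; applied to the duration intervals of two items in a frequent pattern, it yields the kind of label a temporal rule could carry.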
85

Integrating network analysis and data mining techniques into effective framework for Web mining and recommendation : a framework for Web mining and recommendation

Nagi, Mohamad January 2015
The main motivation for the study described in this dissertation is to benefit from developments in technology and from the huge amount of available data that can easily be captured, stored and maintained electronically. We concentrate on Web usage (i.e., log) mining and Web structure mining. Analysing Web log data provides valuable feedback on how effective the current structure of a web site is and helps the owner of the site understand the behaviour of its visitors. We developed a framework that integrates statistical analysis, frequent pattern mining, clustering, classification, and network construction and analysis. The statistical analysis concentrates on data about the visitors and on how they navigate through the pages of a given web site to reach certain target pages. Frequent pattern mining is used to study the relationships between the pages that make up a web site. Clustering is used to study the similarity of users and of pages. Classification suggests a target class for a new entity by comparing its characteristics to those of the known classes. Network construction and analysis is employed to identify and investigate the links between the pages of a web site: a network is built from the frequency of access to the pages, and two pages are linked if the frequent pattern mining process identifies them as frequently accessed together. The knowledge discovered by analysing a web site and its related data should be valuable for online shoppers and commercial web site owners. Building on the outcome of the study, a recommendation system was developed that suggests pages to visitors by comparing their profiles with the similar profiles of other visitors.
Experiments on popular datasets demonstrate the applicability and effectiveness of the proposed framework for Web mining and recommendation. As a by-product, we demonstrate that the proposed method is also effective for feature reduction in another domain, gene expression data analysis, with some interesting results reported in Chapter 5.
86

[en] CLASSIFICATION OF DATABASE REGISTERS THROUGH EVOLUTION OF ASSOCIATION RULES USING GENETIC ALGORITHMS / [pt] CLASSIFICAÇÃO DE REGISTROS EM BANCO DE DADOS POR EVOLUÇÃO DE REGRAS DE ASSOCIAÇÃO UTILIZANDO ALGORITMOS GENÉTICOS

CARLOS HENRIQUE PEREIRA LOPES 19 October 2005
This dissertation investigates the application of Genetic Algorithms (GAs) to the discovery of implicit knowledge in databases (KDD, Knowledge Discovery in Databases). The objective was to assess the performance of GAs in classifying database records. In the GA context, classification consists of evolving association rules that best characterise, through their accuracy and coverage, a particular group of database records. The work comprised four main steps: a study of the KDD area; the definition of a GA model applied to data mining; the implementation of a data mining tool (Rule-Evolver); and the case studies. The study of the KDD area covered the whole process of discovering useful knowledge in databases: problem definition; data selection; data cleaning; data pre-processing; data encoding; data enrichment; data mining; and interpretation of results. In particular, it emphasised the data mining phase and the algorithms and techniques employed (neural networks, rule induction, statistical models and genetic algorithms). A survey of the main research projects in the area resulted from this study. Modelling the GA consisted essentially of defining the chromosome representation, the evaluation (fitness) function and the genetic operators. Data mining through association rules must take both quantitative and categorical attributes into account. Quantitative attributes represent continuous variables (ranges of values), whereas categorical attributes represent discrete variables. In the representation adopted, each chromosome represents a rule and each gene corresponds to a database attribute, which can be quantitative or categorical, depending on the application.
The evaluation function assigns a numerical value to each discovered rule, reflecting the quality of that solution. Data mining with a GA is an optimisation problem in which the evaluation function should drive the process towards the best association rules. Accuracy and coverage are performance measures and, in some cases, remain zero during part of the evolution. The evaluation function should therefore favour chromosomes containing promising rules, that is, rules likely to reach non-zero accuracy and coverage. Ten evaluation functions were implemented. The genetic operators used (crossover and mutation) recombine the clauses of the rules in order to obtain new rules with higher accuracy and coverage than those already found. Four crossover operators and two mutation operators were implemented and tested. Rule-Evolver, the GA modelling tool for data mining implemented in this work, was used to evaluate the proposed model on the record classification problem. Rule-Evolver analyses a database and extracts the association rules that best differentiate a group of records from all records in the database. Its main features are: selection of database attributes; statistical information on the attributes; choice of one evaluation function among the ten implemented; choice of genetic operators; graphical visualisation of system performance; and rule interpretation. A genetic operator is chosen at each reproduction step according to a rate preset by the user. This rate may remain fixed or vary during the evolutionary process. The evaluation functions may also be modified (by adding a reward) according to the coverage and accuracy of the rule. Rule-Evolver implements an interface between the database and the GA, giving the KDD process and the data mining phase flexibility.
To optimise the rule search and obtain better-quality rules, some evolutionary techniques were implemented (linear rank and elitism), and different random initialisation methods were used as well; global averag
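The rule representation and the two performance measures described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the Rule-Evolver implementation: the rule encoding, the function names, the definitions of accuracy and coverage, and the product used as a fitness value are all hypothetical choices (the abstract states that ten evaluation functions were implemented but does not give them), and the records are made-up toy data.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
# Assumed encoding: one condition ("gene") per attribute.
# A tuple (low, high) is a quantitative range; a set lists allowed categories.
Rule = Dict[str, Any]


def matches(rule: Rule, record: Record) -> bool:
    """True if the record satisfies every condition of the rule."""
    for attr, cond in rule.items():
        value = record[attr]
        if isinstance(cond, tuple):
            low, high = cond
            if not (low <= value <= high):
                return False
        elif value not in cond:
            return False
    return True


def accuracy_and_coverage(rule: Rule, records: List[Record], target: str) -> Tuple[float, float]:
    """Accuracy: share of matched records in the target class.
    Coverage: share of target-class records matched by the rule."""
    covered = [r for r in records if matches(rule, r)]
    hits = sum(1 for r in covered if r["class"] == target)
    total_target = sum(1 for r in records if r["class"] == target)
    accuracy = hits / len(covered) if covered else 0.0
    coverage = hits / total_target if total_target else 0.0
    return accuracy, coverage


def fitness(rule: Rule, records: List[Record], target: str) -> float:
    """One possible evaluation function (an assumption): accuracy times coverage."""
    accuracy, coverage = accuracy_and_coverage(rule, records, target)
    return accuracy * coverage


# Made-up toy records, for illustration only.
records = [
    {"age": 25, "region": "north", "class": "buyer"},
    {"age": 30, "region": "north", "class": "buyer"},
    {"age": 35, "region": "south", "class": "buyer"},
    {"age": 28, "region": "north", "class": "other"},
    {"age": 50, "region": "south", "class": "other"},
]
rule = {"age": (20, 32), "region": {"north"}}
score = fitness(rule, records, "buyer")
```

A product of the two measures is zero whenever either measure is zero, which is exactly the situation the abstract says the evaluation function has to handle; a reward term of the kind the abstract mentions would be one way to distinguish such rules.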
87

Algorithmes automatiques pour la fouille visuelle de données et la visualisation de règles d’association : application aux données aéronautiques / Automatic algorithms for visual data mining and association rules visualization : application to aeronautical data

Bothorel, Gwenael 18 November 2014
In the past few years, data production has exploded in many areas, such as social networks and e-commerce. This recent phenomenon is reinforced by the widespread use of connected devices, which are now in almost permanent use. The aeronautical field is no exception. Its growing need for data, driven by the evolution of air traffic management systems and by events, has raised awareness of the importance of data and of new ways to handle it, in terms of storage, availability and exploitation. Hosting capacity has been adapted and is not a major difficulty. The challenge lies instead in processing the information and extracting knowledge from it. Visual Analytics, an emerging discipline that arose in the aftermath of the September 2001 attacks, combines algorithmic and visual approaches in order to benefit simultaneously from human flexibility, creativity and knowledge, and from the computing capacity of computer systems. This PhD thesis focused on achieving this combination while leaving the human in a central, decision-making position. On the one hand, the user's visual exploration of the data drives the generation of association rules, which establish relationships within the data. On the other hand, these rules are exploited by automatically configuring the visualization of the data they concern, in order to highlight it. To this end, the bidirectional process between data and rules was formalized and then illustrated, using recent air traffic recordings, on the Videam platform that we developed. The platform integrates several HMI and algorithmic components in a modular and extensible environment, allowing interactive exploration of both data and association rules while leaving the user in overall control of the process, in particular by configuring and driving the algorithms.
88

Análise associativa: identificação de padrões de associação entre o perfil socioeconômico dos alunos do ensino básico e os resultados nas provas de matemática / Association analysis: identification of patterns related to the socioeconomic profiles

Lyvia Aloquio 20 February 2014
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Nowadays, most of the operations carried out by companies and organizations are stored in databases that researchers can explore to obtain information useful for decision making. Because of the large volume involved, extracting and analysing the data is not a simple task. The general process of converting raw data into useful information is called Knowledge Discovery in Databases (KDD). One step of this process is Data Mining, which applies algorithms and statistical techniques to explore information contained implicitly in large databases. Many areas use the KDD process to facilitate the recognition of patterns or models in their information bases. This work presents a practical application of the KDD process to the database of 9th-grade students in basic education in the State of Rio de Janeiro, available on the INEP website, with the aim of discovering interesting patterns between a student's socioeconomic profile and his or her performance in Mathematics in Prova Brasil 2011. Using the Weka tool (Waikato Environment for Knowledge Analysis), the data mining task known as association was applied, with rules extracted by the Apriori algorithm. The study found, for example, that students who have been held back a grade once tend to score lower on the mathematics test, while students who have never been held back performed better. Other factors, such as the student's future aspirations, parents' schooling, liking for mathematics, ethnic group, and whether the student frequently reads websites, also influence learning positively or negatively. An analysis by school infrastructure was also carried out, and it showed that the discovered patterns occur regardless of whether the students attend schools with good or poor infrastructure. The results can be used to profile students who perform better or worse in mathematics and to design public education policies aimed at elementary education.
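The association task described above, with rules extracted by Apriori, can be sketched with a small level-wise search. The sketch does not use Weka and is not the study's pipeline: the function names are hypothetical, and the records are made-up toy data that do not reproduce the study's results.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def frequent_itemsets(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Apriori-style level-wise search; returns each frequent itemset with its support."""
    n = len(transactions)
    rows = [frozenset(t) for t in transactions]
    current = [frozenset([item]) for item in {i for t in rows for i in t}]
    result: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        # Count the support of every candidate of size k.
        level = {}
        for cand in current:
            support = sum(1 for t in rows if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        result.update(level)
        # Join step: build (k+1)-candidates from pairs of frequent k-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return result


def confidence(supports: Dict[FrozenSet[str], float],
               antecedent: Iterable[str], consequent: Iterable[str]) -> float:
    """Confidence of antecedent -> consequent; both itemsets must be frequent."""
    a = frozenset(antecedent)
    return supports[a | frozenset(consequent)] / supports[a]


# Made-up toy records, for illustration only.
students = [
    {"repeated_grade", "low_score"},
    {"repeated_grade", "low_score"},
    {"repeated_grade", "high_score"},
    {"never_repeated", "high_score"},
    {"never_repeated", "high_score"},
    {"never_repeated", "low_score"},
]
supports = frequent_itemsets(students, min_support=0.3)
rule_confidence = confidence(supports, {"repeated_grade"}, {"low_score"})
```

In this toy data the rule `repeated_grade -> low_score` holds for two of the three students who repeated a grade; real rule mining would also apply a minimum confidence threshold before reporting a rule.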
90

Association rules analysis for objects hierarchy

Pietruszewski, Przemyslaw January 2006
Association rules are one of the most popular data mining methods. The technique makes it possible to discover interesting dependencies between objects. This thesis concerns association rules for a hierarchy of objects. The DBLP database, which contains bibliographic descriptions of scientific papers, conferences and journals in computer science, is used as the multi-level structure. The main goal of the thesis is to investigate interesting patterns of co-authorship with respect to different levels of the hierarchy. To reach this goal, the author's own extraction method is proposed.
