21 |
Análise associativa: identificação de padrões de associação entre o perfil socioeconômico dos alunos do ensino básico e os resultados nas provas de matemática / Association analysis: identification of patterns related to the socioeconomic profiles
Lyvia Aloquio 20 February 2014
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Nowadays, most of the transactions carried out by companies and organizations are stored
in databases that researchers can explore to obtain useful information to
support decision making. Because of the large volume involved, extracting and analyzing the data
is not a simple task. The general process of converting raw data into useful information
is called Knowledge Discovery in Databases (KDD). One step in this process is Data
Mining, which applies algorithms and statistical techniques to explore
information contained implicitly in large databases. Many fields use the KDD process to
make it easier to recognize patterns or models in their data. This work
presents a practical application of the KDD process to the database of 9th-grade students
in basic education in the State of Rio de Janeiro, made available on the INEP website, with
the aim of discovering interesting patterns between a student's socioeconomic profile
and his or her performance in Mathematics in the Prova Brasil 2011. Using the tool
Weka (Waikato Environment for Knowledge Analysis), the data mining task known as
association was applied, and rules were extracted with the Apriori algorithm. The study found,
for example, that students who have been held back once tend to score lower on the
mathematics test, while students who have never been held back performed better.
Other factors, such as the student's future plans, the parents' schooling, a liking for
mathematics, the ethnic group to which the student belongs, and whether the student
frequently reads websites, also influence learning positively or negatively. An analysis by
school infrastructure was also carried out, and it showed that the discovered patterns hold
regardless of whether the students attend schools with good or poor infrastructure. The
results can be used to profile students who perform better or worse in mathematics
and to develop public policies in education aimed at elementary education.
|
22 |
Metody pro získávání asociačních pravidel z dat / Methods for Mining Association Rules from Data
Uhlíř, Martin January 2007
The aim of this thesis is to implement the Multipass-Apriori method for mining association rules from text data. After an introduction to the field of knowledge discovery, the specific aspects of text mining are discussed. Preprocessing is a very important part of the mining process; in this case it requires stemming and a stop-word dictionary. The next part of the thesis deals with the meaning, usage, and generation of association rules. The main part describes the Multipass-Apriori method, which was implemented. Based on the tests performed, the best way of dividing the data into partitions and the best way of sorting the itemsets were determined. As part of testing, the Multipass-Apriori method was compared with the Apriori method.
|
23 |
pcApriori: Scalable apriori for multiprocessor systems
Schlegel, Benjamin, Kiefer, Tim, Kissinger, Thomas, Lehner, Wolfgang 16 September 2022
Frequent-itemset mining is an important part of data mining. It is a computationally and memory-intensive task with a large number of scientific and statistical application areas. In many of them, the datasets can easily grow to tens or even several hundred gigabytes. Hence, efficient algorithms are required to process such amounts of data. In recent years, many efficient sequential mining algorithms have been proposed; however, they cannot exploit current and future systems that provide large degrees of parallelism. In contrast, the number of parallel frequent-itemset mining algorithms is rather small, and most of them do not scale well as the number of threads is greatly increased. In this paper, we present a highly scalable mining algorithm that is based on the well-known Apriori algorithm; it is optimized for processing very large datasets on multiprocessor systems. The key idea of pcApriori is to employ a modified producer-consumer processing scheme, which partitions the data during processing and distributes it to the available threads. We conduct many experiments on large datasets. pcApriori scales almost linearly on our test system comprising 32 cores.
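To make the producer-consumer idea concrete, here is a minimal Python sketch of support counting in which one producer partitions the transactions into chunks and several consumer threads count candidate itemsets per chunk. It is a generic illustration only, not the pcApriori implementation: the function name `count_supports_parallel`, the parameters `n_workers` and `chunk_size`, and the toy data are all assumptions made for the example.

```python
import queue
import threading
from collections import Counter


def count_supports_parallel(transactions, candidates, n_workers=4, chunk_size=2):
    """Count how many transactions contain each candidate itemset.

    Hypothetical helper: a plain producer-consumer scheme, not pcApriori itself.
    """
    candidates = [frozenset(c) for c in candidates]
    chunks = queue.Queue(maxsize=2 * n_workers)  # bounded buffer between producer and consumers
    partials = []                                # one Counter per consumer thread
    lock = threading.Lock()

    def consumer():
        local = Counter()
        while True:
            chunk = chunks.get()
            if chunk is None:  # sentinel: the producer has no more work
                break
            for transaction in chunk:
                for candidate in candidates:
                    if candidate <= transaction:  # candidate is a subset of the transaction
                        local[candidate] += 1
        with lock:
            partials.append(local)

    workers = [threading.Thread(target=consumer) for _ in range(n_workers)]
    for worker in workers:
        worker.start()

    # Producer: partition the data into chunks while it is being read.
    chunk = []
    for transaction in transactions:
        chunk.append(frozenset(transaction))
        if len(chunk) == chunk_size:
            chunks.put(chunk)
            chunk = []
    if chunk:
        chunks.put(chunk)
    for _ in workers:
        chunks.put(None)  # one sentinel per consumer
    for worker in workers:
        worker.join()

    # Merge the per-thread counts into the final supports.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


toy_baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
supports = count_supports_parallel(toy_baskets, [{"bread", "milk"}, {"butter"}], n_workers=3)
```

Each consumer keeps a private counter and merges it only once at the end, so threads do not contend on shared counts during the scan. In CPython the global interpreter lock limits the actual speedup of this sketch; it shows the control flow, not the performance reported in the paper.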
|
24 |
Parallel Mining of Association Rules Using a Lattice Based Approach
Thomas, Wessel Morant 01 January 2009
The discovery of interesting patterns from database transactions is one of the major problems in knowledge discovery in databases. One such pattern is the association rule extracted from these transactions. Parallel algorithms are required for mining association rules because of the very large databases used to store the transactions. In this paper we present a parallel algorithm for mining association rules that uses a lattice approach. Dynamic Distributed Rule Mining (DDRM) is a lattice-based algorithm that partitions the lattice into sublattices, which are assigned to processors for processing and identification of frequent itemsets. Experimental results show that DDRM utilizes the processors efficiently and performs better than the prefix-based and partition algorithms that use a static approach to assign classes to the processors. The DDRM algorithm scales well and shows good speedup.
|
25 |
Mining Medical Data in a Clinical Environment
Ivanovskiy, Tim V. 07 July 2006
The availability of new treatments for a disease depends on the success of clinical trials. In order for a clinical trial to be successful and approved, medical researchers must first recruit patients with a specific set of conditions in order to test the effectiveness of the proposed treatment. In the past, the accrual process was tedious and time-consuming. Since accruals rely heavily on the ability of physicians and their staff to be familiar with the protocol eligibility criteria, candidates tend to be missed. This can result and has resulted in unsuccessful trials. A recent project at the University of South Florida aimed to assist research physicians at H. Lee Moffitt Cancer Center & Research Institute, Tampa, Florida, with a screening process by utilizing a web-based expert system, Moffitt Expedited Accrual Network System (MEANS). This system allows physicians to determine the eligibility of a patient for several clinical trials simultaneously. We have implemented this web-based expert system at the H. Lee Moffitt Cancer Center & Research Gastroenterology (GI) Clinic. Based on our findings and staff feedback, the system has undergone many optimizations. We used data mining techniques to analyze the medical data of current gastrointestinal patients. The use of the Apriori algorithm allowed us to discover new rules (implications) in the patient data. All of the discovered implications were checked for medical validity by a physician, and those that were determined to be valid were entered into the expert system. Additional analysis of the data allowed us to streamline the system and decrease the number of mouse clicks required for screening. We also used a probability-based method to reorder the questions, which decreased the amount of data entry required to determine a patient's ineligibility.
|
26 |
Didelių duomenų sekų analizės problemos / Data mining problems
Ambraziūnas, Valdas 11 June 2004
The main goal of this thesis is to compare algorithms for finding association rules and to show how useful association rule mining is in business. To that end, three algorithms are analysed theoretically:
1. The Apriori algorithm, the best-known association rule algorithm, is based on the property “Any subset of a large itemset must be large”. This algorithm assumes that the database is memory-resident. The maximum number of database scans is one more than the cardinality of the largest large itemset.
2. The Sampling algorithm works on a sample of the database before the full database scan. The sample is drawn so that it can be memory-resident. The Sampling algorithm reduces the number of database scans to one in the best case and two in the worst case.
3. The Partitioning algorithm divides the database into partitions and relies on the property “A large itemset must be large in at least one of the partitions”. It reduces the number of database scans to two, and the partitions are sized so that each one fits in main memory.
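The property quoted in item 1 can be illustrated with a short level-wise sketch. This is a minimal Python illustration, not the C++ programs written for the thesis; the function name `apriori`, the `min_support` parameter (an absolute transaction count), and the toy baskets are assumptions made for the example. "Large" in the abstract corresponds to "frequent" in the code.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent ("large") itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Scan 1: count single items.
    counts = {}
    for transaction in transactions:
        for item in transaction:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join frequent (k-1)-itemsets into size-k candidates, then prune any
        # candidate that has an infrequent (k-1)-subset: by the Apriori
        # property such a candidate cannot be frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(subset) in frequent
                for subset in combinations(union, k - 1)
            ):
                candidates.add(union)

        # One database scan per level to count the surviving candidates.
        counts = {candidate: 0 for candidate in candidates}
        for transaction in transactions:
            for candidate in candidates:
                if candidate <= transaction:
                    counts[candidate] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


toy_baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
frequent_itemsets = apriori(toy_baskets, min_support=2)
```

On this toy data the largest frequent itemsets have two items, and the sketch scans the data three times (singles, pairs, and one final pass that finds no frequent triple), which matches the bound stated in item 1.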
Programs were written in C++ for all three algorithms, plus a program for the full-set-of-itemsets algorithm. To achieve the best performance, no GUI was included.
Nine test data sets were created to compare the algorithms. Six of them contain real-life data from the telecommunications business. Datasets vary from the...
|
27 |
Applying the Apriori and FP-Growth Association Algorithms to Liver Cancer Data
Pinheiro, Fabiola M. R. 27 August 2013
Cancer is the leading cause of death globally. Although liver cancer ranks only
fourth in incidence worldwide among all types of cancer, its survival rate is the
lowest. Liver cancer is often diagnosed at an advanced stage, because in the early stages
of the disease patients usually do not have signs or symptoms. After initial diagnosis,
therapeutic options are limited and tend to be effective only for small size tumors with
limited spread and minimal vascular invasion. As a result, long-term patient survival
remains minimal, and has not improved in the past three decades. In order to reduce
morbidity and mortality from liver cancer, improvement in early diagnosis and the
evaluation of current treatments are essential.
This study tested the applicability of the Apriori and FP-Growth association data
mining algorithms to liver cancer patient data, obtained from the British Columbia
Cancer Agency. The data was used to develop association rules which indicate what
combinations of factors are most commonly observed with liver cancer incidence as well
as with increased or decreased rates of mortality.
Ideally, these association rules will be applied in future studies using liver cancer
data extracted from other Electronic Health Record (EHR) systems. The main objective
of making these rules available is to facilitate early detection guidelines for liver cancer
and to evaluate current treatment options.
|
28 |
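Uma arquitetura de software para descoberta de regras de associação multidimensional, multinível e de outliers em cubos OLAP: um estudo de caso com os algoritmos APriori e FPGrowth / A software architecture for discovering multidimensional, multilevel, and outlier association rules in OLAP cubes: a case study with the APriori and FPGrowth algorithms
Moreira Tanuro, Carla 31 January 2010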
Conselho Nacional de Desenvolvimento Científico e Tecnológico / The traditional process of
Knowledge Discovery in Databases (KDD) does not include multidimensional and multilevel
processing steps (i.e., OLAP, OnLine Analytical Processing) for mining data cubes.
Consequently, most OLAM (OLAP Mining) approaches propose adaptations to the mining
algorithm. Because this approach yields a solution tightly coupled to the mining algorithm,
it prevents the adaptations for multidimensional and multilevel mining from being used with
other algorithms. Moreover, most OLAM proposals for association rules do not consider the
use of an OLAP server and do not take advantage of the full multidimensional and multilevel
potential of OLAP cubes. For these reasons, some rework (e.g., re-implementation of OLAP
operations) is done, and possibly strong patterns arising from generalizations are not
identified.
Given this scenario, this work proposes the DOLAM (Decoupled OLAM) architecture for
decoupled mining of multidimensional, multilevel, and outlier association rules in OLAP
cubes. The DOLAM architecture is meant to be inserted into the KDD process as a processing
step between the Preprocessing and Data Transformation steps. It defines and implements
three components: 1) Outlier Detector (Detector de Outliers), 2) Subcube Explorer
(Explorador de Subcubos), and 3) Ancestor Expander (Expansor de Ancestrais). Starting from
a user query, these components are able, respectively, to: 1) identify significant noise in
the cells of the result; 2) recursively explore all cells of the result so as to cover every
possible multidimensional and multilevel combination; and 3) retrieve all ancestors
(generalizations) of the cells of the result. The central component of the architecture is
the Ancestor Expander, the only one whose use is mandatory. With these components, OLAM
processing is decoupled from the mining algorithm and allows broader discoveries, which can
consequently return potentially stronger patterns. As a proof of concept, a case study was
carried out with real data from a microcredit company. The case study was implemented in
Java, used the Mondrian OLAP server, and used the implementations of the APriori and
FP-Growth association rule mining algorithms from the Weka software package.
|
29 |
Identification of Discriminating Motifs in Heart Rate Time Series Data of Soccer Players
Ravindranathan, Sampurna January 2018
No description available.
|
30 |
Analysis of Classes of Singular Boundary Value Problems
Ko, Eunkyung 11 August 2012
In this dissertation we study positive solutions to a singular p-Laplacian elliptic boundary value problem on a bounded domain with smooth boundary as a positive parameter varies. Our main focus is the analysis of a challenging class of singular p-Laplacian problems. We establish the existence of a positive solution for all positive values of the parameter and the existence of at least two positive solutions for a certain explicit range of the parameter. In the Laplacian case, we also prove the uniqueness of the positive solution for large values of the parameter. We extend our existence and multiplicity results to classes of singular systems and to the case when the domain is an exterior domain. We prove our existence and multiplicity results by the method of sub- and supersolutions and our uniqueness result by establishing a priori and boundary estimates. Such results are well known in the literature for the nonsingular case. In this study, we extend these results to the more difficult singular case.
|