591 |
Efficient Mining Approaches for Coherent Association Rules
Lin, Yui-Kai 29 August 2012
The goal of data mining is to help market managers find relationships among items in large datasets in order to increase profits. Among mining techniques, the Apriori algorithm is the most basic and important for association rule mining. Although many mining approaches based on the Apriori algorithm have been proposed, most of them focus on positive association rules, such as R1: "If milk is bought, then bread is bought". However, rule R1 may confuse users and lead to wrong decisions if negative relations are not considered. For example, a rule such as R2: "If milk is not bought, then bread is bought" may also be found, and R2 conflicts with the positive rule R1. Conversely, if two rules such as "If milk is bought, then bread is bought" and "If milk is not bought, then bread is not bought" are found at the same time, the resulting rule, called a coherent rule, may be more valuable. In this thesis, we thus propose two algorithms for solving this problem. The first, the Highly Coherent Rule Mining algorithm (HCRM), takes the properties of propositional logic into consideration and is based on the Apriori approach for finding coherent rules. The lower and upper bounds of itemsets are also tightened to remove unnecessary checks. To improve the efficiency of the mining process, the second algorithm, the Projection-based Coherent Mining Algorithm (PCA), uses data projection to reduce execution time. Experiments on real and simulated datasets demonstrate the performance of the proposed approaches; the results show that both HCRM and PCA can find more reliable rules and that PCA is more efficient.
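The milk-and-bread example can be made concrete with a small check. The sketch below is not HCRM or PCA; it assumes one common formulation of coherence, in which the two supporting cells of the contingency table (both bought, neither bought) must each outweigh the two contradicting cells. The function names and toy transactions are illustrative assumptions.

```python
def contingency(transactions, a, b):
    """Count transactions in the four cells of the (a, b) contingency table."""
    both = sum(1 for t in transactions if a in t and b in t)
    a_only = sum(1 for t in transactions if a in t and b not in t)
    b_only = sum(1 for t in transactions if a not in t and b in t)
    neither = sum(1 for t in transactions if a not in t and b not in t)
    return both, a_only, b_only, neither


def is_coherent(transactions, a, b):
    """Assumed coherence test: "a -> b" and "not a -> not b" are both
    better supported than the contradicting combinations."""
    both, a_only, b_only, neither = contingency(transactions, a, b)
    return (both > a_only and both > b_only
            and neither > a_only and neither > b_only)


# Toy data (hypothetical): milk and bread mostly appear together or not at all.
baskets = ([{"milk", "bread"}] * 3 + [set()] * 3
           + [{"milk"}] + [{"bread"}])
print(is_coherent(baskets, "milk", "bread"))
```

If bread is also bought often without milk, the rule "not milk -> bread" gains support and the check fails, which is the conflict between R1 and R2 described in the abstract.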
|
592 |
IDENTIFICATION OF DROUGHT-RELATED QUANTITATIVE TRAIT LOCI (QTLs) IN SUGARCANE (Saccharum spp.) USING GENIC MARKERS
Sharma, Vivek 2009 May 1900
Population-based association studies in crops that were established by domestication and
early breeding can be a valuable basis for the identification of QTLs. A case-control
design in a population is an ideal way to identify the maximum number of candidate sites contributing
to a complex polygenic trait such as drought tolerance. In the current study, marker loci associated
with drought-related QTLs were identified in sugarcane (Saccharum spp.), one of the
most complex crop genomes, with its polyploid nature (>8), chromosome number
(>100) and interspecific origin. The objectives of this investigation were: 1)
development of genic markers that can be used for marker-assisted selection of
drought-tolerant genotypes of sugarcane; and 2) genotypic characterization of a sugarcane
population at drought-related loci using EST-SSR markers. Using 55 microsatellite
markers, 56 polymorphisms were scored among 80 modern sugarcane genotypes.
Homogeneity of the population was confirmed by determining the distribution of allele
frequencies obtained by random genomic microsatellite markers. This analysis was
conducted in the STRUCTURE program and the population was divided into 3 subgroups
based on the allelic distribution. Phenotypic data to evaluate drought tolerance among
the genotypes was collected by measuring chlorophyll content, chlorophyll fluorescence,
leaf temperature and leaf relative water content. A generalized linear model in SPSS was
used to find associations between marker loci and phenotypic data. Markers with
significant association (P 0.001 level) with the trait were subjected to linear regression
to screen out spurious associations. Based on the results, 21 EST-SSR markers and 11 TRAP
markers related to drought-defining physiological parameters were considered
genuine associations in this study. Fifty-six polymorphisms produced by 13 EST-SSR
primers were used to produce a genetic similarity matrix for the 80 genotypes. A dendrogram
prepared from this genetic similarity matrix will be useful in selecting parents carrying
diversity at drought-specific loci.
|
593 |
Voluntary Associations and Their Involvement in Collaborative Forest Management
Lu, Jiaying 2010 December 1900
Voluntary associations representing numerous types of recreation users and
environmental issues have flourished across the landscape in America. However, the
literature has not sufficiently studied these associations and their role in collaborative
natural resource management. A lack of understanding of voluntary associations has not
only limited managers’ ability to accommodate changing values of the American public,
but also resulted in tremendous costs for land management agencies.
This dissertation was aimed at gaining a better understanding of outdoor
recreation and environmental voluntary associations and their involvement in
collaborative forest management. Five objectives guided this study: (1) assessing the
organizational characteristics of voluntary associations; (2) exploring organizational
concerns about forest management issues; (3) examining organizational leaders’
experiences in collaborating with the Forest Service; (4) evaluating the perceived
effectiveness of collaboration efforts with the Forest Service; and (5) developing and
testing a social psychological model to predict members’ participation in organizational
activities.
To achieve our research goals, a case-study approach utilizing a mixed-methods
research framework was employed. The Sam Houston National Forest (SHNF) located
in New Waverly, Texas served as the geographic focus of this research. Semi-structured
interviews and a web-based survey were conducted with members in selected voluntary
associations that are currently involved in collaborative forest management at SHNF.
The findings identified stakeholder attributes and interests, validated assumptions
held regarding voluntary groups and assessed collaboration effectiveness, and helped to
uncover alternative explanations for members’ differential participation in voluntary
associations. The study offers a conceptual bridge linking several areas of study
including inter-organizational collaboration, environmental communication, outdoor
recreation studies, and volunteerism.
|
594 |
Breeding Maize for Drought Tolerance: Diversity Characterization and Linkage Disequilibrium of Maize Paralogs ZmLOX4 and ZmLOX5
De La Fuente, Gerald 2012 May 1900
Maize production is limited agronomically by the availability of water and nutrients during the growing season. Of these two limiting factors, water availability is predicted to increase in importance as climate change and the growing urban landscape continue to stress limited supplies of freshwater. Historically, efforts to breed maize for water-limited environments have been extensive, especially in the areas of root architecture and flowering physiology. As progress has been made and new traits have been discovered and selected for, the different responses to drought stress at specific developmental stages of the maize plant have been selected as a whole whenever drought tolerance is evaluated. Herein we attempt to define the characteristics of the maize drought response during different developmental stages of the maize plant that can be altered through plant breeding. Towards breeding for drought tolerance, 400 inbred lines from a diversity panel were amplified and sequenced at the ZmLOX4 and ZmLOX5 loci in an effort to characterize their linkage disequilibrium and genetic diversity. Understanding these characteristics is essential for an association mapping study that accompanies this project, which searches for novel and natural allelic diversity to improve drought tolerance and aflatoxin resistance in maize.
This study is among the first to investigate genetic diversity at the important gene paralogs ZmLOX4 and ZmLOX5, which are believed to be highly conserved among all eukaryotes. We found very little genetic diversity and very low linkage disequilibrium in these genes, and also identified one natural variant line with a knocked-out ZmLOX5, one variant line missing ZmLOX5, and five variant lines with a duplication of ZmLOX5. Tajima's D test suggests that both ZmLOX4 and ZmLOX5 have been under neutral selection. Further investigation of haplotype data revealed that ZmLOX12, a member of the ZmLOX family, showed strong LD that extends much further than expected in maize. Linkage disequilibrium patterns at these loci of interest are crucial to quantify for future candidate gene association mapping studies. Knockout and copy number variants of ZmLOX5, while not a surprising find, are under further investigation for crop improvement.
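For readers unfamiliar with the statistic mentioned above, Tajima's D compares the average number of pairwise differences between sequences with the number of segregating sites; values near zero are consistent with neutral evolution. The sketch below implements the standard textbook formula on aligned sequences of equal length. It is not the pipeline used in the study, and the function name and toy alignment are illustrative assumptions.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Tajima's D for aligned, equal-length sequences (needs at least 4)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences for a nonzero variance")
    length = len(sequences[0])
    # S: number of alignment columns with more than one allele.
    s = sum(1 for i in range(length)
            if len({seq[i] for seq in sequences}) > 1)
    if s == 0:
        return math.nan  # the statistic is undefined without variation
    # pi: mean number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Constants from the standard definition of the statistic.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment (hypothetical): every variant is a singleton,
# which pushes D below zero.
print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]))
```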
|
595 |
A Meaningful Candidate Approach to Mining Bi-Directional Traversal Patterns on the WWW
Chen, Jiun-rung 27 July 2004
Since the World Wide Web (WWW) appeared, more and more useful information has
become available on it. To find this information, the application of data
mining techniques to the WWW, referred to as Web mining, has become a research
area of increasing importance. Mining traversal patterns is one of the important
topics in Web mining. It focuses on finding the Web page sequences that are
frequently browsed by users. Although algorithms for mining association rules
(e.g., the Apriori and DHP algorithms) can be applied to mine traversal patterns, they
do not utilize the property of Web transactions and generate too many invalid candidate
patterns. Thus, they cannot provide good performance. Wu et al. proposed
an algorithm for mining traversal patterns, SpeedTracer, which utilizes the property
of Web transactions, i.e., the continuous property of the traversal patterns in the Web
structure. Although SpeedTracer decreases the number of candidate patterns generated in the
mining process, it does not efficiently utilize the property of Web transactions to
decrease the number of checks while checking the subsets of each candidate pattern.
In this thesis, we design three algorithms that improve the SpeedTracer algorithm
for mining traversal patterns. The first algorithm, SpeedTracer*-I, utilizes the
property of Web transactions to directly generate and count all candidate patterns
from user sessions. Moreover, it utilizes this property to improve the checking step
when candidate patterns are generated. Next, building on the SpeedTracer*-I algorithm,
we propose the SpeedTracer*-II and SpeedTracer*-III algorithms. These two
algorithms improve the performance of the SpeedTracer*-I algorithm by decreasing
the number of database scans. In the SpeedTracer*-II algorithm,
given a parameter n, we first apply the SpeedTracer*-I algorithm to find L_n, and
use L_n to generate all C_k, where k > n. After generating all candidate patterns, we
scan the database once to count all candidate patterns, and then the frequent patterns
can be determined. In the SpeedTracer*-III algorithm, given a parameter n, we also
first apply the SpeedTracer*-I algorithm to find L_n, and directly generate and count
C_k from user sessions based on L_n, where k > n. The simulation results show that
the SpeedTracer*-I algorithm performs better than the SpeedTracer
algorithm in terms of processing time. The simulation results also show
that the SpeedTracer*-II and SpeedTracer*-III algorithms outperform the SpeedTracer and
SpeedTracer*-I algorithms, because the former two require fewer database scans
than the latter two. Moreover, our simulation results
show that all of our proposed algorithms provide better performance than
Apriori-like algorithms (e.g., the FS and FDLP algorithms) in terms of processing
time.
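The idea of generating and counting candidates directly from user sessions can be illustrated as follows. This is a simplified sketch, not SpeedTracer*-I itself: it assumes that a traversal pattern is a contiguous run of pages within a session and that support is the number of sessions containing the run. The function name, parameters, and toy sessions are illustrative assumptions.

```python
from collections import Counter


def frequent_contiguous_patterns(sessions, min_support, max_len):
    """Count contiguous page sequences per session and keep the frequent ones."""
    counts = Counter()
    for session in sessions:
        seen = set()  # count each pattern at most once per session
        for k in range(1, max_len + 1):
            for i in range(len(session) - k + 1):
                seen.add(tuple(session[i:i + k]))
        counts.update(seen)
    return {p: c for p, c in counts.items() if c >= min_support}


# Toy sessions (hypothetical page identifiers).
sessions = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"]]
patterns = frequent_contiguous_patterns(sessions, min_support=2, max_len=2)
print(patterns)
```

Because only runs that actually occur in a session are enumerated, a non-contiguous combination such as ("A", "C") is never generated as a candidate, which is the kind of invalid candidate an Apriori-style join would produce.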
|
596 |
Research and Development of DSP Based Human Headtracker
Cheng, Kai-wen 27 July 2004
This thesis presents the development of a DSP-based "human head-tracker" system. It uses a CCD camera to capture images and detects motion in the image sequence. When a person enters the scene, the system locks onto the person's head and shows the locked image on the LCD screen. The head-tracker system includes three sub-systems: "Motion Detector", "Ellipse Algorithm", and "Visual Probability Data Association Filter". Experimental results show that the system meets expectations and achieves good performance and robustness.
|
597 |
A Sliding-Window Approach to Mining Maximal Large Itemsets for Large Databases
Chang, Yuan-feng 28 July 2004
Mining association rules is a process of nontrivial extraction of implicit,
previously unknown, and potentially useful information from data in databases. Mining maximal
large itemsets is a further task in mining association rules, which aims to find
the subset of large (frequent) itemsets that is representative of all large
itemsets. Previous algorithms for mining maximal large itemsets can be classified into two approaches: exhaustive and
shortcut. The shortcut approach generates fewer
candidate itemsets than the exhaustive approach,
resulting in better performance in terms of time and storage space.
On the other hand, when updates to the transaction database occur,
one possible approach is to re-run the mining algorithm on the whole
database. The other approach is incremental mining, which aims for efficient maintenance of discovered association rules
without re-running the mining algorithm. However,
previous algorithms for mining maximal large itemsets based on the shortcut approach
cannot support incremental mining,
while algorithms for incremental mining, e.g., the SWF
algorithm, cannot efficiently support mining maximal large
itemsets, since they are based on the exhaustive approach.
Therefore, in this thesis, we focus on the design of an
algorithm that provides good performance for both mining maximal itemsets and incremental mining.
Based on some observations, for example, "if an itemset is large, all its
subsets must be large; therefore, those subsets need not be examined
further", we propose a Sliding-Window approach, the SWMax algorithm, for
efficiently mining maximal large itemsets and incremental mining. Our
SWMax algorithm is a two-pass partition-based approach. In the first pass, we find all candidate
1-itemsets ($C_1$), candidate 3-itemsets ($C_3$), large 1-itemsets ($L_1$),
and large 3-itemsets ($L_3$).
We generate the virtual maximal large itemsets after the first pass. Then, we use $L_1$ to generate $C_2$, use $L_3$
to generate $C_4$, use $C_4$ to generate $C_5$, and so on, until no
$C_k$ is generated. In the second pass, we use the virtual maximal large itemsets to
prune $C_k$ and decide the maximal large itemsets.
For incremental mining, we consider two cases: (1)
data insertion and (2) data deletion. In both cases, if an itemset
of size 1 is not large in the original database, the SWF algorithm cannot find it in the
updated database. That is, the incremental mining process of the SWF
algorithm can miss itemsets,
because the SWF algorithm keeps only the $C_2$ information.
In contrast, our SWMax algorithm supports incremental mining
correctly, since $C_1$ and $C_3$ are maintained in our algorithm.
We generate synthetic databases to simulate real transaction
databases in our simulation. The simulation
results show that our SWMax algorithm generates fewer candidates
and needs less time than the SWF algorithm.
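To make the notion of a maximal large itemset concrete, the sketch below enumerates large itemsets by brute force and then keeps only those with no large proper superset. It is not the SWMax or SWF algorithm; it only illustrates the definition and the subset observation quoted above. Function names, the absolute support threshold, and the toy transactions are illustrative assumptions.

```python
from itertools import combinations


def large_itemsets(transactions, min_count):
    """Enumerate large (frequent) itemsets level by level, by brute force."""
    items = sorted({i for t in transactions for i in t})
    large = []
    for k in range(1, len(items) + 1):
        level = [frozenset(c) for c in combinations(items, k)
                 if sum(1 for t in transactions if set(c) <= t) >= min_count]
        if not level:
            # No large k-itemset means no larger one can be large either.
            break
        large.extend(level)
    return large


def maximal_itemsets(large):
    """Keep itemsets that are not a proper subset of another large itemset."""
    return [s for s in large if not any(s < other for other in large)]


# Toy transactions (hypothetical).
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "d"}, {"b", "d"}]
print(maximal_itemsets(large_itemsets(transactions, min_count=2)))
```

Here the seven large itemsets over a, b, and c are all represented by the single maximal itemset {a, b, c}, which is why mining only the maximal ones saves time and storage.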
|
598 |
An Analysis of Collective Action on National Teachers' Association R.O.C
Hsieh, Pi-Ying 29 July 2004
Collective Action , National Teachers' Association R.O.C
|
599 |
Targeted Advertising Based on GP-association rules
Tsai, Chai-wen 13 August 2004
Businesses have long recognized the value of targeting a small portion of customers for advertising. In this thesis we propose a novel approach to promoting products that have no prior transaction records. The approach starts by discovering the GP-association rules between customer types and product genres that occur frequently in transaction records. Customers are characterized by demographic attributes, some of which have concept hierarchies, and products can be generalized through a product taxonomy. Based on the set of GP-association rules, we developed a comprehensive algorithm for locating a short list of prospective customers for a given promotion product. The new approach was evaluated using patrons' circulation data from the OPAC system of our university library. We measured the accuracy of the estimation method and the effectiveness of targeted advertising under different parameter settings. The results show that our approach achieves higher accuracy and effectiveness than other methods.
|
600 |
A Class-rooted FP-tree Approach to Data Classification
Chen, Chien-hung 29 June 2005
Classification, an important problem in data mining, is one of the useful techniques for prediction. The goal of the classification problem is to construct a classifier from a given training database and to predict the class of new data whose class is unknown. Classification has been widely applied in many areas, such as medical diagnosis and weather prediction. The decision tree is the most popular model among classifiers, since it can generate understandable rules and perform classification without requiring any computation. However, a major drawback of the decision tree model is that it examines only a single attribute at a time. In the real world, attributes in some databases depend on each other. Thus, we may improve the accuracy of the decision tree by discovering the correlation between attributes. The CAM method applies a method of mining association rules, such as the Apriori method, to discover attribute dependence. However, traditional methods for mining association rules are inefficient in classification applications and can have five problems: (1) the combinatorial explosion problem, (2) invalid candidates, (3) unsuitable minimal support, (4) ignored meaningful class values, and (5) itemsets without class data. The FP-growth method avoids the first two problems. However, it still suffers from the remaining three. Moreover, one more problem occurs: unnecessary nodes for the classification problem, which make the FP-tree large and less compact. Furthermore, the workload of the CAM method is expensive because of too many database scans, and the attribute combination problem causes some misclassification. Therefore, in this thesis, we present an efficient and accurate decision tree building method that resolves the above six problems and reduces the overhead of database scanning in the CAM method. We build a structure named the class-rooted FP-tree, which is similar to the FP-tree except that the root of the tree is always a class item.
Instead of using the static minimal support applied in the FP-growth method, we decide the minimal support dynamically, which avoids some misjudgement of large itemsets used for the classification problem. In the decision tree building phase, we provide a pruning strategy that reduces the number of database scans. We also solve the attribute combination problem in the CAM method and improve the accuracy. Our simulation shows that the proposed class-rooted FP-tree mining method performs better than other association rule mining methods in terms of storage usage. Our simulation also shows that our method improves on the CAM method in terms of the number of database scans and classification accuracy. Therefore, the mining strategy of our proposed method is applicable to any method for building decision trees, and provides high accuracy in the real world.
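A rough picture of a prefix tree whose root is always a class item is sketched below. This is only an illustration of that one structural property, not the thesis's class-rooted FP-tree: it assumes one tree per class label and inserts attribute items in lexicographic order (an FP-tree normally orders items by descending frequency), and it omits dynamic minimal support and pruning. All names and the toy records are illustrative assumptions.

```python
class Node:
    """Prefix-tree node holding an item and how many records pass through it."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}


def build_class_rooted_trees(records):
    """Build one prefix tree per class label; records are (label, items) pairs."""
    roots = {}
    for label, items in records:
        node = roots.setdefault(label, Node(label))  # the root is the class item
        node.count += 1
        for item in sorted(items):  # assumed fixed order, for simplicity
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return roots


# Toy records (hypothetical attribute=value items).
records = [
    ("yes", ["outlook=sunny", "temp=hot"]),
    ("yes", ["outlook=sunny", "temp=mild"]),
    ("no", ["outlook=rainy"]),
]
trees = build_class_rooted_trees(records)
print(trees["yes"].children["outlook=sunny"].count)
```

Because every path starts at a class item, every itemset read off the tree carries class information, which relates to the "itemsets without class data" problem listed in the abstract.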
|