About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Use of data mining for investigation of crime patterns

Padhye, Manoday D. January 2006 (has links)
Thesis (M.S.)--West Virginia University, 2006. / Title from document title page. Document formatted into pages; contains viii, 108 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 80-81).
12

Validating cohesion metrics by mining open source software data with association rules

Singh, Pariksha January 2008 (has links)
Dissertation submitted in fulfillment of the requirements for the degree of Masters in Information Technology, Department of Information Technology, Faculty of Accounting and Informatics, Durban University of Technology, 2008. / Competitive pressure on the software industry encourages organizations to examine the effectiveness of their software development and evolutionary processes. It is therefore important that software is measured in order to improve its quality. The question is not whether we should measure software but how it should be measured. Software measurement has been in existence for over three decades and is still in the process of becoming a mature science. The many influences of new software development technologies have led to a diverse growth in software measurement technologies, which has resulted in various definitions and validation techniques. An important aspect of software measurement is the measurement of design, which nowadays often means the measurement of object-oriented design. Chidamber and Kemerer (1994) designed a metric suite for object-oriented design, which has provided a new foundation for metrics and acts as a starting point for further development of the software measurement science. This study documents theoretical object-oriented cohesion metrics and calculates those metrics for classes extracted from a sample of open source software packages. For each open source software package, the following data are recorded: software size, age, domain, number of developers, number of bugs, support requests, feature requests, etc. The study then uses association rules to test which theoretical cohesion metrics support the hypotheses that older software is more cohesive than younger software, that bigger packages are less cohesive than smaller packages, and that smaller software programs are more maintainable. This study attempts to validate existing theoretical object-oriented cohesion metrics by mining open source software data with association rules.
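As an illustration of the kind of class-level cohesion measurement this study builds on, the following minimal Python sketch computes LCOM (Lack of Cohesion in Methods) roughly as defined by Chidamber and Kemerer (1994); the toy class and attribute sets are invented for the example and do not come from the thesis.

    from itertools import combinations

    def lcom(method_attrs):
        # method_attrs: dict mapping method name -> set of instance attributes it uses.
        # P = method pairs sharing no attributes, Q = pairs sharing at least one;
        # LCOM = P - Q when positive, else 0.
        p = q = 0
        for attrs_a, attrs_b in combinations(method_attrs.values(), 2):
            if attrs_a & attrs_b:
                q += 1
            else:
                p += 1
        return max(p - q, 0)

    # Toy class: two methods share 'balance', a third touches only 'owner'.
    print(lcom({
        "deposit":  {"balance"},
        "withdraw": {"balance"},
        "rename":   {"owner"},
    }))  # -> 1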
14

Exploratory Analysis of Human Sleep Data

Laxminarayan, Parameshvyas 19 January 2004 (has links)
In this thesis we develop data mining techniques to analyze sleep irregularities in humans. We investigate the effects of several demographic, behavioral and emotional factors on sleep progression and on patients' susceptibility to sleep-related and other disorders. Mining is performed over subjective and objective data collected from patients visiting the UMass Medical Center and the Day Kimball Hospital for treatment. Subjective data are obtained from patient responses to questions posed in a sleep questionnaire. Objective data comprise observations and clinical measurements recorded by sleep technicians using a suite of instruments collectively called a polysomnogram. We create suitable filters to capture significant events within sleep epochs. We propose and employ a Window-based Association Rule Mining Algorithm to discover associations among sleep progression, pathology, demographics and other factors. This algorithm is a modified and extended version of the Set-and-Sequences Association Rule Mining Algorithm developed at WPI to support the mining of association rules from complex data types. We analyze both the medical and the statistical significance of the associations discovered by our algorithm. We also develop predictive classification models using logistic regression and compare the results with those obtained through association rule mining.
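To make the window-based idea concrete, here is a hedged Python sketch (not the WPI Set-and-Sequences implementation) that scores a rule "event A is followed by event B within a window of epochs" and reports its support and confidence; the event labels and window size are invented.

    def windowed_rule_stats(epochs, a, b, window=3):
        # epochs: list of sets of event labels, one set per sleep epoch.
        # The rule a -> b is counted when b occurs within `window` epochs after a.
        a_count = ab_count = 0
        for i, events in enumerate(epochs):
            if a in events:
                a_count += 1
                if any(b in later for later in epochs[i:i + window + 1]):
                    ab_count += 1
        support = ab_count / len(epochs)
        confidence = ab_count / a_count if a_count else 0.0
        return support, confidence

    epochs = [{"apnea"}, {"arousal"}, set(), {"apnea", "desat"}, {"arousal"}]
    print(windowed_rule_stats(epochs, "apnea", "arousal", window=2))  # (0.4, 1.0)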
15

Fuzzy Association Rule Mining From Spatio-temporal Data: An Analysis Of Meteorological Data In Turkey

Unal Calargun, Seda 01 January 2008 (has links) (PDF)
Data mining is the extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases. Association rule mining is a data mining method that seeks to discover associations among transactions encoded within a database. Data mining on spatio-temporal data takes into consideration the dynamics of spatially extended systems for which large amounts of spatial data exist, given that all real-world spatial data exist in some temporal context. Fuzzy sets are needed when mining association rules from spatio-temporal databases because they handle numerical data better by softening sharp boundaries, which models the uncertainty embedded in the meaning of the data. In this thesis, fuzzy association rule mining is performed on spatio-temporal data using data cubes and the Apriori algorithm. A methodology is developed for fuzzy spatio-temporal data cube construction. In addition to performance, the criteria interpretability, precision, utility, novelty, direct-to-the-point and visualization are defined as metrics for the comparison of association rule mining techniques. The fuzzy association rule mining performed within the scope of this thesis, using spatio-temporal data cubes and the Apriori algorithm, is compared using these metrics. Real meteorological data (precipitation and temperature) for Turkey recorded between 1970 and 2007 are analyzed using the data cube and Apriori algorithm approaches in order to generate the fuzzy association rules.
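The sketch below illustrates the fuzzification step that motivates fuzzy association rules: triangular membership functions soften the sharp boundary between, say, "hot" and "mild" temperatures, and the fuzzy support of a rule averages the minimum of the antecedent and consequent memberships. The breakpoints and records are invented and are not taken from the Turkish meteorological data.

    def triangular(x, a, b, c):
        # Membership of x in a triangular fuzzy set rising from a, peaking at b, falling to c.
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)

    def fuzzy_support(records, antecedent, consequent):
        # Fuzzy support: mean of min(antecedent, consequent) memberships over all records.
        return sum(min(antecedent(r), consequent(r)) for r in records) / len(records)

    records = [{"temp": 31.0, "precip": 2.0}, {"temp": 18.0, "precip": 14.0}]
    hot = lambda r: triangular(r["temp"], 25, 35, 45)
    dry = lambda r: triangular(r["precip"], -10, 0, 10)
    print(fuzzy_support(records, hot, dry))  # 0.3, fuzzy support of "hot -> dry"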
16

Proximity based association rules for spatial data mining in genomes

Saha, Surya 08 August 2009 (has links)
Our knowledge discovery algorithm employs a combination of association rule mining and graph mining to identify frequent spatial proximity relationships in genomic data, where the data is viewed as a one-dimensional space. We apply mining techniques and metrics from association rule mining to identify frequently co-occurring features in genomes, followed by graph mining to extract sets of co-occurring features. Using a case study of ab initio repeat finding, we have shown that our algorithm, ProxMiner, can be successfully applied to identify weakly conserved patterns among features in genomic data. The application of pairwise spatial relationships increases the sensitivity of our algorithm, while the use of a confidence threshold based on the false discovery rate reduces the noise in our results. Unlike available defragmentation algorithms, ProxMiner discovers associations among ab initio repeat families to identify larger, more complete repeat families. ProxMiner will increase the effectiveness of repeat discovery techniques for newly sequenced genomes, where ab initio repeat finders are only able to identify partial repeat families. In this dissertation, we provide two detailed examples of ProxMiner-discovered novel repeat families and one example of a known rice repeat family that has been extended by ProxMiner. These examples encompass some of the different types of repeat families that can be discovered by our algorithm. We have also discovered many other potentially interesting novel repeat families that can be further studied by biologists.
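The pairwise spatial-proximity counting that underlies this approach can be sketched in a few lines of Python. This is only an illustration of the idea on a one-dimensional coordinate, not the ProxMiner implementation; the repeat family names, positions and gap threshold are invented.

    from collections import Counter
    from itertools import combinations

    def proximal_pairs(features, max_gap=500):
        # features: list of (family_name, position) tuples along one sequence.
        # Two features co-occur when their positions lie within max_gap bases.
        counts = Counter()
        for (fam_a, pos_a), (fam_b, pos_b) in combinations(features, 2):
            if fam_a != fam_b and abs(pos_a - pos_b) <= max_gap:
                counts[tuple(sorted((fam_a, fam_b)))] += 1
        return counts

    features = [("repA", 100), ("repB", 350), ("repA", 5000), ("repB", 5200)]
    print(proximal_pairs(features))  # Counter({('repA', 'repB'): 2})

Pairs that pass a frequency (and false-discovery) threshold would then feed a graph-mining step that assembles them into larger candidate families.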
17

Improving the Scalability of an Exact Approach for Frequent Item Set Hiding

LaMacchia, Carolyn 01 January 2013 (has links)
Technological advances have led to the generation of large databases of organizational data recognized as an information-rich, strategic asset for internal analysis and for sharing with trading partners. Data mining techniques can discover patterns in large databases, including relationships considered strategically relevant to the owner of the data. The frequent item set hiding problem is an area of active research that studies approaches for hiding sensitive knowledge patterns before disclosing the data outside the organization. Several methods address hiding sensitive item sets, including an exact approach that generates an extension to the original database that, when combined with the original database, limits the discovery of sensitive association rules without impacting other non-sensitive information. To generate the database extension, this method formulates a constraint optimization problem (COP). Solving the COP formulation is the dominant factor in the computational resource requirements of the exact approach. This dissertation developed heuristics that address the scalability of the exact hiding method. The heuristics are directed at improving the performance of the COP solver by reducing the size of the COP formulation without significantly affecting the quality of the solutions generated. The first heuristic decomposes the COP formulation into multiple smaller problem instances that are processed separately by the COP solver to generate partial extensions of the database. The smaller database extensions are then combined to form a database extension that is close to the one generated with the original, larger COP formulation. The second heuristic evaluates the revised border used to formulate the COP and reduces the number of variables and constraints by selectively substituting multiple item sets with composite variables. Solving the COP with fewer variables and constraints reduces the computational cost of the processing. Results of heuristic processing were compared with the existing exact approach based on the size of the database extension, the ability to hide sensitive data, and the impact on non-sensitive data.
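The core intuition of hiding by database extension can be shown with a tiny brute-force sketch: appending transactions that do not contain a sensitive item set lowers its relative support below the disclosure threshold. This greatly simplifies, and merely stands in for, the exact COP formulation discussed above; the transactions and threshold are invented.

    def support(db, itemset):
        # Relative support of an item set in a list of transaction sets.
        itemset = set(itemset)
        return sum(itemset <= set(t) for t in db) / len(db)

    def extension_size_to_hide(db, itemset, threshold):
        # Smallest number of non-supporting transactions to append so the
        # item set's relative support drops below the threshold.
        hits = sum(set(itemset) <= set(t) for t in db)
        k = 0
        while hits / (len(db) + k) >= threshold:
            k += 1
        return k

    db = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a", "c"}]
    print(support(db, {"a", "b"}))                       # 0.5
    print(extension_size_to_hide(db, {"a", "b"}, 0.45))  # 1 extra transaction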
18

A data mining framework for targeted category promotions

Reutterer, Thomas, Hornik, Kurt, March, Nicolas, Gruber, Kathrin 06 1900 (has links) (PDF)
This research presents a new approach to derive recommendations for segment-specific, targeted marketing campaigns on the product category level. The proposed methodological framework serves as a decision support tool for customer relationship managers or direct marketers to select attractive product categories for their target marketing efforts, such as segment-specific rewards in loyalty programs, cross-merchandising activities, targeted direct mailings, customized supplements in catalogues, or customized promotions. The proposed methodology requires customers' multi-category purchase histories as input data and proceeds in a stepwise manner. It combines various data compression techniques and integrates an optimization approach which suggests candidate product categories for segment-specific targeted marketing such that cross-category spillover effects for non-promoted categories are maximized. To demonstrate the empirical performance of our proposed procedure, we examine the transactions from a real-world loyalty program of a major grocery retailer. A simple scenario-based analysis using promotion responsiveness reported in previous empirical studies and prior experience by domain experts suggests that targeted promotions might boost profitability between 15% and 128% relative to an undifferentiated standard campaign.
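As a toy illustration of the category-selection step, the sketch below greedily picks the categories whose promotion yields the largest cross-category spillover, assuming a precomputed spillover matrix. The categories, lift values and greedy rule are invented and only stand in for the optimization approach used in the paper.

    def pick_categories(spillover, k):
        # spillover: dict {promoted_category: {other_category: expected lift}}.
        # Rank categories by the total lift they induce in other categories.
        def cross_effect(cat):
            return sum(v for other, v in spillover[cat].items() if other != cat)
        return sorted(spillover, key=cross_effect, reverse=True)[:k]

    spillover = {
        "coffee": {"milk": 0.8, "pastry": 0.5},
        "soda":   {"chips": 0.3},
        "pasta":  {"sauce": 0.9, "cheese": 0.5},
    }
    print(pick_categories(spillover, k=2))  # ['pasta', 'coffee']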
19

A Formal Concept Analysis Approach to Association Rule Mining: The QuICL Algorithms

Smith, David T. 01 January 2009 (has links)
Association rule mining (ARM) is the task of identifying meaningful implication rules exhibited in a data set. Most research has focused on extracting frequent item (FI) sets and has thus fallen short of the overall ARM objective. The FI miners fail to identify the upper covers that are needed to generate a set of association rules whose size can be exploited by an end user. An alternative to FI mining can be found in formal concept analysis (FCA), a branch of applied mathematics. FCA derives a concept lattice whose concepts identify closed FI sets and whose connections identify the upper covers. However, most FCA algorithms construct a complete lattice and therefore include item sets that are not frequent. An iceberg lattice, on the other hand, is a concept lattice whose concepts contain only FI sets. Only three algorithms to construct an iceberg lattice were found in the literature. Given that an iceberg concept lattice provides an analysis tool to succinctly identify association rules, this study investigated additional algorithms to construct an iceberg concept lattice. This report presents the development and analysis of the Quick Iceberg Concept Lattice (QuICL) algorithms. These algorithms provide incremental construction of an iceberg lattice. QuICL uses recursion instead of iteration to navigate the lattice and establish connections, thereby eliminating costly processing incurred by past algorithms. The QuICL algorithms were evaluated against leading FI miners and FCA construction algorithms using benchmarks cited in the literature. Results demonstrate that QuICL provides performance on the order of the FI miners yet additionally derives the upper covers. QuICL, when combined with known algorithms to extract a basis of association rules from a lattice, offers a "best known" ARM solution. Beyond this, the QuICL algorithms have proved to be very efficient, providing order-of-magnitude gains over other incremental lattice construction algorithms. For example, on the Mushroom data set, QuICL completes in less than 3 seconds while past algorithms exceed 200 seconds. On T10I4D100k, QuICL completes in less than 120 seconds while past algorithms approach 10,000 seconds. QuICL is shown to be the "best known" all-around incremental lattice construction algorithm. Runtime complexity is shown to be O(l d i), where l is the cardinality of the lattice, d is the average degree of the lattice, and i is a mean function on the frequent item extents.
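The closure operator at the heart of formal concept analysis can be illustrated compactly: the closure of an item set is the intersection of all transactions containing it, and the closed frequent item sets are exactly the intents of the iceberg lattice. The naive enumeration below is a didactic sketch on an invented toy database; it is not the QuICL algorithm.

    from itertools import combinations

    def closure(db, itemset):
        # Intersection of all transactions (sets) that contain the item set.
        covering = [t for t in db if itemset <= t]
        return frozenset(set.intersection(*covering)) if covering else frozenset()

    def closed_frequent_itemsets(db, min_support):
        # Naive enumeration: record the closure of every frequent item set.
        items = sorted(set().union(*db))
        closed = {}
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                s = set(combo)
                supp = sum(s <= t for t in db)
                if supp >= min_support:
                    closed[closure(db, s)] = supp
        return closed

    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}]
    print(closed_frequent_itemsets(db, min_support=2))
    # {frozenset({'a'}): 3, frozenset({'a', 'b'}): 2, frozenset({'a', 'c'}): 2}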
20

Apriori Sets And Sequences: Mining Association Rules from Time Sequence Attributes

Pray, Keith A 06 May 2004 (has links)
We introduce an algorithm for mining expressive temporal relationships from complex data. Our algorithm, AprioriSetsAndSequences (ASAS), extends the Apriori algorithm to data sets in which a single data instance may consist of a combination of attribute values that are nominal sequences, time series, sets, and traditional relational values. Data sets of this type occur naturally in many domains, including health care, financial analysis, complex system diagnostics, and domains in which multiple sensors are used. AprioriSetsAndSequences identifies predefined events of interest in the sequential data attributes. It then mines for association rules that make explicit all frequent temporal relationships among the occurrences of those events, and between those events and other data attributes. Our algorithm inherently handles different levels of time granularity in the same data set. We have implemented AprioriSetsAndSequences within the Weka environment and have applied it to computer performance, stock market, and clinical sleep disorder data. We show that AprioriSetsAndSequences produces rules that express significant temporal relationships describing patterns of behavior observed in the data set.
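A hedged sketch of one of the temporal relationships such an algorithm makes explicit ("A begins before B begins") is given below; the real AprioriSetsAndSequences handles sets, sequences and richer relation types, and the event labels and start times here are invented.

    from collections import Counter

    def frequent_before_relations(instances, min_support):
        # instances: list of dicts mapping event label -> start time.
        # Count, per instance, every ordered pair of events where a starts before b.
        counts = Counter()
        for events in instances:
            for a in events:
                for b in events:
                    if a != b and events[a] < events[b]:
                        counts[(a, "before", b)] += 1
        n = len(instances)
        return {rel: c / n for rel, c in counts.items() if c / n >= min_support}

    instances = [
        {"cpu_spike": 2, "page_fault_burst": 5},
        {"cpu_spike": 1, "page_fault_burst": 7},
    ]
    print(frequent_before_relations(instances, min_support=0.5))
    # {('cpu_spike', 'before', 'page_fault_burst'): 1.0}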
