11 |
Next Generation Outlier Detection
Wang, Ye 08 September 2014
No description available.
|
12 |
A Hybrid heuristic-exhaustive search approach for rule extraction
Rodic, Daniel 29 May 2006
The topic of this thesis is knowledge discovery and artificial intelligence based knowledge discovery algorithms. The knowledge discovery process and its associated problems are discussed, followed by an overview of three classes of artificial intelligence based knowledge discovery algorithms. Typical representatives of each of these classes are presented and discussed in greater detail. Then a new knowledge discovery algorithm, called Hybrid Classifier System (HCS), is presented. The guiding concept behind the new algorithm was simplicity, and it is loosely based on schemata theory. It is evaluated against discussed algorithms from each class, namely CN2, C4.5, BRAINNE and BGP, using a benchmark of classification problems. The results are discussed and compared. They show that the new knowledge discovery algorithm performs satisfactorily, yielding accurate, crisp rule sets. Probably the main strength of the HCS algorithm is its simplicity, which makes it a suitable foundation for many possible future extensions. Some possible extensions of the proposed algorithm are suggested in the final part of this thesis. / Dissertation (MSc)--University of Pretoria, 2007. / Computer Science / unrestricted
|
13 |
Large Data Clustering And Classification Schemes For Data Mining
Babu, T Ravindra 12 1900
Data mining deals with extracting valid, novel, humanly understandable, potentially useful and general abstractions from large data. Data is large when the number of patterns, the number of features per pattern, or both are large. Largeness of data is characterized by a size that is beyond the capacity of a computer's main memory. Data mining is an interdisciplinary field involving database systems, statistics, machine learning, visualization and computational aspects. The focus of data mining algorithms is scalability and efficiency. Clustering and classification of large data is an important activity in data mining. Clustering algorithms are predominantly iterative and require multiple scans of the dataset, which is very expensive when the data is stored on disk.
In the current work we propose different schemes that have both theoretical validity and practical utility in dealing with such large data. The schemes broadly encompass data compaction, classification, prototype selection, use of domain knowledge and hybrid intelligent systems. The proposed approaches can be broadly classified as (a) compressing the data in a non-lossy manner, then clustering and classifying the patterns directly in their compressed form through a novel algorithm; (b) compressing the data in a lossy fashion such that a very high degree of compression and abstraction is obtained in terms of 'distinct subsequences', then classifying the data in this compressed form to improve the prediction accuracy; (c) obtaining simultaneous prototype and feature selection with the help of incremental clustering, a lossy compression scheme and a rough set approach; (d) demonstrating that prototype selection and data-dependent techniques can reduce the number of comparisons in a multiclass classification scenario using SVMs; and (e) showing that, by making use of domain knowledge of the problem and the data under consideration, a very high classification accuracy is obtained with fewer iterations of AdaBoost.
The schemes have pragmatic utility. The prototype selection algorithm is incremental, requires a single dataset scan and has linear time and space requirements. We provide results obtained with large, high-dimensional handwritten (hw) digit data. The compression algorithm is based on simple concepts; we demonstrate that classifying the compressed data reduces the required computation time by a factor of 5, while the prediction accuracy is exactly the same, 92.47%, for both the compressed and the original data. With the proposed lossy compression scheme and pruning methods, we demonstrate that the prediction accuracy improves even when the number of distinct subsequences is reduced by a factor of more than 6 (690 to 106). Specifically, with the original data containing 690 distinct subsequences, the classification accuracy is 92.47%; with an appropriate choice of pruning parameters, the number of distinct subsequences reduces to 106 with a corresponding classification accuracy of 92.92%. The best classification accuracy, 93.3%, is obtained with 452 distinct subsequences. With the scheme of simultaneous feature and prototype selection, we improved the classification accuracy beyond that obtained with kNNC, viz., 93.58%, while significantly reducing the number of features and prototypes, achieving a compaction of 45.1%. With the hybrid schemes based on SVM, prototypes and a domain-knowledge-based tree (KB-Tree), we demonstrated a reduction in SVM training time of 50% and in testing time of about 30% compared with the complete data, and an improvement of classification accuracy to 94.75%. With AdaBoost the classification accuracy is 94.48%, which is better than the accuracies obtained with NNC and kNNC on the entire data; the training time is reduced because prototypes are used instead of the complete data. Another important aspect of the work is a KB-Tree (with a maximum depth of 4) that classifies 10-category data in just 4 comparisons.
In addition to the hw data, we applied the schemes to Network Intrusion Detection Data (the 10% dataset of KDDCUP99) and demonstrated that the proposed schemes yield a lower overall cost than the reported values.
|
14 |
Association rules search in large data bases / Susietumo taisyklių paieška didelėse duomenų bazėse
Savulionienė, Loreta 19 May 2014
The influence of information technology is an integral part of modern life. Every field of activity involves accumulating and storing information and data, so quick analysis of information is necessary. Today, traditional data processing and report generation are no longer sufficient. Applying data mining techniques makes it possible to extract new facts and knowledge from the available data, which in turn allow us to forecast customer behaviour or financial trends, diagnose diseases, etc. The doctoral dissertation analyses modern data mining algorithms for finding frequent sub-sequences and association rules. The dissertation proposes a new stochastic algorithm for mining frequent sub-sequences, its modifications SDPA1 and SDPA2, and a stochastic algorithm for the discovery of association rules, and presents an evaluation of the errors of these algorithms. The algorithms are approximate, but they allow two important criteria, time and accuracy, to be balanced. The algorithms have been tested using real and simulated databases.
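The trade-off between time and accuracy that an approximate, stochastic approach offers can be sketched as follows. This is not SDPA1 or SDPA2, whose details the abstract does not give; it is a generic sampling estimate written under stated assumptions: sequences are strings, a sub-sequence need not be contiguous, and the function names are hypothetical. The support of a pattern is estimated from a random sample of the database instead of a full scan.

```python
import random


def contains_subsequence(sequence, pattern):
    """True if pattern occurs in sequence in order (not necessarily contiguously)."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the matched item.
    return all(item in remaining for item in pattern)


def exact_support(database, pattern):
    """Fraction of sequences containing the pattern, using a full scan."""
    return sum(contains_subsequence(s, pattern) for s in database) / len(database)


def estimated_support(database, pattern, sample_size, rng):
    """Approximate support from a random sample drawn without replacement."""
    sample = rng.sample(database, sample_size)
    return sum(contains_subsequence(s, pattern) for s in sample) / sample_size


# Hypothetical toy database; real use would involve far more sequences.
database = ["abc", "acb", "bca", "aabb"]
rng = random.Random(0)  # fixed seed so repeated runs give the same sample
full = exact_support(database, "ab")
approx = estimated_support(database, "ab", sample_size=2, rng=rng)
```

A smaller `sample_size` lowers the running time and raises the expected error of the estimate; with `sample_size == len(database)` the estimate coincides with the exact support.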
|
16 |
Comparisons and Applications of Quantitative Signal Detections for Adverse Drug Reactions (ADRs): An Empirical Study Based On The Food And Drug Administration (FDA) Adverse Event Reporting System (AERS) And A Large Medical Claims Database
CHEN, YAN 23 April 2008
No description available.
|
17 |
Seleção adaptativa e interativa de serviços móveis em ambientes convergentes heterogêneos. / Adaptive and interactive mobile service selection in convergent and heterogeneous environments.
Dutra, Rogério Garcia 19 January 2012
The increasing mobility of people and material resources demands a constant effort to exploit the capabilities offered by the different available technologies, which converge to supply information and services in any place, at any time and through any device, the goals of the worldwide cooperation and communication network known as the Internet. In response to this growing demand, the current Internet is evolving from an information-sharing model to a contribution model and, in the near future, to a model of collaboration between providers and consumers, called the Internet of the Future. Although the current Internet has been extraordinarily successful as a ubiquitous means of communication, its architecture imposes limits on service provisioning in heterogeneous and convergent environments, demanding new solutions that overcome the technological challenges of establishing the Internet of the Future. These new solutions will be based on the principles of service-oriented computing and will form the components of a new service framework for the Internet of the Future, called the Internet of Services. In the Internet of Services, convergent and heterogeneous mobile communication networks will be created on demand, making available a large number of services that are functionally similar but distinct from the non-functional point of view, which makes it difficult to select the services that best meet the quality-of-service level agreed between providers and consumers. This work proposes a new solution to the service selection problem, combining algorithms commonly used for data mining to select services dynamically and interactively on the basis of non-functional attributes, in order to meet the mobility and collaboration requirements of convergent and heterogeneous environments such as the Internet of Services.
This work therefore contributes to the Internet of Services research project, one of the fundamental pillars of the new service-oriented architecture that will serve as the framework for building the Internet of the Future, enabling numerous applications such as Location Based Services and Cloud Computing.
|