1 |
Backdoor Detection based on SVM
Tzeng, Zhong-Chiang, 29 July 2005
With advances in computer technology and the widespread use of the Internet, network security has become increasingly important. According to relevant statistics, malicious code such as viruses, worms, backdoors, and Trojans is responsible for a large number of attacks. Backdoors are especially critical: they can not only get past firewalls and antivirus software, but also steal confidential information, misuse network resources, and launch attacks such as DDoS (Distributed Denial of Service).
In this research, we analyze the properties and categories of backdoors and the application of data mining and support vector machines (SVMs) to intrusion detection. The research focuses on detecting the behavior of backdoor connections, and we propose a detection architecture based on SVM, a machine learning method grounded in statistical learning theory that Vapnik proposed to address problems with neural network techniques.
For the system modules, this research uses IPAudit as the network monitor and libsvm as the SVM classifier. Packets captured by IPAudit are classified by libsvm as interactive or non-interactive flows, and the result is compared with a list of legal services to determine whether a connection is a backdoor connection. We compare the accuracy of SVM, C4.5, and Na
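The detection logic described above (classify a flow as interactive or not, then check it against a list of legal services) can be sketched as follows. This is a minimal, self-contained illustration, not the thesis's implementation: instead of libsvm it trains a linear SVM with Pegasos-style subgradient descent, and the two flow features, the toy training data, and the whitelist of ports (`LEGAL_INTERACTIVE_PORTS`) are all hypothetical assumptions.

```python
from typing import List, Sequence, Set

# Hypothetical per-flow features, both scaled to [0, 1]:
#   x[0] = fraction of packets with a small (keystroke-sized) payload
#   x[1] = normalized mean inter-arrival time between packets
INTERACTIVE, NON_INTERACTIVE = 1, -1


def train_linear_svm(samples: List[Sequence[float]], labels: List[int],
                     lam: float = 0.01, epochs: int = 200) -> List[float]:
    """Train a linear SVM (hinge loss + L2 penalty) with Pegasos-style subgradient steps.

    A constant 1.0 is appended to every sample so the last weight acts as a bias.
    Samples are visited in a fixed order, so training is deterministic.
    """
    w = [0.0] * (len(samples[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            xb = list(x) + [1.0]
            margin = y * sum(wi * xi for wi, xi in zip(w, xb))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink step from the L2 penalty
            if margin < 1.0:  # hinge loss is active: move towards the correct side
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
    return w


def predict(w: List[float], x: Sequence[float]) -> int:
    """Sign of the linear decision function: interactive or non-interactive."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return INTERACTIVE if score >= 0.0 else NON_INTERACTIVE


def is_backdoor(w: List[float], x: Sequence[float], port: int,
                legal_interactive_ports: Set[int]) -> bool:
    """Flag interactive flows whose port is not on the list of legal interactive services."""
    return predict(w, x) == INTERACTIVE and port not in legal_interactive_ports


# Toy training data, made up for illustration.
TRAIN_X = [(0.9, 0.8), (0.85, 0.9), (0.95, 0.7),   # interactive sessions
           (0.1, 0.1), (0.2, 0.05), (0.15, 0.2)]   # bulk transfers
TRAIN_Y = [INTERACTIVE] * 3 + [NON_INTERACTIVE] * 3
LEGAL_INTERACTIVE_PORTS = {22, 23}  # hypothetical whitelist (e.g. SSH, Telnet)

w = train_linear_svm(TRAIN_X, TRAIN_Y)
print(is_backdoor(w, (0.9, 0.85), 31337, LEGAL_INTERACTIVE_PORTS))  # interactive, unlisted port
print(is_backdoor(w, (0.9, 0.85), 22, LEGAL_INTERACTIVE_PORTS))     # interactive, listed port
```

Keeping the classifier and the whitelist check separate mirrors the two-stage design in the abstract: the SVM only decides whether traffic looks interactive, and the legality decision comes from the service list.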
|
2 |
The role of classifiers in feature selection: number vs nature
Chrysostomou, Kyriacos, January 2008
Wrapper feature selection approaches are widely used to select a small subset of relevant features from a dataset. However, Wrappers have the limitation that they use only a single classifier when selecting features. The problem with using a single classifier is that each classifier is of a different nature and has its own biases, so different classifiers will select different feature subsets. To address this problem, this thesis investigates the effects of using different classifiers for Wrapper feature selection; more specifically, the effects of using different numbers of classifiers and classifiers of different natures. This aim is achieved by proposing a new data mining method called Wrapper-based Decision Trees (WDT). The WDT method can combine multiple classifiers from four different families, namely Bayesian Network, Decision Tree, Nearest Neighbour, and Support Vector Machine, to select relevant features and visualise the relationships among the selected features using decision trees. The WDT method is applied to the three research questions of this thesis: (1) the effect of the number of classifiers on feature selection results; (2) the effect of the nature of the classifiers on feature selection results; and (3) which of the two (i.e., number or nature of classifiers) has the greater effect on feature selection results. Two types of user preference datasets derived from Human-Computer Interaction (HCI) are used with WDT to help answer these three questions. The investigation revealed that both the number and the nature of classifiers greatly affect feature selection results. In terms of the number of classifiers, few classifiers selected many relevant features, whereas many classifiers selected few relevant features. In addition, using three classifiers resulted in highly accurate feature subsets.
In terms of the nature of classifiers, it was shown that the Decision Tree, Bayesian Network, and Nearest Neighbour classifiers caused significant differences in both the number of features selected and the accuracy levels of those features. A comparison of the results for the number and the nature of classifiers revealed that the former has more of an effect on feature selection than the latter. The thesis makes contributions to three communities: data mining, feature selection, and HCI. For the data mining community, this thesis proposes a new method, WDT, which integrates the use of multiple classifiers for feature selection with decision trees to effectively select and visualise the most relevant features within a dataset. For the feature selection community, the results of this thesis show that the number and the nature of classifiers can truly affect the feature selection process, and the results, together with the suggestions based on them, can provide useful insight about classifiers when performing feature selection. For the HCI community, this thesis shows the usefulness of feature selection for identifying a small number of highly relevant features for determining the preferences of different users.
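The core idea of running a Wrapper with more than one classifier can be sketched as below. This is a minimal illustration under stated assumptions, not the WDT method itself (which draws on Bayesian Network, Decision Tree, Nearest Neighbour, and SVM classifiers and then visualises the result with decision trees): each classifier runs greedy forward selection scored by leave-one-out accuracy, and a feature is kept if a majority of classifiers select it. The two stand-in classifiers (1-nearest-neighbour and a hypothetical nearest-centroid rule), the voting threshold, and the toy data are assumptions.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Row = Sequence[float]
# A classifier here is a function (train_x, train_y, query) -> predicted label.
Classifier = Callable[[List[Row], List[int], Row], int]


def one_nn(train_x: List[Row], train_y: List[int], query: Row) -> int:
    """1-nearest-neighbour: copy the label of the closest training row."""
    dists = [math.dist(row, query) for row in train_x]
    return train_y[dists.index(min(dists))]


def nearest_centroid(train_x: List[Row], train_y: List[int], query: Row) -> int:
    """Hypothetical stand-in classifier: pick the label whose class mean is closest."""
    best_label, best_dist = train_y[0], math.inf
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_x, train_y) if lab == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = math.dist(centroid, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(clf: Classifier, X: List[Row], y: List[int], features: List[int]) -> float:
    """Leave-one-out accuracy of clf using only the given feature indices."""
    def project(row: Row) -> List[float]:
        return [row[f] for f in features]
    correct = 0
    for i in range(len(X)):
        train_x = [project(r) for j, r in enumerate(X) if j != i]
        train_y = [lab for j, lab in enumerate(y) if j != i]
        correct += clf(train_x, train_y, project(X[i])) == y[i]
    return correct / len(X)


def forward_select(clf: Classifier, X: List[Row], y: List[int]) -> List[int]:
    """Greedy wrapper: add the feature that most improves LOO accuracy; stop when none does."""
    selected: List[int] = []
    best = 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        cand, cand_score = None, -1.0
        for f in remaining:
            score = loo_accuracy(clf, X, y, selected + [f])
            if score > cand_score:  # strict: ties go to the lower feature index
                cand, cand_score = f, score
        if cand_score <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected


def wrapper_vote(classifiers: List[Classifier], X: List[Row], y: List[int]) -> List[int]:
    """Keep the features chosen by a majority of the classifiers' wrapper runs."""
    votes = Counter(f for clf in classifiers for f in forward_select(clf, X, y))
    needed = len(classifiers) // 2 + 1
    return sorted(f for f, count in votes.items() if count >= needed)


# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [(0.0, 0.3, 0.9), (0.1, 0.8, 0.1), (0.2, 0.5, 0.5),
     (0.9, 0.4, 0.2), (1.0, 0.7, 0.8), (0.8, 0.2, 0.6)]
Y = [0, 0, 0, 1, 1, 1]
print(wrapper_vote([one_nn, nearest_centroid], X, Y))
```

Changing the list passed to `wrapper_vote` is the knob that corresponds to the thesis's two questions: how many classifiers take part (number) and which families they come from (nature).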
|
3 |
A workflow for the modeling and analysis of biomedical data
Marsolo, Keith Allen, January 2007
Thesis (Ph. D.)--Ohio State University, 2007. / Title from first page of PDF file. Includes bibliographical references (p. 229-239).
|
4 |
Příprava cvičení pro dolování znalostí z báze dat - klasifikace a predikce / Design of exercises for data mining - Classification and prediction
Martiník, Jan, January 2009
My master's thesis, "Design of exercises for data mining - Classification and prediction", deals with the most frequently used classification and prediction methods. The classification methods covered are association rules, Bayesian classification, genetic algorithms, the nearest neighbor method, neural networks, and decision trees; the prediction methods covered are linear and non-linear prediction. The work also contains a detailed treatment of decision trees and a detailed algorithm for building a decision tree, including the development of the individual diagrams. The proposed decision tree algorithm is tested on two data sets downloaded from the Internet, and the results are compared and the differences between the two implementations described. The work is written to give the reader an understanding of the individual data mining methods and techniques, their advantages and disadvantages, and some of the issues directly related to this topic.
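The abstract does not say which splitting criterion the proposed decision tree algorithm uses. As a minimal sketch, assuming the common ID3 approach (choose the attribute with the highest information gain, recurse until nodes are pure or attributes run out), tree construction over categorical attributes looks like this; the tiny weather-style dataset is made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Union

Row = Dict[str, str]
Tree = Union[str, tuple]  # a leaf label, or (attribute, {value: subtree})


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows: List[Row], labels: List[str], attr: str) -> float:
    """Entropy reduction obtained by splitting on attr."""
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows: List[Row], labels: List[str], attrs: List[str]) -> Tree:
    if len(set(labels)) == 1:  # pure node: make a leaf
        return labels[0]
    if not attrs:  # nothing left to split on: majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return (best, branches)


def classify(tree: Tree, row: Row) -> str:
    """Follow the branches matching the row's attribute values down to a leaf."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree


ROWS = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
        {"outlook": "overcast", "windy": "no"}, {"outlook": "overcast", "windy": "yes"},
        {"outlook": "rain", "windy": "no"}, {"outlook": "rain", "windy": "yes"}]
LABELS = ["no", "no", "yes", "yes", "yes", "no"]
tree = build_tree(ROWS, LABELS, ["outlook", "windy"])
print(tree)
print(classify(tree, {"outlook": "rain", "windy": "yes"}))
```

Here `outlook` is chosen at the root because it leaves only the `rain` branch impure; that branch is then split on `windy`.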
|
5 |
Mineração de Dados Educacionais: Previsão de notas parciais utilizando classificação / Educational Data Mining: Prediction of partial grades using classification
Sousa, Marília Maria Bastos de Araújo Cavalcanti Feitosa Fava de, 29 September 2017
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / This work introduces Educational Data Mining and presents an experiment on predicting partial exam grades. The experiment uses data from the Introduction to Computer Programming course at the Federal University of Amazonas and seeks to classify students according to their grades into at most three classes: satisfactory, unsatisfactory, and without concept (students who dropped out). The work concludes with a quantitative analysis of the prediction results.
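The abstract does not name the classifier, the features, or the grade thresholds, so the following is only a hedged sketch of the three-class setup it describes. It assumes a hypothetical pass mark of 5.0, two hypothetical early-semester features (first partial exam grade and fraction of assignments submitted), treats a missing final grade as "without concept", and uses a k-nearest-neighbour vote as a stand-in classifier. All data are made up.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

SATISFACTORY = "satisfactory"
UNSATISFACTORY = "unsatisfactory"
NO_CONCEPT = "without concept"  # students who dropped out


def label_student(final_grade: Optional[float], pass_mark: float = 5.0) -> str:
    """Map a final grade to one of the three classes; None means the student dropped out.

    The pass mark of 5.0 is a hypothetical threshold, not taken from the thesis.
    """
    if final_grade is None:
        return NO_CONCEPT
    return SATISFACTORY if final_grade >= pass_mark else UNSATISFACTORY


def knn_predict(train: List[Tuple[Sequence[float], str]], query: Sequence[float],
                k: int = 3) -> str:
    """Predict a student's class by majority vote among the k most similar past students."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Made-up history: (first partial exam grade, fraction of assignments submitted), final grade.
HISTORY = [((8.5, 0.9), 8.0), ((7.0, 1.0), 7.5), ((9.0, 0.8), 9.0),
           ((3.0, 0.7), 3.5), ((2.5, 0.6), 4.0), ((4.0, 0.8), 4.5),
           ((1.0, 0.1), None), ((0.5, 0.0), None), ((0.0, 0.2), None)]
TRAIN = [(features, label_student(grade)) for features, grade in HISTORY]

print(knn_predict(TRAIN, (8.0, 0.9)))
print(knn_predict(TRAIN, (0.5, 0.1)))
```

Because the raw grade (0 to 10) has a larger range than the submission fraction, it dominates the distance here; a real experiment would normally scale the features first.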
|
6 |
Automatic Patent Classification
Yehe, Nala, January 2020
Patents have great research value and are also beneficial to the industrial, commercial, legal, and policymaking communities. Effective analysis of patent literature can reveal important technical details and relationships, and can also help explain business trends, suggest novel industrial solutions, and inform crucial investment decisions. Patent documents should therefore be analyzed carefully so that their value can be exploited. Patent analysts generally need a certain degree of expertise in several fields, including information retrieval, data processing, text mining, field-specific technology, and business intelligence. In practice, it is difficult to find, or to train in a relatively short time, an analyst who meets the requirements of all these disciplines. Patent classification is also crucial in processing patent applications, because it enables people to manage and maintain patent texts better and more flexibly. In recent years the number of patents worldwide has increased dramatically, which makes an automatic patent classification system very important. Such a system can replace time-consuming manual classification and thus give patent analysis managers an effective way of managing patent texts. This paper designs a patent classification system based on data mining methods and machine learning techniques, and uses the KNIME software to conduct a comparative analysis of different machine learning methods and different parts of a patent. The purpose of this thesis is to use text data processing methods and machine learning techniques to classify patents automatically. The work has two main parts: data preprocessing and the application of machine learning techniques. The research questions are: Which part of a patent, used as input data, performs best for automatic classification?
And which of the implemented machine learning algorithms performs best in classifying IPC keywords? This thesis uses design science research as its research method. It uses the KNIME platform to apply the machine learning techniques, which include decision tree, XGBoost linear, XGBoost tree, SVM, and random forest. The implementation comprises data collection, data preprocessing, feature word extraction, and application of the classification techniques. A patent document consists of several parts, such as the description, abstract, and claims. In this thesis, each of these three parts is fed separately to our models as input data, and the performance of the three parts is then compared. Based on the results of these three experiments, we suggest using the description in the classification system, because it shows the best performance in English patent text classification. The abstract can serve as an auxiliary basis for classification. However, classification based on the claims, as proposed by some scholars, did not achieve good performance in our research. In addition, the BoW and TF-IDF methods can be used together to extract feature words efficiently, and the SVM and XGBoost techniques performed better than the other techniques in our automatic patent classification system.
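The thesis combines BoW and TF-IDF for feature word extraction inside KNIME. As a hedged, tool-independent sketch of what those two representations compute, the snippet below builds bag-of-words counts and TF-IDF weights using one common variant (term frequency normalized by document length, `idf = log(N / df)`); the variant and the three toy "patent" snippets are assumptions, not details taken from the thesis.

```python
import math
from collections import Counter
from typing import Dict, List


def bag_of_words(text: str) -> Counter:
    """Bag-of-words: lowercase whitespace tokens and their counts."""
    return Counter(text.lower().split())


def tf_idf(docs: List[str]) -> List[Dict[str, float]]:
    """TF-IDF weights per document, with tf = count / doc length and idf = log(N / df)."""
    bows = [bag_of_words(d) for d in docs]
    n_docs = len(docs)
    # Iterating a Counter yields each distinct term once, so this counts documents per term.
    df = Counter(term for bow in bows for term in bow)
    weights = []
    for bow in bows:
        length = sum(bow.values())
        weights.append({term: (count / length) * math.log(n_docs / df[term])
                        for term, count in bow.items()})
    return weights


docs = ["turbine blade device", "wind turbine device", "coating device"]
for w in tf_idf(docs):
    print(w)
```

A term that occurs in every document (here `device`) gets weight 0, while rarer terms such as `blade` are weighted up, which is why TF-IDF is useful for picking discriminative feature words out of the raw BoW counts.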
|