  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Large Data Clustering And Classification Schemes For Data Mining

Babu, T Ravindra 12 1900
Data mining deals with extracting valid, novel, potentially useful, and general abstractions from large data in a form that humans can easily understand. Data is large when the number of patterns, the number of features per pattern, or both are large; such data is characterized by a size beyond the capacity of a computer's main memory. Data mining is an interdisciplinary field involving database systems, statistics, machine learning, visualization, and computational aspects. Data mining algorithms focus on scalability and efficiency. Clustering and classification of large data is an important activity in data mining. Clustering algorithms are predominantly iterative and require multiple scans of the dataset, which is very expensive when the data is stored on disk. In the current work we propose different schemes that have both theoretical validity and practical utility in dealing with such large data. The schemes broadly encompass data compaction, classification, prototype selection, use of domain knowledge, and hybrid intelligent systems. The proposed approaches can be broadly classified as follows: (a) compress the data in a non-lossy manner, then cluster and classify the patterns directly in their compressed form through a novel algorithm; (b) compress the data in a lossy fashion such that a very high degree of compression and abstraction is obtained in terms of 'distinct subsequences', then classify the data in this compressed form to improve the prediction accuracy; (c) obtain simultaneous prototype and feature selection with the help of incremental clustering, a lossy compression scheme, and a rough set approach; (d) demonstrate that prototype selection and data-dependent techniques can reduce the number of comparisons in a multiclass classification scenario using SVMs; and (e) show that, by making use of domain knowledge about the problem and the data under consideration, AdaBoost reaches a very high classification accuracy in fewer iterations.
The schemes have pragmatic utility. The prototype selection algorithm is incremental, requires a single scan of the dataset, and has linear time and space requirements. We provide results obtained with large, high-dimensional handwritten (hw) digit data. The compression algorithm is based on simple concepts; we demonstrate that classifying the compressed data reduces the required computation time by a factor of 5, while the prediction accuracy is exactly the same, 92.47%, for both the compressed and the original data. With the proposed lossy compression scheme and pruning methods, we demonstrate that the prediction accuracy improves even when the number of distinct subsequences is reduced by a factor of 6 (690 to 106). Specifically, with the original data containing 690 distinct subsequences, the classification accuracy is 92.47%; with an appropriate choice of pruning parameters, the number of distinct subsequences reduces to 106 with a corresponding classification accuracy of 92.92%. The best classification accuracy, 93.3%, is obtained with 452 distinct subsequences. With the scheme of simultaneous feature and prototype selection, we improved the classification accuracy beyond that obtained with kNNC, viz., 93.58%, while significantly reducing the number of features and prototypes, achieving a compaction of 45.1%. With the hybrid schemes based on SVMs, prototypes, and a domain-knowledge-based tree (KB-Tree), we demonstrated a reduction in SVM training time of 50% and in testing time of about 30% compared with the complete data, and an improvement of the classification accuracy to 94.75%. With AdaBoost the classification accuracy is 94.48%, which is better than the accuracies obtained with NNC and kNNC on the entire data; the training time is reduced because prototypes are used instead of the complete data. Another important aspect of the work is the design of a KB-Tree (with a maximum depth of 4) that classifies 10-category data in just 4 comparisons.
In addition to the hw data, we applied the schemes to network intrusion detection data (the 10% dataset of KDDCUP99) and demonstrated that the proposed schemes achieved a lower overall cost than the reported values.
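The abstract above describes prototype selection that is incremental and needs only a single scan of the dataset. As a rough illustration of that idea, the sketch below uses leader-style clustering, which is an assumption here and not necessarily the algorithm used in the thesis. The function names, the `threshold` parameter, and the toy data are all hypothetical.

```python
from math import dist


def select_prototypes(samples, threshold):
    """Single-scan, leader-style prototype selection (illustrative only).

    samples: iterable of (vector, label) pairs.
    A sample becomes a new prototype unless an existing prototype of the
    same label already lies within `threshold` (Euclidean distance).
    """
    prototypes = []
    for vector, label in samples:
        covered = any(
            proto_label == label and dist(vector, proto) <= threshold
            for proto, proto_label in prototypes
        )
        if not covered:
            prototypes.append((vector, label))
    return prototypes


def nearest_prototype_label(vector, prototypes):
    """Classify a vector by the label of its nearest prototype."""
    return min(prototypes, key=lambda p: dist(vector, p[0]))[1]


# Toy data (hypothetical): two well-separated groups.
samples = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.0), "a"),
    ((5.0, 5.0), "b"),
    ((5.1, 4.9), "b"),
    ((0.0, 0.2), "a"),
]
prototypes = select_prototypes(samples, threshold=1.0)
predicted = nearest_prototype_label((4.0, 4.0), prototypes)
```

Each sample is visited once, so the data never has to be rescanned from disk; the cost per sample grows with the number of prototypes kept, which the threshold controls.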
2

Use of hybrid intelligent methods for adaptive student assessment in a web-based intelligent tutoring system

Παπαβλασόπουλος, Κωνσταντίνος 12 January 2009
Intelligent Tutoring Systems are systems that use Artificial Intelligence methods to provide personalized instruction, in recent years also over the Internet. In other words, these systems offer learning adapted to the abilities and needs of the students. An important part of these systems concerns student assessment. Assessment means determining a student's knowledge level. This is usually done by measuring the student's performance on one or more tests containing questions/exercises that refer to a set of concepts and have various difficulty levels. One important element here is the correct determination of the difficulty level of the questions/exercises. A second element is the correct design of the tests, so that they meet the needs of each student according to the studying he or she has done. A third element concerns the intelligent methods to be used to achieve the two elements above. Usually simple methods are used, such as production rules or semantic networks. An interesting research direction is the use of hybrid intelligent techniques, that is, techniques that combine at least two well-known intelligent techniques, such as the combination of production rules and genetic algorithms. The subject of this master's thesis is: (a) finding a method for a more realistic determination of the difficulty level of the questions/exercises, (b) finding a method for the adaptive design of student assessment tests, so that they meet the needs and abilities of each student individually, (c) the use of hybrid intelligent techniques and (d) the application of the above to an existing intelligent tutoring system for artificial intelligence topics. / Intelligent Tutoring Systems (ITSs) are systems that use AI techniques in order to provide adaptive assessment.
ITSs adapt the course material to the student's needs, based on his/her profile and knowledge level. An important function of such systems is student evaluation. Student evaluation refers to evaluating the knowledge level of a student after he/she has worked through a learning page. This is achieved by processing the results of the exercises offered at the end of a learning page. The estimate of the knowledge level for a concept is based, among other things, on the difficulty level of the correctly answered exercises included in the test. So, the right determination of the difficulty level of an exercise is very important. Another important issue is designing the tests so that they correspond to the students' needs based on their study. Feedback from the students, saved in the student model, should be taken into account when determining the difficulty levels of the questions/exercises that will be chosen for each concept. A third important issue is the use of Hybrid Intelligent Methods to address the two issues mentioned above. Most ITSs use simple methods like semantic networks or production rules. An interesting research direction is the use of hybrid AI methods, which combine at least two well-known AI techniques, such as production rules and genetic algorithms. The scope of this thesis is (a) the determination of a realistic method for exercise difficulty level adaptation, (b) the determination of a method for the personalized assessment of the learner according to a student model, (c) the use of Hybrid Intelligent Methods and (d) the implementation of all the above in an intelligent tutoring system for the "Artificial Intelligence" course.
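The abstract states that the knowledge-level estimate for a concept depends, among other things, on the difficulty of the correctly answered exercises. A minimal sketch of one such estimate is a difficulty-weighted score, shown below. The weighting scheme, the function name `knowledge_level`, and the sample results are assumptions made for illustration; the thesis does not specify its formula here.

```python
def knowledge_level(results):
    """Difficulty-weighted fraction of correctly answered exercises.

    results: list of (difficulty, correct) pairs, where difficulty is a
    positive number and correct is a bool. Returns a value in [0, 1].
    This weighting is a hypothetical example, not the thesis's method.
    """
    total = sum(difficulty for difficulty, _ in results)
    if total == 0:
        return 0.0  # no exercises attempted, so no evidence of knowledge
    earned = sum(difficulty for difficulty, correct in results if correct)
    return earned / total


# Hypothetical test: an easy and a medium exercise answered correctly,
# a hard one answered incorrectly.
results = [(1, True), (2, True), (3, False)]
level = knowledge_level(results)
```

Under this scheme a correct answer to a hard exercise raises the estimate more than a correct answer to an easy one, which is why an accurate difficulty level for each exercise matters.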
