1 |
Large Data Clustering And Classification Schemes For Data Mining
Babu, T Ravindra 12 1900
Data Mining deals with extracting valid, novel, human-understandable, potentially useful and general abstractions from large data. Data is large when the number of patterns, the number of features per pattern, or both are large. Large data is characterized by a size that exceeds the capacity of a computer's main memory. Data Mining is an interdisciplinary field involving database systems, statistics, machine learning, visualization and computational aspects. The focus of data mining algorithms is scalability and efficiency. Clustering and classification of large data are important activities in Data Mining. Clustering algorithms are predominantly iterative and require multiple scans of the dataset, which is very expensive when the data is stored on disk.
In the current work we propose different schemes that have both theoretical validity and practical utility in dealing with such large data. The schemes broadly encompass data compaction, classification, prototype selection, use of domain knowledge and hybrid intelligent systems. The proposed approaches can be broadly classified as follows: (a) compressing the data in a non-lossy manner, then clustering and classifying the patterns directly in their compressed form through a novel algorithm; (b) compressing the data in a lossy fashion such that a very high degree of compression and abstraction is obtained in terms of 'distinct subsequences', then classifying the data in this compressed form to improve the prediction accuracy; (c) obtaining simultaneous prototype and feature selection with the help of incremental clustering, a lossy compression scheme and a rough set approach; (d) demonstrating that prototype selection and data-dependent techniques can reduce the number of comparisons in a multiclass classification scenario using SVMs; and (e) showing that, by making use of domain knowledge of the problem and the data under consideration, a very high classification accuracy is obtained with fewer AdaBoost iterations.
The schemes have pragmatic utility. The prototype selection algorithm is incremental, requires a single dataset scan, and has linear time and space requirements. We provide results obtained with large, high-dimensional handwritten (hw) digit data. The compression algorithm is based on simple concepts; we demonstrate that classifying the compressed data reduces the computation time by a factor of 5, with the prediction accuracy being exactly the same, 92.47%, for both compressed and original data. With the proposed lossy compression scheme and pruning methods, we demonstrate that the prediction accuracy improves even when the number of distinct subsequences is reduced by a factor of 6 (690 to 106). Specifically, the original data contains 690 distinct subsequences and gives a classification accuracy of 92.47%; with an appropriate choice of pruning parameters, the number of distinct subsequences reduces to 106 with a corresponding classification accuracy of 92.92%. The best classification accuracy of 93.3% is obtained with 452 distinct subsequences. With the scheme of simultaneous feature and prototype selection, we improved the classification accuracy beyond that obtained with kNNC, viz., 93.58%, while significantly reducing the number of features and prototypes, achieving a compaction of 45.1%. In the case of hybrid schemes based on SVM, prototypes and a domain-knowledge-based tree (KB-Tree), we demonstrated a reduction in SVM training time by 50% and in testing time by about 30% compared with using the complete data, and an improvement in classification accuracy to 94.75%. In the case of AdaBoost, the classification accuracy is 94.48%, which is better than the accuracies obtained with NNC and kNNC on the entire data; the training time is reduced because prototypes are used instead of the complete data. Another important aspect of the work is the design of a KB-Tree (with a maximum depth of 4) that classifies 10-category data in just 4 comparisons.
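As a quick check on the last figure: assuming the KB-Tree is a binary decision tree with one class per leaf (the abstract does not say so explicitly), depth 4 is the smallest depth that can separate 10 classes. The one-vs-one count is included only for comparison; it is an assumption about a common multiclass SVM baseline, not a figure reported in the abstract.

```latex
% Assumption: binary tree, one class per leaf.
\[
  2^{3} = 8 < 10 \le 16 = 2^{4}
  \quad\Longrightarrow\quad
  d_{\min} = \lceil \log_2 10 \rceil = 4 .
\]
% Assumed baseline for comparison: one-vs-one SVMs over 10 classes.
\[
  \binom{10}{2} = \frac{10 \cdot 9}{2} = 45 \ \text{pairwise classifiers}.
\]
```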
In addition to hw data, we applied the schemes to Network Intrusion Detection Data (10% dataset of KDDCUP99) and demonstrated that the proposed schemes gave a lower overall cost than the reported values.
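The incremental, single-scan prototype selection mentioned above can be illustrated with a leader-style sketch. This is an assumption made for illustration only: the abstract does not name the algorithm, and `select_prototypes`, `classify_nn`, the distance `threshold` and the toy data are all hypothetical. In this sketch each pattern is read once, and the cost per pattern grows with the number of prototypes kept so far.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def select_prototypes(patterns, threshold):
    """Leader-style single-scan prototype selection (illustrative sketch).

    Each (features, label) pair is compared with the prototypes already kept
    for its class; it becomes a new prototype only if none lies within
    `threshold` of it.
    """
    prototypes = []
    for features, label in patterns:
        covered = any(
            p_label == label and dist(p_features, features) <= threshold
            for p_features, p_label in prototypes
        )
        if not covered:
            prototypes.append((features, label))
    return prototypes


def classify_nn(prototypes, features):
    """Return the label of the nearest prototype (1-NN over the prototypes)."""
    return min(prototypes, key=lambda p: dist(p[0], features))[1]


# Toy two-class data; the values are made up for the example.
data = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.0), "a"),
    ((5.0, 5.0), "b"),
    ((5.1, 4.9), "b"),
    ((0.0, 3.0), "a"),
]
protos = select_prototypes(data, threshold=1.0)
print(len(protos), "prototypes kept out of", len(data))
print(classify_nn(protos, (4.8, 5.2)))
```

A smaller `threshold` keeps more prototypes and stays closer to classification on the full data; a larger one compacts the data more aggressively.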
|
2 |
Use of hybrid intelligent methods for adaptive student assessment in a web-based intelligent tutoring system
Παπαβλασόπουλος, Κωνσταντίνος 12 January 2009
Intelligent Tutoring Systems are systems that use Artificial Intelligence methods to provide personalized teaching, in recent years also over the Internet. In other words, these systems offer learning adapted to the abilities and needs of the students. An important part of these systems concerns student assessment. Assessment refers to determining a student's knowledge level. This is usually done by measuring the student's performance on one or more tests containing questions and exercises that refer to a set of concepts and have various difficulty levels. One important element here is the correct determination of the difficulty level of the questions and exercises. A second element is the proper design of the tests so that they meet the needs of each student, according to the study the student has done. A third element concerns the intelligent methods to be used to achieve the first two. Usually simple methods are used, such as production rules or semantic networks. An interesting research direction is the use of hybrid intelligent techniques, that is, techniques that combine at least two well-known intelligent techniques, for example production rules and genetic algorithms.
The subject of this master's thesis is: (a) finding a method for more realistic determination of the difficulty level of the questions and exercises, (b) finding a method for adaptive design of the student assessment tests, so that they meet the needs and abilities of each student individually, (c) the use of hybrid intelligent techniques, and (d) the application of the above to an existing intelligent tutoring system for artificial intelligence topics. / Intelligent Tutoring Systems (ITSs) are systems that use AI techniques in order to provide adaptive assessment. ITSs adapt the course material to the student's needs, based on his/her profile and knowledge level. An important function of such systems is student evaluation. Student evaluation refers to the evaluation of the knowledge level of a student after having dealt with a learning page.
This is achieved by processing the results of the exercises offered at the end of a learning page. Estimation of the knowledge level of a concept is based, among other things, on the difficulty level of the correctly answered exercises included in the test. The correct determination of the difficulty level of an exercise is therefore very important.
Another important issue is the design of the tests so that they correspond to students' needs based on their study. Feedback from the students saved in the student model should be taken into account when determining the difficulty levels of the questions/exercises chosen for each concept. A third important issue is the use of Hybrid Intelligent Methods to address the two issues above. Most ITSs use simple methods like semantic networks or production rules. An interesting research direction is the use of hybrid AI methods, which combine at least two well-known AI techniques such as production rules and genetic algorithms.
The scope of this thesis is (a) the determination of a realistic method for exercise difficulty level adaptation, (b) the determination of a method for the personalized assessment of the learner according to a student model, (c) the use of Hybrid Intelligent Methods, and (d) the implementation of all the above in an existing Intelligent Tutoring System for the course "Artificial Intelligence".
|
3 |
Integration of intelligent systems in development of smart adaptive systems: linguistic equation approach
Juuso, E. (Esko) 19 November 2013
Abstract
Smart adaptive systems provide advanced tools for monitoring, control, diagnostics and management of nonlinear multivariate processes. Data mining with a multitude of methodologies is a good basis for the integration of intelligent systems. Small, specialised systems have a large number of feasible solutions, but highly complex systems require domain expertise and more compact approaches at the basic level. The linguistic equation (LE) approach, which originates from fuzzy logic, is an efficient technique for these problems. This research focuses on smart adaptive applications, where different intelligent modules are used in a smart way.
The nonlinear scaling methodology based on advanced statistical analysis is the cornerstone in representing the variable meanings in a compact way to introduce intelligent indices for control and diagnostics. The new constraint handling, together with generalised norms and moments, facilitates recursive parameter estimation approaches for the adaptive scaling. Well-known linear methodologies are used for steady-state, dynamic and case-based modelling in connection with the cascade and interactive structures in building complex large-scale applications. To achieve insight and robustness, the parameters are defined separately for the scaling and the interactions. The LE-based intelligent analysers are useful in multilevel LE control and diagnostics: the LE control is enhanced with the intelligent analysers, adaptive and model-based modules, and high-level control. The operating area is extended with predefined adaptation, and specific events activate appropriate control actions. The condition, stress and trend indices are used for the detection of operating conditions. The same overall structure is extended to scheduling and managerial decision support. The linguistic representation becomes increasingly important when human interaction is essential.
The new scaling approach is used in control and diagnostic applications and discussed in connection with previous multivariate modelling cases. The LE-based intelligent analysers are the key modules of the system integration, which produces hybrid systems: fuzzy systems move gradually to higher levels, while neural networks and evolutionary computing are used for tuning. The overall system is reinforced with advanced statistical analysis, signal processing, feature extraction, classification and mechanistic modelling. / Abstract (Finnish)
Smart adaptive systems contain advanced tools for the monitoring, control, diagnostics and management of nonlinear multivariate processes. Data mining based on a broad range of methods is the basis for integrating intelligent systems. Small specialised systems have many feasible solutions, but highly complex systems require domain expertise and compact approaches at the basic level. The linguistic equation (LE) method, which is based on fuzzy logic, is an efficient solution in these problem areas. This research addresses smart adaptive applications in which several intelligent modules are used together in a smart way.
The nonlinear scaling method based on advanced statistical analysis forms the cornerstone of the solution: the variable meanings are suited to developing intelligent indices used in control and diagnostics. New constraint handling methods, together with generalised norms and moments, enable recursive parameter estimation in scaling that adapts to the operating conditions. Well-known linear methods are used in steady-state, dynamic and case-based modelling, where cascade and interactive structures extend the models to complex applications when needed. To ensure process insight and system robustness, the parameters are defined separately for the scaling and the interactions.
The LE-based intelligent analysers are useful in multilevel control and diagnostics: LE control is enhanced with intelligent analysers, adaptive and model-based modules, and higher-level control. The operating area is extended with predefined adaptation and with specific control actions activated by certain events. Condition, stress and trend indices are used for detecting operating conditions. The same structure is extended to production scheduling and decision support, where handling human interaction makes the linguistic representation increasingly important.
The new scaling method is examined in control and diagnostic applications, and its applicability to previously implemented multivariate models is briefly compared. The LE-based intelligent analysers are central when integrating modules into hybrid solutions: fuzzy systems move gradually to higher levels, and neural and evolutionary computing focus on system tuning. The overall system is reinforced with advanced statistical analysis, signal processing, feature extraction, classification and mechanistic modelling.
|