Global ETD Search

31	Δυναμική κατασκευή μεγάλης κλίμακας ταξονομίας σε Crowdsourcing περιβάλλοντα Καραμπίνας, Δημήτρης 15 May 2012 (has links) Στις μέρες μας οι χρήστες εκτός από 'καταναλωτές' πληροφορίας στο διαδίκτυο είναι και 'παραγωγοί' και διαχειριστές της. Μια συνήθης πρακτική είναι η σήμανση του περιεχομένου που διαμοιράζονται με ετικέτες (tags) και η χρήση των ετικετών αυτών σε διαδικασίες αναζήτησης ή εύρεσης περιεχομένου με παρόμοια χαρακτηριστικά. Ένα από τα εργαλεία που ευρέως χρησιμοποιούνται στις διαδικασίες αυτές είναι οι ταξονομίες (taxonomies). Οι ταξονομίες είναι δενδρικές δομές που συνίστανται από κόμβους, καθένας από τους οποίους αντιπροσωπεύει μια κατηγορία-ένοια και συνδέεται με τα παιδιά και τον γονέα του με σχέσεις 'IS-A'. Δημιουργούνται κατά βάση χειρωνακτικά, από ειδικούς, ενώ η ανανέωση και επέκτασή τους είναι αρκετά χρονοβόρες ενέργειες. Στην εργασία αυτή ασχολούμαστε με την αυτόματη εξαγωγή ταξονομίας από είσοδο που προέρχεται από μια κοινότητα χρηστών. Θεωρούμε πως οι χρήστες μας είναι ικανοί να παρέχουν σχέσεις ετικετών που περιγράφουν καταστάσεις υπερκατηγορίας-υποκατηγορίας μεταξύ θεματικών κόμβων και προσπαθούμε να τις συγκεράσουμε ώστε να προκύψει μια ταξονομία. Η τελική ταξονομία είναι συμβατή με τα 'κβάντα' πληροφορίας που έχουμε στη διάθεσή μας, επιλύει αντικρουόμενες απόψεις χρηστών όσον αφορά την τελική της δομή και αποτυπώνει την ικανότητα της κοινότητας στη διακριτοποίηση εννοιών. Προτείνουμε έναν αλγόριθμο κατασκευής μιας ταξονομίας και θα αξιολογήσουμε την απόδοσή του, χρησιμοποιώντας τόσο συνθετικά, όσο και πραγματικά δεδομένα. Γίνεται επίσης προσαρμογή και μελέτη του συστήματος σε crowdsourcing περιβάλλοντα, περιβάλλοντα δηλαδή όπου ένας μεγάλος αριθμός από χρήστες χρησιμοποιείται για την περάτωση μικρών εργασιών που χαρακτηριστικό τους είναι η αδυναμία εκτέλεσης τους από υπολογιστικά συστήματα. / Taxonomies are a useful mechanism to organize, evaluate, and search web content. 
As such, many popular classes of web applications use them. However, their manual generation and maintenance by experts is time-consuming and often results in platform-dependent, static vocabularies. We propose a new approach for constructing taxonomies. Our idea builds on the demonstrated and growing willingness of people to annotate web content (e.g., in social media and product categorization applications). We define the required human input as explicit structural relationships between concepts, e.g., supertype-subtype relationships. In this way, we harvest, via common annotation practices, the collective wisdom of users with respect to the (categorization of) web content they share and access. We further define the principles upon which crowdsourced taxonomy construction algorithms should be based. We show that the resulting problem is NP-hard. We provide heuristic algorithms that aggregate human input, resolve conflicts in it, and produce taxonomies. We evaluate our algorithms with real-world crowdsourcing experiments and on real-world taxonomies. Ταξινομία 006.312 Taxonomy Crowdsourcing
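The aggregation of conflicting supertype-subtype input described in entry 31 can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes each user vote is a (supertype, subtype) pair, scores each relationship by its net vote count, and greedily attaches each concept to its best-supported parent while rejecting any edge that would create a cycle. The function name `build_taxonomy` and the sample votes are hypothetical.

```python
from collections import Counter

def build_taxonomy(votes):
    """Greedy heuristic: return {child: parent} from (parent, child) votes."""
    counts = Counter(votes)
    # Net support for parent->child: votes for it minus votes for the reverse.
    net = {(p, c): n - counts.get((c, p), 0) for (p, c), n in counts.items()}
    # Strongest edges first; ties broken alphabetically for determinism.
    edges = sorted((e for e in net if net[e] > 0), key=lambda e: (-net[e], e))
    parent = {}

    def is_ancestor(a, b):
        """True if a is b or an ancestor of b in the tree built so far."""
        while b is not None:
            if a == b:
                return True
            b = parent.get(b)
        return False

    for p, c in edges:
        # One parent per node (tree shape), and c must not sit above p (no cycle).
        if c not in parent and not is_ancestor(c, p):
            parent[c] = p
    return parent

# Invented votes: users disagree on animal/dog and on the parent of "puppy".
votes = [
    ("animal", "dog"), ("animal", "dog"), ("dog", "animal"),
    ("animal", "cat"), ("dog", "puppy"), ("cat", "puppy"), ("dog", "puppy"),
]
taxonomy = build_taxonomy(votes)
```

A greedy rule like this is only one of many possible heuristics; since the abstract states that the underlying problem is NP-hard, no such shortcut should be expected to find the best taxonomy in general.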
32	APISENSE® : une plate-forme répartie pour la conception, le déploiement et l’exécution de campagnes de collecte de données sur des terminaux intelligents / APISENSE® : a distributed platform for deploying, executing and managing data collection campaigns using smart devices Haderer, Nicolas 05 November 2014 (has links) Le mobile crowdsensing est une nouvelle forme de collecte de données exploitant la foule de terminaux intelligents déjà déployés à travers le monde pour collecter massivement des données environnementales ou comportementales d'une population.Ces dernières années, ce type de collecte de données a suscité l'intérêt d'un grand nombre d'acteurs industriels et académiques dans de nombreux domaines tels que l'étude de la mobilité urbaine, la surveillance de l'environnement, la santé ou l'étude des comportements socioculturels. Cependant, le mobile crowdsensing n'en n'est qu'à ses premiers stades de développement, et de nombreux défis doivent encore être relevés pour pleinement profiter de son potentiel. Ces défis incluent la protection de la vie privée des utilisateurs, les ressources énergétiques limitées des terminaux mobiles, la mise en place de modèles de récompense et de déploiement adaptés pour recruter les utilisateurs les plus à même de collecter les données désirées, ainsi que faire face à l’hétérogénéité des plateformes mobiles disponibles. Dans cette thèse, nous avons cherché à réétudier les architectures des systèmes dédiés au mobile crowdsensing pour adresser les limitations liées au développement, au déploiement et à l'exécution de campagnes de collecte de données. Les différentes contributions proposées sont articulées autour APISENSE, la plate-forme résultante des travaux de cette thèse. 
APISENSE a été utilisé pour réaliser une campagne de collecte de données déployée auprès d'une centaine d'utilisateurs au sein d'une étude sociologique, et évalué à travers des expériences qui démontrent la validité, l'efficacité et le passage à échelle de notre solution. / Mobile crowdsensing is a new form of data collection that takes advantage of the millions of smart devices already deployed throughout the world to collect environmental or behavioral data from a population on a massive scale. Recently, this type of data collection has attracted interest from a large number of industrial and academic players in many areas, such as the study of urban mobility, environmental monitoring, health, and the study of sociocultural attitudes. However, mobile crowdsensing is still in its early stages of development, and many challenges remain to be addressed to take full advantage of its potential. These challenges include privacy, the limited energy resources of devices, the development of reward and recruitment models to select appropriate mobile users, and dealing with the heterogeneity of available mobile platforms. In this thesis, we aim to reconsider the architectural design of current mobile crowdsensing systems to provide a simple and effective way to design, deploy and manage data collection campaigns. The main contributions of this thesis are organized around APISENSE, the platform resulting from this research. APISENSE has been used to carry out a data collection campaign deployed to about a hundred users in a sociological study, and has been evaluated through experiments demonstrating the validity, effectiveness and scalability of our solution. Collecte de données 006.312
33	Integrating text-mining approaches to identify entities and extract events from the biomedical literature Gerner, Lars Martin Anders January 2012 (has links) The amount of biomedical literature available is increasing at an exponential rate and is becoming increasingly difficult to navigate. Text-mining methods can potentially mitigate this problem, through the systematic and large-scale extraction of structured information from inherently unstructured biomedical text. This thesis reports the development of four text-mining systems that, by building on each other, have enabled the extraction of information about a large number of published statements in the biomedical literature. The first system, LINNAEUS, enables highly accurate detection ('recognition') and identification ('normalization') of species names in biomedical articles. Building on LINNAEUS, we implemented a range of improvements in the GNAT system, enabling high-throughput gene/protein detection and identification. Using gene/protein identification from GNAT, we developed the Gene Expression Text Miner (GETM), which extracts information about gene expression statements. Finally, building on GETM as a pilot project, we constructed the BioContext integrated event extraction system, which was used to extract information about over 11 million distinct biomolecular processes in 10.9 million abstracts and 230,000 full-text articles. The ability to detect negated statements in the BioContext system enables the preliminary analysis of potential contradictions in the biomedical literature. All tools (LINNAEUS, GNAT, GETM, and BioContext) are available under open-source software licenses, and LINNAEUS and GNAT are available as online web-services. All extracted data (36 million BioContext statements, 720,000 GETM statements, 72,000 contradictions, 37 million mentions of species names, 80 million mentions of gene names, and 57 million mentions of anatomical location names) is available for bulk download. 
In addition, the data extracted by GETM and BioContext is also available to biologists through easy-to-use search interfaces. 006.312 Biomedical text mining
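The two steps that entry 33 attributes to species-name handling, detection ('recognition') and identification ('normalization'), can be sketched with a dictionary lookup. This is an illustration only and not the LINNAEUS implementation: the tiny `SPECIES` dictionary, the function name `find_species`, and the sample sentence are assumptions, with names mapped to NCBI Taxonomy identifiers.

```python
import re

# Toy dictionary (assumed sample): surface form -> NCBI Taxonomy identifier.
SPECIES = {
    "homo sapiens": 9606,
    "human": 9606,
    "mus musculus": 10090,
    "mouse": 10090,
}

# Longest names first, so a multi-word name wins over a shorter overlapping form.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(SPECIES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

def find_species(text):
    """Return (start, end, matched text, taxonomy id) for each mention found."""
    return [
        (m.start(), m.end(), m.group(0), SPECIES[m.group(0).lower()])
        for m in _PATTERN.finditer(text)
    ]

mentions = find_species("Expression was compared between human and Mus musculus tissue.")
```

Recognition here is the regular-expression match, and normalization is the mapping of each matched string to one identifier. A plain lookup like this does not handle ambiguous names, abbreviations, or plurals, all of which a real system has to deal with.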
34	Τεχνικές για την εξαγωγή γνώσης από την πλατφόρμα του Twitter Δήμας, Αναστάσιος 12 October 2013 (has links) Η χρήση του Twitter από ολοένα και περισσότερους ανθρώπους έχει ως συνέπεια την παραγωγή μεγάλου όγκου «υποκειμενικών» δεδομένων. Η ανάγκη για εξεύρεση τυχόν πολύτιμης κρυμμένης πληροφορίας σε αυτά τα δεδομένα, έδωσε ώθηση στην ανάπτυξη ενός νέου πεδίου έρευνας, του Sentiment Analysis, που έχει ως αντικείμενο τον εντοπισμό του συναισθήματος ενός χρήστη (ή μιας ομάδας χρηστών) ως προς κάποιο θέμα. Οι παραδοσιακοί αλγόριθμοι και μέθοδοι εντοπισμού συναισθήματος στηρίζονται στην λεκτική ανάλυση φράσεων ή προτάσεων σε «επίσημα» κείμενα και καλούνται word based approaches. Ωστόσο, το μικρό μέγεθος των κειμένων του Twitter, σε συνδυασμό με την χαλαρότητα της χρησιμοποιούμενης γλώσσας (από πλευράς χρηστών), δεν επιτρέπει την αποτελεσματική χρήση αυτών των τεχνικών. Για τον λόγο αυτό, προτιμάται η χρήση τεχνικών που βασίζονται σε χαρακτήρες (αντί για λέξεις) και καλούνται character based approaches. Στόχος της διπλωματικής εργασίας είναι η εφαρμογή της character based μεθόδου στην ανάλυση tweets πολιτικού περιεχομένου. Συγκεκριμένα, χρησιμοποιήθηκαν δεδομένα από την πολιτική σκηνή των Η.Π.Α., με σκοπό να εντοπιστεί η προτίμηση ενός χρήστη ως προς το Ρεπουμπλικανικό ή το Δημοκρατικό κόμμα μέσω σχετικών tweets. Για την ανάλυση χρησιμοποιήθηκε επιβλεπόμενη μάθηση με την βοήθεια του Naive Bayes ταξινομητή. Αρχικά, συλλέχθηκε ένα σύνολο από 7904 tweets, προερχόμενα από τους επίσημους λογαριασμούς Twitter 48 γερουσιαστών. Το σύνολο αυτό χωρίσθηκε σε δυο επιμέρους σύνολα, το σύνολο εκπαίδευσης και το σύνολο ελέγχου, ελέγχοντας για κάθε μια από τις δυο μεθόδους ανάλυσης (την word based και character based μέθοδο) την ακρίβεια της ταξινόμησης. Από τα πειράματα πρόεκυψε πως η character based μέθοδος ταξινομεί τα tweets με μεγαλύτερη ακρίβεια. 
Στην συνέχεια συλλέξαμε δυο νέα σύνολα έλεγχου, ένα από τον επίσημο λογαριασμό Twitter του Ρεπουμπλικανικού κόμματος και ένα από τον επίσημο λογαριασμό Twitter του Δημοκρατικού κόμματος. Αυτή την φορά, ως σύνολο εκπαίδευσης χρησιμοποιήθηκε ολόκληρο το αρχικό σύνολο από τα tweets των γερουσιαστών και ελέγχθηκε η ακρίβεια ταξινόμησης για την character based μέθοδο στα δυο νέα σύνολα ελέγχου. Αν και στην περίπτωση του Democratic Twitter account τα αποτελέσματα μπορούν να χαρακτηριστούν ως «ικανοποιητικά», μιας και η ακρίβεια της ταξινόμησης πλησίασε το 80%, για την περίπτωση του Republican Twitter account κάτι τέτοιο δεν ισχύει. Για το λόγο αυτό, προχωρήσαμε σε μια πιο διεξοδική μελέτη της δομής και του περιεχομένου αυτών tweets. Από την ανάλυση προέκυψαν ορισμένα ενδιαφέροντα αποτελέσματα για την προέλευση των χαμηλών ποσοστών στην ακρίβεια ταξινόμησης. Συγκεκριμένα, πρόεκυψε πως στην πλειοψηφία των tweets που έγιναν από τους Ρεπουμπλικάνους γερουσιαστές, δεν περιέχονταν κάποια προσωπική τους άποψη. Ήταν απλά μια αναφορά σε κάποιο άρθρο ή video που είδαν στον διαδίκτυο. Άρα, η πλειοψηφία των tweets αυτών περιέχουν «αντικειμενική» αντί για «υποκειμενική» πληροφορία. Συνεπώς, δεν είναι δυνατόν να εξαχθούν τα χαρακτηριστικά εκείνα που θα βοηθήσουν στον εντοπισμό της πολικότητας των χρηστών. / As more people enter the “social web”, social media platforms are becoming an increasingly valuable source of subjective information. The large volume of social media content available requires automatic techniques in order to process and extract any valuable information. This need recently gave rise to the field of Sentiment Analysis, also known as Opinion Mining. The goal of sentiment analysis is to identify the position of a user (or a group of users – a crowd), with respect to a particular issue or topic. Existing sentiment analysis systems aim at extracting patterns mainly from formal documents with respect to a particular language (most techniques concern English). 
They either search for discriminative series of words or use dictionaries that assess the meaning and sentiment of specific words and phrases. The limited size of Twitter posts, in conjunction with the non-standard vocabulary and shortened words used by its users, introduces a great deal of noise, making word-based approaches ineffective. For these reasons, a new approach was recommended in the literature. This approach is not based on the study of words but rather on the study of consecutive character sequences (character-based approaches). In this work, we demonstrate the superiority of the character-based approach over the word-based one in determining political sentiment. We argue that this approach can be used to efficiently determine the political preference (e.g. Republican or Democrat) of voters or to identify the importance that particular issues have for particular voters. This type of feedback can be useful in the organization of political campaigns or policies. We created a corpus consisting of 7904 tweets, collected from the Twitter accounts of 48 U.S. senators. This corpus was then separated into two sets, the training set and the test set, in order to measure the classification accuracy of each method (word-based and character-based). The experiments showed that the character-based method classified the tweets with greater accuracy. In the next test, we used two new test sets, one from the official Twitter account of the Republican Party and one from the official Twitter account of the Democratic Party. The main difference with respect to the previous test was that the complete set of tweets collected from the senators’ Twitter accounts served as the training set, while the tweets from the official Twitter account of each party served as the test sets. 
Although 80% of the tweets from the official Democratic Party account were correctly classified as Democrat, this was not the case for the official Republican Party account (56.7% accuracy). This was found to be partly because the majority of the Republican account tweets were references to online articles or videos rather than personal opinions or views. In other words, such tweets should be considered objective rather than personal (subjective), so they cannot be used to classify the respective user as leaning towards one party or the other. 006.312 Sentiment analysis Twitter Character n-grams
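The character-based classification described in entry 34 can be sketched as a Naive Bayes classifier over character n-grams. The sketch below is a minimal illustration and not the thesis's code: the n-gram length of 3, the add-one smoothing, the class name `CharNgramNB`, and the four training sentences are all assumptions invented for the example.

```python
import math
from collections import Counter, defaultdict

def char_ngrams(text, n=3):
    """Overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

class CharNgramNB:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter()               # label -> number of training texts
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in sorted(self.docs):
            total = sum(self.counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.docs[label] / total_docs)
            for g in char_ngrams(text, self.n):
                score += math.log((self.counts[label][g] + 1) / (total + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy training data; real experiments would use labelled tweets.
train = [
    ("cut taxes and shrink government", "R"),
    ("lower taxes for small business", "R"),
    ("expand health care coverage", "D"),
    ("protect health care for families", "D"),
]
model = CharNgramNB(n=3).fit([t for t, _ in train], [y for _, y in train])
```

Because the features are character sequences rather than whole words, shortened or misspelled words still share most of their n-grams with the standard spelling, which is the property the abstract relies on for noisy tweets.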
35	Ανάκτηση και ανάλυση χωροχρονικών δεδομένων με βάση τις συνήθειες κινητής επικοινωνίας των χρηστών Νταλιάνη, Χάρις 26 August 2014 (has links) Αποτελεί γεγονός ότι τα κινητά τηλέφωνα χρησιμοποιούνται με όλο και αυξανόμενο ρυθμό για τη συλλογή μεγάλου όγκου δεδομένων αποσκοπώντας στην ανάλυση της ανθρώπινης συμπεριφοράς. Όσο τα κινητά τηλέφωνα εξελίσσονται και τα smartphones διαμορφώνουν μια έντονα ανταγωνιστική και επικερδή αγορά, δημιουργούνται νέα δεδομένα στις λειτουργίες και τις εφαρμογές των κινητών. Η λίστα επαφών μπορεί πλέον να αποτελείται από χιλιάδες εγγραφές καθιστώντας αρκετά χρονοβόρα την αναζήτηση κάποιας επαφής. Στην παρούσα διπλωματική εργασία παρουσιάζεται η πιθανότητα ο χρήστης να καλέσει μια επαφή του βάσει της τρέχουσας τοποθεσίας και του αρχείου κλήσεών του. Η εξαγωγή αυτών των πιθανοτήτων απαιτεί την ομαδοποίηση των κλήσεων του χρήστη βάσει της τοποθεσίας του. Αυτό επιτυγχάνεται με τη χρήση κατάλληλου αλγορίθμου clustering, ο οποίος κατηγοριοποιεί γεωγραφικά τις κλήσεις του χρήστη. Επίσης, έχοντας πρόσβαση στο αρχείο κλήσεων του χρήστη επιστρέφονται από τον αλγόριθμο ποιες επαφές έχουν κληθεί από κάθε τοποθεσία που έχει κάνει κλήση ο χρήστης. Αξιοποιώντας αυτή την πληροφορία υπολογίζεται η πιθανότητα να κληθεί κάθε μία από τις επαφές του από τη συγκεκριμένη τοποθεσία. Ο αλγόριθμος που παρουσιάζεται στη διπλωματική εργασία θα μπορούσε να χρησιμοποιηθεί για την δημιουργία μιας πιο εύχρηστης λίστας επαφών. Τα περισσότερα κινητά διαθέτουν την επιλογή να ορίσει ο χρήστης ποιες επαφές του είναι αυτές που χρησιμοποιεί πιο συχνά αλλά και το αρχείο κλήσεων που εμφανίζει τις κλήσεις με χρονολογική σειρά. Σε καμία από αυτές τις επιλογές δεν αποτελεί παράγοντας η γεωγραφική τοποθεσία του χρήστη. Ο άνθρωπος ως μέλος της κοινωνίας χρησιμοποιεί το κινητό του τηλέφωνο ως μέσο επικοινωνίας και αλληλεπίδρασης με άλλους ανθρώπους. 
Κατά τη διάρκεια της ημέρας οι τοποθεσίες που επισκέπτεται ο χρήστης διαφέρουν ανάλογα με την επαγγελματική του ιδιότητα, την προσωπικότητα, την ηλικία αλλά και την περίοδο του χρόνου. Τις ώρες που ο άνθρωπος βρίσκεται στο χώρο εργασίας του, είναι πιο πιθανό να επικοινωνεί τηλεφωνικά με συνεργάτες του ενώ το βράδυ όταν βρίσκεται στην εστία του είναι αναμενόμενο να συνομιλεί με οικείους και φίλους. Επίσης ειδικές περίοδοι του χρόνου όπως οι διακοπές των γιορτών ή οι διακοπές το καλοκαίρι, συνεπάγονται οι άνθρωποι να ταξιδεύουν στις πόλεις από όπου κατάγονται. Καθίσταται σαφές ότι σε αυτές τις περιπτώσεις η επικοινωνία του χρήστη διαφοροποιείται καλώντας επαφές που δεν χρησιμοποιεί στην καθημερινότητα του. Η τοποθεσία του ατόμου αποτελεί συνεπώς κύριο παράγοντα στην επιλογή της επαφής με την οποία θα επικοινωνήσει. Τέλος, αξιολογήθηκαν τα αποτελέσματα και η ακρίβεια της προσέγγισης που παρουσιάστηκε στη διπλωματική εργασία και εξήχθησαν τα αντίστοιχα συμπεράσματα. / Mobile phones are increasingly used to collect large amounts of data for analyzing human behavior. As mobile phones evolve and smartphones shape a highly competitive and lucrative market, new data on mobile functions and applications are created. A contact list may now consist of thousands of records, which makes searching for a contact quite time-consuming. This thesis presents the probability that the user will call a given contact, based on the user's current location and call log. The algorithm presented in this thesis could be used to create a more user-friendly contact list. Most mobile devices let the user mark the contacts used most often, and offer a call log that shows calls in chronological order. Neither option takes the user's geographical location into account. People, as members of society, use their mobile phones as a medium of communication and interaction with other people. 
During the day, the places a user visits vary depending on profession, personality, age, and the time of year. While at the workplace, a person is more likely to phone colleagues, whereas in the evening, at home, they are expected to talk with relatives and friends. Special periods of the year, such as the holiday season or the summer vacation, also lead people to travel to the cities they come from. It is clear that in these cases the user's communication changes, as they call contacts they do not use in everyday life. The location of the individual is therefore a key factor in the choice of which contact to call. Finally, the results and the accuracy of the approach presented in the thesis were evaluated and the corresponding conclusions were drawn. Λίστα επαφών 006.312 Contact list Mobile communication
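Entry 35 describes grouping a user's calls by location and then computing, for each location, how likely each contact is to be called. A minimal sketch of that idea follows. The abstract does not name the clustering algorithm, so the sketch substitutes a fixed grid as a crude stand-in; the function names, the cell size of 0.01 degrees, and the sample call log are all hypothetical.

```python
from collections import Counter, defaultdict

def cell(lat, lon, size=0.01):
    """Map coordinates to a grid cell; a crude stand-in for a clustering algorithm."""
    return (round(lat / size), round(lon / size))

def contact_probabilities(call_log, size=0.01):
    """call_log: iterable of (lat, lon, contact). Returns {cell: {contact: probability}}."""
    per_cell = defaultdict(Counter)
    for lat, lon, contact in call_log:
        per_cell[cell(lat, lon, size)][contact] += 1
    return {
        c: {name: n / sum(counts.values()) for name, n in counts.items()}
        for c, counts in per_cell.items()
    }

def rank_contacts(call_log, lat, lon, size=0.01):
    """Contacts most likely to be called from the given position, best first."""
    probs = contact_probabilities(call_log, size).get(cell(lat, lon, size), {})
    return sorted(probs, key=lambda name: (-probs[name], name))

# Invented call log: three calls from one area, one call from a distant city.
log = [
    (38.2466, 21.7346, "colleague"),
    (38.2467, 21.7345, "colleague"),
    (38.2465, 21.7347, "manager"),
    (37.9838, 23.7275, "parent"),
]
suggested = rank_contacts(log, 38.2466, 21.7346)
```

A fixed grid splits places that straddle a cell boundary, which is one reason a proper clustering algorithm is a better fit for real data. The ranked list is what a location-aware contact list could show first.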
36	Data mining system for tree and network structures in medical images / Σύστημα εξόρυξης δεδομένων από τοπολογίες δένδρων και πλεγμάτων αναπαριστώμενων σε ιατρικές εικόνες Σκούρα, Αγγελική 24 November 2014 (has links) Ανατομικές δομές με δενδρική τοπολογία απαντώνται συχνά στο ανθρώπινο σώμα και οπτικοποιούνται σε ιατρικές εικόνες χρησιμοποιώντας απεικονιστικές τεχνικές με ακτίνες-χ και τη χρήση σκιαγραφικού υλικού. Χαρακτηριστικά παραδείγματα τέτοιων δομών είναι το βρογχικό δένδρο εντός των πνευμόνων το οποίο οπτικοποιείται με εικόνες αξονικής τομογραφίας και τα γαλακτοφόρα δένδρα εσωτερικά του μαστού τα οποία οπτικοποιούνται με γαλακτογραφίες. Σκοπός της παρούσας διδακτορικής διατριβής αποτελεί η ανάπτυξη ενός συνόλου αλγοριθμικών μεθόδων για την αυτοματοποίηση της ανάλυσης των ανατομικών δομών του ανθρωπίνου σώματος που έχουν τοπολογία δένδρου ή τοπολογία δικτύου. Πιο συγκεκριμένα, οι δύο βασικοί στόχοι της διατριβής είναι η ανάπτυξη μεθόδων ειδικά σχεδιασμένων για τη ψηφιακή επεξεργασία των ιατρικών εικόνων που απεικονίζουν δομές με διακλαδώσεις και η ανάπτυξη μεθοδολογικών πλαισίων για τη διερεύνηση της σχέσης μεταξύ τοπολογίας και παθοφυσιολογίας αυτού του τύπου ανατομικών δομών. Το πρώτο κεφάλαιο της διατριβής παρουσιάζει μια βιβλιογραφική ανασκόπηση σχετικά με τις ανατομικές δομές του ανθρωπίνου σώματος με τοπολογία διακλαδώσεων καθώς και το κίνητρο για την παρούσα έρευνα. Οι επιμέρους ερευνητικοί στόχοι, οι κύριες συνεισφορές και η γενικότερη απήχηση της διατριβής αναφέρονται επίσης. Το δεύτερο κεφάλαιο εστιάζει στην κατάτμηση εικόνας. Η κατάτμηση εικόνας αποτελεί το πρώτο βήμα στη διαδικασία ανάλυσης ιατρικών εικόνων και στα συστήματα αναγνώρισης προτύπων και οι αλγόριθμοι κατάτμησης αποτελούν κρίσιμα τμήματα των σύγχρονων ιατρικών διαγνωστικών συστημάτων. Παρά την πλούσια βιβλιογραφία στην περιοχή, η ανάγκη για αποδοτικές μεθοδολογίες κατάτμησης εφαρμόσιμες σε μεγάλο εύρος απεικονιστικών τεχνικών παραμένει. 
Προσπαθώντας να αντιμετωπιστεί αυτή η ερευνητική πρόκληση, μια καινοτόμα και πλήρως αυτοματοποιημένη μεθοδολογία για την κατάτμηση των δενδρικών ανατομικών δομών παρουσιάζεται. Η βασική ιδέα είναι ο συνδυασμός τεχνικών ανίχνευσης ακμών με μεθόδους ανάπτυξης περιοχών για να επιτευχθεί αποδοτική κατάτμηση. Η υβριδική αυτή προσέγγιση εφαρμόστηκε και αξιολογήθηκε σε δύο σύνολα δεδομένων ιατρικών εικόνων από διαφορετικές απεικονιστικές τεχνικές (γαλακτογραφίες και αγγειογραφίες) και η απόδοσή της συγκρίθηκε με τεχνικές κατάτμησης της υπάρχουσας τεχνολογικής στάθμης. Το τρίτο κεφάλαιο επικεντρώνεται στην ανίχνευση των κόμβων διακλάδωσης το οποίο συνιστά ένα σημαντικό υπολογιστικό στάδιο στα πλαίσια της επεξεργασίας των ιατρικών εικόνων που απεικονίζουν δομές δενδρικής τοπολογίας. Οι κόμβοι διακλάδωσης αποτελούν σημεία-κλειδιά για τον προσδιορισμό της θέσης του δένδρου και η σωστή ανίχνευσή τους είναι ένα σημαντική για την αυτοματοποίηση διαδικασιών επεξεργασίας εικόνας όπως ευθυγράμμιση εικόνας, κατάτμηση εικόνας και ανάλυση των προτύπων διακλάδωσης. Ωστόσο, η ανάπτυξη αυτοματοποιημένων τεχνικών για την ανίχνευση των κόμβων διακλάδωσης δυσχεραίνεται από τα διαφορετικά επίπεδα θορύβου που υπάρχουν κατά μήκος της δενδρικής δομής. Η προτεινόμενη μεθοδολογία ανίχνευσης απαρτίζεται από δύο κύρια στάδια: ανίχνευση γωνιακών σημείων σε διάφορες κλίμακες και προσδιορισμό της θέσης της διακλάδωσης. Η βασική συνεισφορά της νέας μεθοδολογίας είναι η χρήση ενός τοπικά προσαρμοζόμενου κατωφλιού κατά τη φάση της ανίχνευσης προκειμένου να αντιμετωπιστεί αποδοτικά η ανίχνευση των σημείων διακλάδωσης που βρίσκονται στα χαμηλά δενδρικά επίπεδα. Η αξιολόγηση της μεθόδου πραγματοποιήθηκε χρησιμοποιώντας ένα σύνολο δεδομένων από κλινικές γαλακτογραφίες και η απόδοσης της συγκρίνεται με αντίστοιχες τεχνικές της υπάρχουσας τεχνολογικής στάθμης. 
Στο τέταρτο κεφάλαιο παρουσιάζονται καινοτόμες μεθοδολογίες για τον χαρακτηρισμό και την κατηγοριοποίηση των ανατομικών δενδρικών δομών στοχεύοντας στη διερεύνηση της συσχέτισης μεταξύ τοπολογίας και παθολογίας των αντίστοιχων οργάνων. Οι μέθοδοι περιλαμβάνουν κατηγοριοποίηση χρησιμοποιώντας περιγραφικά χαρακτηριστικά της τοπολογίας όπως η δενδρική ασυμμετρία, η χωρική κατανομή των σημείων διακλάδωσης, η στρεβλότητα των κλάδων και άλλα γεωμετρικά χαρακτηριστικά του δένδρου. Επιπρόσθετα σε αυτό το κεφάλαιο, ένα νέο μεθοδολογικό πλαίσιο προτείνεται για την ανάλυση δενδρικών τοπολογιών χρησιμοποιώντας διανύσματα που κωδικοποιούν τις σχέσεις παιδιού-γονέα των κόμβων και ελαστικό ταίριασμα μεταξύ των ακολουθιών. Η υπεροχή της νέας αυτής μεθόδου έναντι των μεθόδων της υπάρχουσας τεχνολογικής στάθμης για την κατηγοριοποίηση δένδρων αξιολογήθηκε πειραματικά ως προς ευαισθησία, ειδικότητα και ακρίβεια. Στο πέμπτο κεφάλαιο μελετώνται τεχνικές συλλογικής μάθησης. Η ενοποίηση πολλαπλών αλγορίθμων μηχανικής μάθησης συνιστά σημαντική πρόοδο για τις μεθοδολογίες κατηγοριοποίησης και βασίζεται στην ιδέα του συνδυασμού των προβλέψεων ενός πλήθους κατηγοριοποιητών με σκοπό τη μεγιστοποίηση της ακρίβειας κατηγοριοποίησης. Τρεις τεχνικές συνδυαστικής μάθησης βασισμένες στην τεχνική της ενδυνάμωσης (boosting) και η χρήση ενός συνδυαστικού κανόνα που ονομάζεται Πρότυπο Απόφασης (Decision Template) χρησιμοποιούνται για τη βελτιστοποίηση της ακρίβειας που επιτυγχάνουν οι κατηγοριοποιητές βάσης. Τα πειραματικά αποτελέσματα επιβεβαιώνουν την υπεροχή των μεθόδων συλλογικής μάθησης. Κλείνοντας, τα συμπεράσματα της διατριβής παρουσιάζονται στο έκτο κεφάλαιο. Οι περιορισμοί των προτεινόμενων τεχνικών καθώς και οι προοπτικές για επιπρόσθετη ερευνητική εργασία αναλύονται. / Anatomical structures of branching topology are frequently met in the human body and are visualized in medical images using various image acquisition modalities. 
Examples of such structures include the bronchial tree in chest computed tomography images, the blood vessels in retinal images and the breast ductal network in x-ray galactograms. The current thesis aims at the development of a set of automated methods for the analysis of anatomical structures of tree and network topology. More specifically, the two main objectives include (i) the development of image processing methods for optimized visualization of anatomical branching structures, and (ii) the development of analysis frameworks in order to explore the association between topology and pathophysiology of anatomical branching structures. The first chapter of the thesis presents a literature review regarding anatomical structures of the human body with branching topology and the motivation for this thesis. The specific research objectives, the main contributions and the impact of the thesis are also presented. The second chapter focuses on image segmentation. Image segmentation is the first step of medical image analysis and pattern recognition systems, and segmentation algorithms are critical components of today's radiological diagnostic systems. Despite the large number of existing segmentation algorithms, the need for effective methodologies applicable to a range of imaging modalities still remains. To address this challenge, a novel and fully automated methodology for segmenting anatomical branching structures is presented. The main idea is the integration of edge detection techniques with region growing methods to achieve robust segmentation. The hybrid approach is applied and evaluated on two datasets of branching structures from different imaging modalities (x-ray galactograms and vasculature angiograms) and is compared to state-of-the-art segmentation techniques. The third chapter presents the image processing stage of detecting branching nodes of anatomical structures in medical images. 
The branching nodes are the key components for tree localization as well as topology modelling, and node detection is a very important first step towards the automated processing of these structures, including image registration, segmentation and analysis of branching patterns. Developing automated techniques for node detection is a very challenging task because noise levels vary across tree levels. The proposed methodology of node detection consists of two main steps: multi-scale corner detection and branching localization. The main contribution of this work is the use of locally adaptive thresholding in the corner detection phase in order to facilitate node detection at lower tree levels. The evaluation of the methodology using a dataset of clinical galactograms and its comparison with state-of-the-art methods is also presented. In the fourth chapter, novel methodologies for the classification of anatomical tree-shaped structures are presented, aiming at providing new insights into the association between topology and underlying pathology. The methods include classification using descriptive features of the branching topology such as the tree asymmetry index, the spatial distribution of branching nodes, the branch tortuosity and other geometry-based tree features. Additionally, in this chapter a novel framework is presented to analyze tree topologies using representative encodings of parent-child node relationships and elastic sequence matching techniques. The superiority of the new methods over state-of-the-art techniques in terms of sensitivity, specificity and accuracy is evaluated experimentally. In the fifth chapter the potential of ensemble learning schemes is explored. Ensemble schemes are important developments in classification methodology and are based on the idea of combining the predictions of multiple classifiers in order to maximize the classification accuracy. 
Three ensemble learning techniques based on the boosting technique and an effective combination rule named Decision Template are employed to optimize the accuracy of base classifiers. The experimental results confirm the superiority of ensemble techniques. Finally the conclusions of the thesis are presented in the sixth chapter. The limitations of the proposed approach and the perspectives for further work are discussed. Εξόρυξη δεδομένων Δενδρικές δομές 006.312 Data mining Branching structures
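The fourth-chapter idea in entry 36, encoding parent-child relationships as sequences and comparing them with elastic matching, can be sketched as follows. The abstract does not specify the encoding or the matching technique, so both are assumptions here: the encoding is the breadth-first sequence of child counts, and the elastic matching is dynamic time warping (DTW). The two small example trees are invented.

```python
from collections import deque

def encode(tree, root):
    """Breadth-first sequence of child counts; tree maps node -> list of children."""
    seq, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        children = tree.get(node, [])
        seq.append(len(children))
        queue.extend(children)
    return seq

def dtw(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

# Two invented trees with the same node count but different branching patterns.
balanced = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e", "f"]}
skewed = {"r": ["a", "b"], "a": ["c", "d"], "c": ["e", "f"]}

distance = dtw(encode(balanced, "r"), encode(skewed, "r"))
```

A distance of this kind could feed a nearest-neighbour classifier over trees. Elastic matching tolerates sequences of different lengths, which matters because real ductal or bronchial trees rarely have the same number of nodes.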
37	Analysis of financial data using data mining techniques (Ανάλυση οικονομικών δεδομένων με χρήση τεχνικών εξόρυξης) Ζαβουδάκης, Γεώργιος 19 May 2015 (has links) Following the surge in technological development, the volume of data and information available today is enormous and will keep growing in the coming years. We live in the information society, in which the conversion of data into information must in turn lead to the conversion of information into knowledge. This has created the need to process these data and turn them into useful information that supports decision making. Mining techniques are an important tool for drawing knowledge from large volumes of data, and combined with statistical methods they make information retrieval straightforward. The coexistence of disparate scientific fields such as statistics, machine learning, information theory and computing has created a new discipline with powerful tools. This discipline is called Data Mining (DM) and is part of the process of Knowledge Discovery in Databases (KDD). The tools of DM are its algorithms, which attempt to find useful and understandable patterns in data. The main objective of this thesis is to bring together basic algorithms and methods that select and clean data, recognise patterns, optimise a management system and cluster data. We emphasise algorithms suited to financial time-series data.
Besides documenting the methods and applications of data mining and KDD, we apply clustering techniques to a data set containing financial data from three categories: the prices of high-capitalisation stocks in the Nasdaq index, the historical euro/dollar exchange rate, and the historical price of oil per barrel on international markets. The thesis is divided into five chapters: introduction, theoretical background, methodology, implementation of a practical application, and conclusions. Chapter 1 gives a first introduction to knowledge mining from data. Chapter 2 reviews the literature and presents in detail the theoretical background of the methods used. Chapter 3 presents the methodologies used in the study (mining methods for clustering, classification and prediction), and Chapter 4 presents a practical application of them together with its results. Finally, Chapter 5 presents conclusions drawn from the practical application. The thesis aims to highlight the relationship that can exist between economics and artificial intelligence, focusing mainly on how far the latter can provide solutions to key issues, problems and challenges in the modern economic environment. The means to this end are data mining techniques, rendered in Greek as «Τεχνικές Εξόρυξης Δεδομένων». The sources used include many scientific books on economics, finance, artificial intelligence and data mining methods, multicriteria classification techniques, and statistics. The result of combining the above is presented in the pages that follow. Mining techniques 006.312 Data mining Data analysis
38	From learning to e-learning : mining educational data : a novel, data-driven approach to evaluate individual differences in students' interaction with learning technology Vigentini, Lorenzo January 2010 (has links) In recent years, learning technology has become a very important addition to the toolkit of instructors at any level of education and training. Not only offered as a substitute in distance education, but often complementing traditional delivery methods, e-learning is considered an important component of modern pedagogy. Particularly in the last decade, learning technology has seen very rapid growth following the large-scale development and deployment of e-learning financed by both governments and commercial enterprises. These turned e-learning into one of the most profitable sectors of the new century, especially in times of recession, when education and retraining become even more important and the need for savings forces institutions to maximise resources. Interestingly, however, evaluation of e-learning has been primarily based on the consideration of users’ satisfaction and usability metrics (i.e. a system engineering perspective) or on the outcomes of learning (i.e. gains in grades/task performance). Both of these are too narrow to provide a reliable picture of the real impact of learning technology on the learning processes, and they lead to inconsistent findings. The key purpose of this thesis is to propose a novel, data-driven framework and methodology to understand the effect of e-learning by evaluating the utility and effectiveness of e-learning systems in the context of higher education, and specifically, in the teaching of psychology courses. The concept of learning is limited to its relevance for students’ learning in courses taught using a mixture of traditional methods and online tools tailored to enhance teaching. E-learning is understood here as part of a blended method of delivering teaching.
A large sample of over 2000 students taking psychology courses in year 1 and year 2 was considered over a span of five years, also providing the scope for the analysis of some longitudinal sub-samples. The analysis is accomplished using a psychologically grounded approach to evaluation, partially informed by a cognitive/behavioural perspective (online usage) and a differential perspective (measures of cognitive and learning styles). Relations between behaviours, styles and academic performance are also considered, giving an insight into, and a direct comparison with, existing literature. The methodology adopted draws heavily on data mining techniques to provide a rich characterisation of students/users in this particular context from the combination of three types of metrics: cognitive and learning styles, online usage and academic performance. Four different instruments are used to characterise styles: ASSIST (Approaches to learning, Entwistle), CSI (Cognitive Styles Inventory, Allinson & Hayes), TSI (Thinking Styles Inventory and the mental self-government theory, Sternberg) and VICS-WA (Verbal/Imager and Wholistic/Analytic Cognitive style, Riding, Peterson), which were intentionally selected to provide a varied set of tools. Online usage, spanning the entire academic year for each student, is analysed by applying web usage mining (WUM) techniques and is observed through different layers of interpretation, accounting for behaviours from single clicks to a student’s intentions in a single session. Academic performance was collated from the students’ records, giving an insight into the end-of-year grades but also into specific coursework submissions during the whole academic year, allowing for a temporal matching of online use and assessment.
The varied metrics used and data mining techniques applied provide a novel evaluation framework based on a rich profile of the learner, which in turn offers a valuable alternative to regression methods as a means to interpret relations between metrics. Patterns emerging from styles and from the way online material is used over time proved to be valuable in discriminating differences in academic performance and useful in this context to identify significant group differences in both usage and academic performance. As a result, the understanding of the relations between e-learning usage, styles and academic performance has important practical implications for enhancing students’ learning experience, for the automation of learning systems and for informing policymakers of the effects learning technology has, from a user- and learner-centred approach to learning and studying. The success of the application of data mining methods offers an excellent starting point to explore further a data-driven approach to evaluation, to support informed design processes of e-learning and to deliver suitable interventions that ensure better learning outcomes and provide an efficient system for institutions and organisations to maximise the impact of learning technology for teaching and training. 006.312
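The step from single clicks to sessions mentioned in entry 38 can be illustrated with a minimal sketch. The abstract does not say how sessions were delimited. The 30-minute inactivity timeout, the function name `sessionize` and the sample timestamps below are all assumptions made for illustration, not the thesis's method or data.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, timeout=timedelta(minutes=30)):
    """Group one student's click timestamps into sessions.

    A new session starts whenever the gap since the previous click
    exceeds `timeout` (an assumed threshold, not taken from the thesis).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)   # still within the current session
        else:
            sessions.append([ts])     # gap too long: open a new session
    return sessions


# Hypothetical click log for a single student on one day.
clicks = [
    datetime(2010, 1, 11, 9, 0),
    datetime(2010, 1, 11, 9, 10),
    datetime(2010, 1, 11, 9, 25),
    datetime(2010, 1, 11, 14, 0),
    datetime(2010, 1, 11, 14, 5),
]
sessions = sessionize(clicks)
session_lengths = [len(s) for s in sessions]  # -> [3, 2]
```

Per-session features such as click counts or durations could then be joined with style and grade data, which is the kind of combined profile the abstract describes.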
39	Text mining techniques for the comparative analysis of text meaning (Τεχνικές text mining για την συγκριτική ανάλυση νοήματος κειμένου) Πλώτα, Δέσποινα 27 December 2010 (has links) In recent decades, inconceivably large quantities of data have been produced by processes organised around computer systems. Most of these data are in the form of text, and this kind of unstructured data usually lacks metadata ("data about data"). The need for automated extraction of useful knowledge from huge amounts of textual data, in support of human analysis, is therefore obvious. Text mining is a new research field that tries to solve the problem of information overload using techniques from data mining, machine learning, natural language processing, information retrieval, information extraction and knowledge management. Building on text mining, this thesis presents a methodology for extracting knowledge from text, with the ultimate aim of attributing the authorship of two works to a specific author. The main question of interest is: are the Iliad and the Odyssey works of the same poet? Our methodology is based on analysing the "signified" rather than the "signifier" in the Iliad and the Odyssey. In a first phase we transform the data: only nouns, verbs, adjectives and adverbs were kept, and these were organised into groups of synonyms, each group representing one concept. We chose to analyse the relations between these concepts. We therefore converted every sentence in the text into a sentence consisting only of these concepts, eliminating duplicates.
We then transformed the text into a structured form so that it could be stored as "records" of a database. Specifically, we treated contiguous segments of text as such records. We experimented with defining either one sentence or two consecutive sentences as a record, using the Apriori algorithm to extract association rules of the form "90% of the records that contain concept x also contain concept y". We extracted a large number of strong associations between the same concepts in both poems (e.g. "earth"-"man"). There are also associations between different concepts (e.g. "battle"-"man" only in the Iliad) and different associations for the same concept (e.g. "hero"-"battle" in the Iliad and "hero"-"dwelling" in the Odyssey). However, we found no contradiction. These results may lead to the conclusion that Homer wrote both epics. / What is generally called "the Homeric question" is by far the oldest author-attribution problem. The Homeric question really encompasses several issues, e.g. is each of the Iliad and the Odyssey the work of a single poet? In this paper we try to answer the question using a data mining technique. Data mining is an emerging research area that develops techniques for knowledge discovery in huge volumes of data. Data mining methods have been applied to a wide variety of domains, from market basket analysis to the analysis of satellite pictures and human genomes. More specifically, in this paper we present an application of data mining to determining whether a document can be ascribed to a writer. Our methodology is based on analysing the content rather than the syntax. We propose a technique for mining association rules in order to analyse associations among concepts. We also demonstrate the results of the analyses we have undertaken using this algorithm. Data mining Text authorship 006.312 Text mining Author attribution problem
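The rule form quoted in entry 39 ("90% of the records that contain concept x also contain concept y") is the confidence of an association rule x -> y. The sketch below computes support and confidence for pairs of concepts by brute-force counting. It is not the Apriori algorithm itself, which additionally prunes candidate itemsets. The function name `rule_confidences`, the thresholds and the toy records are assumptions for illustration and are not data from the thesis.

```python
from itertools import permutations


def rule_confidences(records, min_support=0.3, min_confidence=0.9):
    """Return {(x, y): confidence} for pairwise rules x -> y.

    support(x -> y)    = share of records containing both x and y
    confidence(x -> y) = share of records containing x that also contain y
    """
    sets = [set(r) for r in records]
    n = len(sets)
    concepts = sorted(set().union(*sets)) if sets else []
    rules = {}
    for x, y in permutations(concepts, 2):
        with_x = sum(1 for s in sets if x in s)
        with_both = sum(1 for s in sets if x in s and y in s)
        support = with_both / n
        confidence = with_both / with_x  # with_x >= 1 because x was seen
        if support >= min_support and confidence >= min_confidence:
            rules[(x, y)] = confidence
    return rules


# Toy "records": each one is the set of concepts found in a text segment.
records = [
    ["hero", "battle", "man"],
    ["battle", "man"],
    ["earth", "man"],
    ["hero", "battle"],
    ["battle", "man", "earth"],
]
rules = rule_confidences(records)
```

On this toy input only the rules hero -> battle and earth -> man pass both thresholds. Running the same counting separately on records from each poem and comparing the resulting rule sets mirrors the comparison the abstract describes.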
40	Application of knowledge mining techniques to financial data (Εφαρμογή τεχνικών εξόρυξης γνώσης σε οικονομικά δεδομένα) Ραυτόπουλος, Γιώργος 04 December 2012 (has links) Decision support systems are the most important part of the infrastructure of an enterprise information system, because they enable companies to convert large volumes of business information into profitable results. The main purpose of this thesis is to study how data mining algorithms can be used for the approval of banking products based on applicants' data. Specifically, we try to demonstrate the effectiveness of data mining tools for credit card approval. We first present a theoretical study of the machine learning methods that underlie knowledge mining from data.
The thesis then focuses on modelling the problem and highlighting its particularities. The next goal is to implement and evaluate the behaviour of machine learning algorithms in credit card approval applications. We compare well-known, representative algorithms from the most important classification techniques, such as Naïve Bayes, C4.5 and Support Vector Machines (SVMs). Finally, we build a prototype software support tool for credit card approval. Knowledge mining 006.312 Data mining Machine learning
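Of the classifiers named in entry 40, Naïve Bayes is the simplest to show in a few lines. The sketch below is a categorical Naïve Bayes with Laplace smoothing, written from the textbook definition. The feature names, the toy applicants and the function names `train_naive_bayes` and `predict` are hypothetical and do not come from the thesis or its data set.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Return the label with the highest log-posterior for `row`."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Laplace (add-one) smoothing avoids zero probabilities.
            num = value_counts[(i, label)][value] + 1
            den = count + len(feature_values[i])
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical applicants: (employment status, previous default).
rows = [
    ("employed", "no"), ("employed", "no"), ("unemployed", "yes"),
    ("employed", "yes"), ("unemployed", "no"), ("unemployed", "yes"),
]
labels = ["approve", "approve", "reject", "approve", "reject", "reject"]
model = train_naive_bayes(rows, labels)
decision = predict(model, ("employed", "no"))  # -> "approve"
```

A comparison like the one in the abstract would train C4.5 and an SVM on the same applicant records and score all three on held-out data. Those two are omitted here to keep the sketch short.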
