Global ETD Search

61	Ανάλυση κυβερνητικών ΤΠΕ έργων με τεχνικές εξόρυξης δεδομένων / Analysis of governmental ICT projects using data mining techniques Βικάτος, Παντελεήμων 16 May 2014 The goal of this master's thesis is a detailed analysis of government investments in ICT projects. Combining statistical analysis, correlation analysis and data mining techniques yields useful conclusions about these projects. The thesis also describes an evaluation model based on deviations from the initial targets and on the assessments of the project managers; an important part of this model is the prediction of cost slippage by means of classification. Finally, the performance of Greek ICT projects is presented through the design of an improved dashboard for monitoring and controlling Greek ICT investments. Data mining; ICT projects; Evaluation model; Classification model; 006.312
62	Μέτρα ομοιότητας στην τεχνική ομαδοποίησης (clustering): εφαρμογή στην ανάλυση κειμένων (text mining) / Similarity measures in clustering: an application in text mining Παπαστεργίου, Θωμάς 17 May 2007 Development of a dissimilarity measure for categorical data and its application to text clustering and to the authorship attribution problem. Clustering; Similarity measures; Authorship attribution problem; Text mining; 006.312
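The abstract of entry 62 does not define the thesis's own measure, so the following is only a minimal sketch of the general idea of a dissimilarity between categorical records. It uses the standard simple matching dissimilarity (the fraction of attributes on which two records disagree) as a stand-in; the function name and the example attribute values are hypothetical.

```python
from typing import Hashable, Sequence


def simple_matching_dissimilarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of positions at which two categorical records differ.

    This is the textbook simple matching dissimilarity, used here as an
    assumption; it is not the measure developed in the thesis.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if not a:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


# Hypothetical categorical descriptions of two texts.
text_a = ("long_sentences", "formal", "first_person")
text_b = ("long_sentences", "informal", "first_person")
print(simple_matching_dissimilarity(text_a, text_b))
```

A pairwise matrix of such values could then be passed to any clustering routine that accepts precomputed dissimilarities.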
63	A proposal for the protection of digital databases in Sri Lanka Abeysekara, Thusitha Bernad January 2013 Economic development in Sri Lanka has relied heavily on foreign and domestic investment. Digital databases are a new and attractive area for this investment. This thesis argues that such investment needs protection and that this protection is crucial to attracting future investment. The thesis therefore proposes a digital database protection mechanism with a view to attracting investment in digital databases to Sri Lanka. The research examines various existing protection measures whilst focusing mainly on sui generis right protection, which protects qualitatively and/or quantitatively substantial investment in the obtaining, verification or presentation of the contents of digital databases. In digital databases, this process is carried out by computer programs which establish meaningful and useful data patterns through data mining and subsequently use those patterns in Knowledge Discovery in Databases processes. Those processes enhance the value and/or usefulness of the data/information. This thesis proposes that computer programs be protected by patents, because the process they carry out is a technical process, an area for which patent protection is particularly suitable. The existing intellectual property mechanisms address the issue of investment in databases in different ways. These include copyright, contract, unfair competition law, misappropriation and sui generis right protection. Since the primary objective of the thesis is to introduce a protection system that encourages qualitative and quantitative investment in digital databases in Sri Lanka, it suggests a set of mechanisms and rights drawn from the existing intellectual property protection mechanisms for databases.
The ultimate goal of the proposed protection mechanisms and rights is to improve the laws pertaining to the protection of digital databases in Sri Lanka in order to attract investment, to protect the rights and duties of the digital database users and owners/authors and, eventually, to bring positive economic effects to the country. Since digital database protection is a new concept in the Sri Lankan legal context, this research will provide guidelines for policy-makers, judges and lawyers in Sri Lanka and throughout the South Asian region. 006.312
64	Fouille de données stochastique pour la compréhension des dynamiques temporelles et spatiales des territoires agricoles. Contribution à une agronomie numérique / Stochastic data mining for the understanding of temporal and spatial dynamics in agricultural landscapes. Contribution to a numerical landscape agronomy Lazrak, El Ghali 19 September 2012 The purpose of this thesis is to develop a generic method for modelling the past and current dynamics of the Landscape Organization of Farming Activity (LOFA). We developed a stochastic modelling method based on Hidden Markov Models that mines a corpus of spatio-temporal land use data in order to segment it and reveal hidden agricultural dynamics. We applied this method to land use corpora from various sources belonging to two agricultural landscapes of regional dimension. The method makes three contributions to the modelling of LOFA: (i) a description of LOFA following a temporo-spatial approach that first identifies temporal regularities and then localizes them by segmenting the agricultural landscape into compact areas with similar temporal regularities; (ii) data mining of the neighbourhoods of land use successions and of their dynamics; (iii) combining the regularities revealed by our data mining approach at the regional level with rules identified by agronomy and ecology experts at more local scales, in order to explain the regularities and validate the experts' hypotheses. Our results support the hypothesis that LOFA is well represented by a Markov field of land use successions. This thesis opens the way to a new LOFA modelling approach that explores the coupling of regularities and rules and makes greater use of artificial intelligence tools. This work could be the first step towards a numerical landscape agronomy. Cropping system; Crop successions; Hierarchical Hidden Markov Model (HHMM); Landscape agronomy; 006.312
65	Découverte interactive de connaissances dans le web des données / Interactive Knowledge Discovery over Web of Data Alam, Mehwish 01 December 2015 Recently, the “Web of Documents” has become the “Web of Data”, i.e., documents are annotated in the form of RDF triples, which turns data processable only by humans into data directly processable by machines. This data can then be explored by the user through SPARQL queries. Just as web clustering engines classify the results obtained by querying the Web of Documents, a framework for classifying SPARQL query answers is needed to make sense of the retrieved data. Exploratory data mining focuses on providing insight into the data. It also allows uninteresting parts of the data to be filtered out by directly involving domain experts in the process. The contribution of this thesis is to guide the user in exploring Linked Data with the help of exploratory data mining. We study three research directions: 1) creating views over RDF graphs and facilitating user interaction with these views, 2) assessing the quality of RDF data and completing it, and 3) simultaneous navigation and exploration of multiple heterogeneous resources present in Linked Data.
Firstly, we introduce a solution modifier, View By, to create views over RDF graphs by classifying SPARQL query answers with the help of Formal Concept Analysis (FCA). To navigate the resulting concept lattice and extract knowledge units, we developed a new tool called RV-Explorer (RDF View Explorer), which implements several navigation modes. However, this navigation and exploration reveals several gaps in the data sets. To fill them, we use association rule mining to complete the RDF data. Furthermore, to support navigation and exploration directly over RDF graphs together with background knowledge, RDF triples are clustered with respect to that background knowledge, and these clusters can then be navigated and explored interactively. Finally, we conclude that, instead of providing direct exploration, we use FCA as an aid for clustering RDF data, which lets the user explore these clusters and reduce the exploration space through interaction. Linked Open Data; Formal Concept Analysis; Pattern Structures; Interactive Exploration of RDF data; 006.312
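Entry 65 relies on Formal Concept Analysis to classify query answers into a concept lattice. As a rough illustration of what a formal concept is, the sketch below enumerates all concepts of a tiny object-attribute context by brute force. It is not the View By modifier or RV-Explorer; the context contents and function names are hypothetical, and the exhaustive enumeration is only practical for very small contexts.

```python
from itertools import combinations


def extent(context, attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(context, objs, all_attrs):
    """Attributes shared by every object in objs."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def formal_concepts(context):
    """All (extent, intent) pairs, found by closing every attribute subset."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for attrs in combinations(sorted(all_attrs), r):
            e = extent(context, frozenset(attrs))
            concepts.add((e, intent(context, e, all_attrs)))
    return concepts


# Hypothetical context: resources (objects) described by topics (attributes).
context = {
    "res1": {"RDF", "SPARQL"},
    "res2": {"RDF"},
    "res3": {"RDF", "FCA"},
}
for e, i in sorted(formal_concepts(context), key=lambda c: -len(c[0])):
    print(sorted(e), sorted(i))
```

Ordering the concepts by extent inclusion gives the lattice that a user would navigate from general groups to more specific ones.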
66	Prédire et influencer l'apparition des événements dans une séquence complexe / Predicting and influencing the appearance of events in a complex sequence Fahed, Lina 27 October 2016 For several years now, a new phenomenon related to digital data has been emerging: data that are increasingly voluminous, varied and fast-moving appear and become available; they are often referred to as complex data. In this dissertation, we focus on a particular type of complex data, complex sequences of events, and ask the following question: “how can we predict as early as possible, and influence, the appearance of future events in a complex sequence of events?”. First, we address the problem of predicting events as early as possible. We propose DEER, an algorithm for mining episode rules whose originality is that it controls the horizon at which future events appear by imposing a temporal distance within the extracted rules. Second, we address the problem of emergence detection in an event stream. We propose EER, an algorithm for detecting new emergent rules as early as possible. To increase the reliability of new rules when their support is very low, EER relies on the similarity between these rules and previously known rules. Finally, we study the impact that events have on other events within a sequence. We propose IE, an algorithm that introduces the concept of “influencer events” and studies their influence on support, confidence and distance through three proposed influence measures. This work is evaluated and validated through an experimental study carried out on a real data set of blog messages. Data mining; Episode rules; Events sequence; Events prediction; Emergent events; Influencer events; 006.312; 519.54; 003.2
67	Apport des ontologies de domaine pour l'extraction de connaissances à partir de données biomédicales / Contribution of domain ontologies for knowledge discovery in biomedical data Personeni, Gabin 09 November 2018 The Semantic Web proposes standards and tools to formalize and share knowledge on the Web in the form of ontologies. Biomedical ontologies and their associated data represent a vast collection of complex, heterogeneous and linked knowledge. The analysis of such knowledge presents great opportunities in healthcare, for instance in pharmacovigilance. This thesis explores several ways to make use of this biomedical knowledge in the data mining step of a knowledge discovery process. In particular, we propose three methods in which several ontologies cooperate to improve data mining results. The first contribution is a method based on pattern structures, an extension of formal concept analysis, for extracting associations between adverse drug events from patient data. Here, a phenotype ontology and a drug ontology cooperate to allow a semantic comparison of these complex adverse events, leading to the discovery of associations between such events at varying degrees of generalization, for instance at the level of drugs or of drug classes.
The second contribution uses a numeric method based on semantic similarity measures to classify different types of genetic intellectual disabilities, characterized by both their phenotypes and the functions of their linked genes. We study two similarity measures that rely on different computation methods, applied with different combinations of phenotypic and gene function ontologies. In particular, we quantify the influence of the domain knowledge represented in each ontology on the classification ability of these measures, and examine how this knowledge can cooperate within such numeric methods. Finally, the third contribution uses the data component of the Semantic Web, Linked Open Data (LOD), together with linked ontologies, to characterize genes responsible for intellectual disabilities. We use Inductive Logic Programming (ILP), a method well suited to mining relational data such as LOD while exploiting domain knowledge from ontologies through reasoning mechanisms. Here, ILP makes it possible to extract from LOD and ontologies a descriptive and predictive model of genes responsible for intellectual disabilities. Together, these contributions illustrate that one or more ontologies can cooperate to improve various data mining processes. Bioontologies; Inductive Logic Programming; Linked Open Data; Pattern structures; Semantic similarity; Semantic Web; 006.332; 006.312
68	Text mining : μια νέα προτεινόμενη μέθοδος με χρήση κανόνων συσχέτισης [Text mining: a newly proposed method using association rules] Νασίκας, Ιωάννης 14 September 2007 Text mining is a new research field that tries to solve the problem of information overload by using techniques from data mining, machine learning, natural language processing, information retrieval, information extraction and knowledge management. In the first part of this diploma thesis we discuss this new research field in detail and distinguish it from related fields. The main goal of text mining is to help users extract information from large text resources. Two of its most important tasks are document categorization and document clustering. Interest in document clustering is growing due to the explosive growth of the WWW, digital libraries, technical documentation, medical data, etc. The most critical problems for document clustering are the high dimensionality of natural language text and the choice of features used to represent a domain. Consequently, a growing number of researchers have concentrated on the relative effectiveness of various dimension reduction techniques and on the relationship between the features selected to represent text and the quality of the final clustering. There are two important types of dimension reduction techniques: transformation methods and selection methods. In the second part of this thesis we present a new selection method that tries to tackle these problems. The proposed methodology is based on Association Rule Mining.
We also present and analyze empirical tests that demonstrate the performance of the proposed methodology. The results show that the dimension was reduced. However, the further we tried to reduce it with our methodology, the more precision was lost. An attempt was made to improve the results through a different feature selection, and such efforts are still ongoing. The choice of similarity measure is also important in document clustering. In this thesis we describe several such measures from the literature and compare them in a related application. The thesis has seven chapters. The first chapter gives a brief review of text mining. The second chapter describes the tasks, methods and tools used in text mining. The third chapter presents document representation, the various similarity measures and an application that compares them. The fourth chapter covers the different dimension reduction methods, and the fifth presents our own methodology for the problem. In the sixth chapter we apply our methodology to experimental data. The thesis closes with our conclusions and directions for future research. Text mining; Information retrieval; Feature selection; Term weighting; Text clustering; Association rules; 006.312
69	Μεθοδολογικό πλαίσιο υποστήριξης της εξόρυξης γνώσης από δεδομένα με την χρήση αρχών της πολυκριτήριας ανάλυσης αποφάσεων [A methodological framework supporting knowledge discovery from data using principles of multicriteria decision analysis] Μαστρογιάννης, Νικόλαος 11 January 2010 Data mining is a new and dynamic technology that helps corporations focus on the most important information stored in their data warehouses. It searches for hidden patterns and can discover information that experts might otherwise miss or overlook. In recent years, a large number of data mining algorithms have been developed. These algorithms follow different methodological approaches and cover a wide variety of applications. Nevertheless, the effort to build improved and more efficient data mining algorithms continues. The main goal of this PhD thesis is to contribute to that effort by improving and strengthening the theoretical foundation of existing data mining algorithms. More specifically, it introduces a different perspective based on concepts and procedures of multicriteria decision analysis, in particular the ELECTRE I method of the theory of outranking relations, and develops a new methodological framework for data mining. Incorporating this framework into existing algorithms essentially creates new, more effective and more accurate algorithms for individual data mining tasks and applications. In particular, the proposed framework was applied, with the necessary modifications, to the classification and the clustering of categorical objects, through the CLEDM and CLEKMODES methods respectively. The good results of these methods on a series of widely used databases, together with the possibility of extending the framework to other data mining tasks, shape a new “hybrid” research field.
This field has the potential to produce progressively better data mining algorithms and, furthermore, to explore in depth and further formalize the interaction between data mining and multicriteria decision analysis. Data mining; Databases; Multicriteria analysis; ELECTRE I method; Algorithms; Classification; Clustering; 006.312
70	Εξόρυξη και διαχείριση κανόνων συσχέτισης με χρήση τεχνικών ανάκτησης πληροφορίας [Mining and managing association rules using information retrieval techniques] Βαρσάμης, Θεόδωρος 11 June 2013 In a world flooded with data, it becomes necessary to organize the data efficiently and then process it in order to find and retrieve information for decision making. As part of this effort, various studies have been published that aim to discover relationships among data; such relationships can reveal previously unknown dependencies and allow future outcomes and decisions to be forecast. In this thesis we study the most widely used algorithms for finding association rules and then propose a scheme that uses inverted files as the basic structure for retrieving information from transaction databases. Our goal is the easy generation of association rules, based on the efficient storage and retrieval of Frequent Itemsets. We first focus on how to find and store a minimal set of transactions, exploiting the information contained in the Closed Frequent Itemsets and the Maximum Frequent Itemsets (MFI). Then, using the information stored in the MFI and at minimal computational cost, we propose the MFI-drive algorithm, which answers queries for supersets and subsets of a set of items, as well as queries for itemsets with a predefined degree of similarity to a given set. Data mining; Closed frequent itemsets; Maximum frequent itemsets; Inverted index files; MFI-drive; 006.312
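To make the vocabulary of entry 70 concrete, here is a small sketch of how an inverted file over a transaction database can be used to count itemset support, and how closed and maximal frequent itemsets are derived from the frequent ones. It is not the MFI-drive algorithm, which the abstract does not specify; the function names, the exhaustive level-by-level search and the sample transactions are all assumptions chosen for readability.

```python
from itertools import combinations


def build_inverted_index(transactions):
    """Map each item to the set of transaction ids that contain it."""
    index = {}
    for tid, items in enumerate(transactions):
        for item in items:
            index.setdefault(item, set()).add(tid)
    return index


def support(index, itemset, n_transactions):
    """Support count, obtained by intersecting the items' posting sets."""
    if not itemset:
        return n_transactions
    postings = [index.get(item, set()) for item in itemset]
    return len(set.intersection(*postings))


def frequent_itemsets(transactions, min_support):
    """Level-wise search; stops at the first size with no frequent itemset."""
    index = build_inverted_index(transactions)
    items = sorted(index)
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, size):
            s = support(index, cand, len(transactions))
            if s >= min_support:
                frequent[frozenset(cand)] = s
                found = True
        if not found:
            break
    return frequent


def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}


def closed_itemsets(frequent):
    """Frequent itemsets with no proper superset of equal support."""
    return {s for s in frequent
            if not any(s < t and frequent[t] == frequent[s] for t in frequent)}


# Hypothetical transaction database.
transactions = [{"bread", "milk"}, {"bread", "milk", "eggs"},
                {"bread", "eggs"}, {"milk"}]
freq = frequent_itemsets(transactions, min_support=2)
print(maximal_itemsets(freq))
print(closed_itemsets(freq))
```

Because every frequent itemset is a subset of some maximal one, storing only the maximal itemsets is enough to answer subset and superset membership queries, while the closed itemsets additionally preserve exact support counts.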
