  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Lógicas probabilísticas com relações de independência: representação de conhecimento e aprendizado de máquina. / Probabilistic logics with independence relationships: knowledge representation and machine learning.

Ochoa Luna, José Eduardo 17 May 2011
A combinação de lógica e probabilidade (lógicas probabilísticas) tem sido um tópico bastante estudado nas últimas décadas. A maioria de propostas para estes formalismos pressupõem que tanto as sentenças lógicas como as probabilidades sejam especificadas por especialistas. Entretanto, a crescente disponibilidade de dados relacionais sugere o uso de técnicas de aprendizado de máquina para produzir sentenças lógicas e estimar probabilidades. Este trabalho apresenta contribuições em termos de representação de conhecimento e aprendizado. Primeiro, uma linguagem lógica probabilística de primeira ordem é proposta. Em seguida, três algoritmos de aprendizado de lógica de descrição probabilística crALC são apresentados: um algoritmo probabilístico com ênfase na indução de sentenças baseada em classificadores Noisy-OR; um algoritmo que foca na indução de inclusões probabilísticas (componente probabilístico de crALC); um algoritmo de natureza probabilística que induz sentenças lógicas ou inclusões probabilísticas. As propostas de aprendizado são avaliadas em termos de acurácia em duas tarefas: no aprendizado de lógicas de descrição e no aprendizado de terminologias probabilísticas em crALC. Adicionalmente, são discutidas aplicações destes algoritmos em processos de recuperação de informação: duas abordagens para extensão semântica de consultas na Web usando ontologias probabilísticas são discutidas.

The combination of logic and probabilities (probabilistic logics) is a topic that has been extensively explored in past decades. The majority of work in probabilistic logics assumes that both logical sentences and probabilities are specified by experts. As relational data becomes increasingly available, machine learning algorithms have been used to induce both logical sentences and probabilities. This work contributes to knowledge representation and learning. First, a first-order probabilistic logic is proposed. Then, three algorithms for learning the probabilistic description logic crALC are given: a probabilistic algorithm focused on learning logical sentences and based on Noisy-OR classifiers; an algorithm that aims at learning probabilistic inclusions (the probabilistic component of crALC); and an algorithm that, using a probabilistic setting, induces either logical sentences or probabilistic inclusions. These proposals were evaluated in two settings, by measuring the learning accuracy of both description logics and probabilistic terminologies. In addition, these learning algorithms have been applied to information retrieval: two approaches for semantic query extension through probabilistic ontologies are discussed.
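The first crALC learner above is built on Noisy-OR classifiers. In a Noisy-OR model, each active cause independently produces the effect with its own probability, so the effect is absent only if every cause fails. A minimal sketch, assuming the feature names, their probabilities, the optional leak term and the 0.5 decision threshold are all hypothetical (the abstract does not give the thesis's parameterization):

```python
import math

def noisy_or(active_causes, cause_probs, leak=0.0):
    """P(effect | active causes) under a Noisy-OR model.

    Each active cause c independently produces the effect with probability
    cause_probs[c]; the effect is absent only if every active cause and the
    optional leak term all fail to produce it.
    """
    p_all_fail = (1.0 - leak) * math.prod(1.0 - cause_probs[c] for c in active_causes)
    return 1.0 - p_all_fail

def classify(active_causes, cause_probs, leak=0.0, threshold=0.5):
    """Label an individual as positive if the Noisy-OR probability reaches the threshold."""
    return noisy_or(active_causes, cause_probs, leak) >= threshold

# Hypothetical features describing an individual, with their cause probabilities.
probs = {"hasChild": 0.8, "isMarried": 0.5}
print(noisy_or(["hasChild", "isMarried"], probs))  # about 0.9
```

With both features active the model gives 1 - (1 - 0.8)(1 - 0.5) = 0.9, up to floating-point rounding; with no active feature it returns only the leak probability.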
32

Apport des ontologies de domaine pour l'extraction de connaissances à partir de données biomédicales / Contribution of domain ontologies for knowledge discovery in biomedical data

Personeni, Gabin 09 November 2018
Le Web sémantique propose un ensemble de standards et d'outils pour la formalisation et l'interopérabilité de connaissances partagées sur le Web, sous la forme d'ontologies. Les ontologies biomédicales et les données associées constituent de nos jours un ensemble de connaissances complexes, hétérogènes et interconnectées, dont l'analyse est porteuse de grands enjeux en santé, par exemple dans le cadre de la pharmacovigilance. On proposera dans cette thèse des méthodes permettant d'utiliser ces ontologies biomédicales pour étendre les possibilités d'un processus de fouille de données, en particulier, permettant de faire cohabiter et d'exploiter les connaissances de plusieurs ontologies biomédicales. Les travaux de cette thèse concernent dans un premier temps une méthode fondée sur les structures de patrons, une extension de l'analyse formelle de concepts pour la découverte de co-occurrences d'événements indésirables médicamenteux dans des données patients. Cette méthode utilise une ontologie de phénotypes et une ontologie de médicaments pour permettre la comparaison de ces événements complexes, et la découverte d'associations à différents niveaux de généralisation, par exemple, au niveau de médicaments ou de classes de médicaments. Dans un second temps, on utilisera une méthode numérique fondée sur des mesures de similarité sémantique pour la classification de déficiences intellectuelles génétiques. On étudiera deux mesures de similarité utilisant des méthodes de calcul différentes, que l'on utilisera avec différentes combinaisons d'ontologies phénotypiques et géniques. En particulier, on quantifiera l'influence que les différentes connaissances de domaine ont sur la capacité de classification de ces mesures, et comment ces connaissances peuvent coopérer au sein de telles méthodes numériques. Une troisième étude utilise les données ouvertes liées ou LOD du Web sémantique et les ontologies associées dans le but de caractériser des gènes responsables de déficiences intellectuelles. On utilise ici la programmation logique inductive, qui s'avère adaptée pour fouiller des données relationnelles comme les LOD, en prenant en compte leurs relations avec les ontologies, et en extraire un modèle prédictif et descriptif des gènes responsables de déficiences intellectuelles. L'ensemble des contributions de cette thèse montre qu'il est possible de faire coopérer avantageusement une ou plusieurs ontologies dans divers processus de fouille de données.

The Semantic Web proposes standards and tools to formalize and share knowledge on the Web in the form of ontologies. Biomedical ontologies and their associated data represent a vast collection of complex, heterogeneous and linked knowledge. The analysis of such knowledge presents great opportunities in healthcare, for instance in pharmacovigilance. This thesis explores several ways to make use of this biomedical knowledge in the data mining step of a knowledge discovery process. In particular, we propose three methods in which several ontologies cooperate to improve data mining results. The first contribution of this thesis describes a method based on pattern structures, an extension of formal concept analysis, to extract associations between adverse drug events from patient data. In this context, a phenotype ontology and a drug ontology cooperate to allow a semantic comparison of these complex adverse events, leading to the discovery of associations between such events at varying degrees of generalization, for instance at the drug or drug class level. The second contribution uses a numeric method based on semantic similarity measures to classify different types of genetic intellectual disabilities, characterized by both their phenotypes and the functions of their linked genes. We study two different similarity measures, applied with different combinations of phenotypic and gene function ontologies. In particular, we investigate the influence of the domain knowledge represented in each ontology on the classification process, and how these ontologies can cooperate to improve that process. Finally, the third contribution uses the data component of the Semantic Web, the Linked Open Data (LOD), together with linked ontologies, to characterize genes responsible for intellectual disabilities. We use Inductive Logic Programming (ILP), a method suited to mining relational data such as LOD while exploiting domain knowledge from ontologies through reasoning mechanisms. Here, ILP makes it possible to extract from LOD and ontologies a descriptive and predictive model of genes responsible for intellectual disabilities. These contributions illustrate the possibility of having several ontologies cooperate to improve various data mining processes.
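The abstract does not name the two semantic similarity measures compared in the thesis. As an illustration only, the sketch below implements one simple groupwise measure, simUI: the Jaccard index of the sets of ontology terms annotated to two entities, after closing each set under the is-a hierarchy. The phenotype terms, the hierarchy and the two "disorders" are hypothetical.

```python
# Hypothetical is-a hierarchy of phenotype terms: term -> direct parents.
PARENTS = {
    "Seizure": ["NervousSystemAbnormality"],
    "Microcephaly": ["HeadAbnormality"],
    "Macrocephaly": ["HeadAbnormality"],
    "NervousSystemAbnormality": ["Phenotype"],
    "HeadAbnormality": ["Phenotype"],
}

def with_ancestors(terms, parents):
    """Return the given terms together with all their ancestors (safe on DAGs)."""
    closure, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in closure:
            closure.add(term)
            stack.extend(parents.get(term, []))
    return closure

def sim_ui(terms_a, terms_b, parents):
    """simUI: Jaccard index of the two ancestor-closed annotation sets."""
    a = with_ancestors(terms_a, parents)
    b = with_ancestors(terms_b, parents)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

disorder_x = {"Seizure", "Microcephaly"}
disorder_y = {"Seizure", "Macrocephaly"}
print(sim_ui(disorder_x, disorder_y, PARENTS))  # 4 shared terms out of 6
```

The two hypothetical disorders share Seizure, NervousSystemAbnormality, HeadAbnormality and Phenotype out of six terms in total, so the score is 4/6. simUI ignores how specific each shared term is; information-content-based measures weight rare terms more heavily, and simUI is used here only because it fits in a few lines.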
33

Uma abordagem híbrida relacional para a desambiguação lexical de sentido na tradução automática / A hybrid relational approach for word sense disambiguation in machine translation

Specia, Lucia 28 September 2007
A comunicação multilíngue é uma tarefa cada vez mais imperativa no cenário atual de grande disseminação de informações em diversas línguas. Nesse contexto, são de grande relevância os sistemas de tradução automática, que auxiliam tal comunicação, automatizando-a. Apesar de ser uma área de pesquisa bastante antiga, a Tradução Automática ainda apresenta muitos problemas. Um dos principais problemas é a ambigüidade lexical, ou seja, a necessidade de escolha de uma palavra, na língua alvo, para traduzir uma palavra da língua fonte quando há várias opções de tradução. Esse problema se mostra ainda mais complexo quando são identificadas apenas variações de sentido nas opções de tradução. Ele é denominado, nesse caso, "ambigüidade lexical de sentido". Várias abordagens têm sido propostas para a desambiguação lexical de sentido, mas elas são, em geral, monolíngues (para o inglês) e independentes de aplicação. Além disso, apresentam limitações no que diz respeito às fontes de conhecimento que podem ser exploradas. Em se tratando da língua portuguesa, em especial, não há pesquisas significativas voltadas para a resolução desse problema. O objetivo deste trabalho é a proposta e desenvolvimento de uma nova abordagem de desambiguação lexical de sentido, voltada especificamente para a tradução automática, que segue uma metodologia híbrida (baseada em conhecimento e em córpus) e utiliza um formalismo relacional para a representação de vários tipos de conhecimentos e de exemplos de desambiguação, por meio da técnica de Programação Lógica Indutiva. Experimentos diversos mostraram que a abordagem proposta supera abordagens alternativas para a desambiguação multilíngue e apresenta desempenho superior ou comparável ao do estado da arte em desambiguação monolíngue. Adicionalmente, tal abordagem se mostrou efetiva como mecanismo auxiliar para a escolha lexical na tradução automática estatística.

Cross-lingual communication has become increasingly important as more and more information is disseminated in several languages. In this context, machine translation systems, which can facilitate such communication by providing automatic translations, are of great importance. Although research in Machine Translation dates back to the 1950s, the area still has many open problems. One of the main problems is lexical ambiguity, that is, the need for lexical choice when translating a source language word that has several translation options in the target language. This problem is even more complex when the translation options differ only in sense, a problem called "sense ambiguity". Several approaches have been proposed for word sense disambiguation, but they are in general monolingual (for English) and application-independent. Moreover, they have limitations regarding the types of knowledge sources that can be exploited. In particular, there is no significant research on word sense disambiguation involving Portuguese. The goal of this PhD work is the proposal and development of a novel approach to word sense disambiguation that is specifically designed for machine translation, follows a hybrid methodology (knowledge- and corpus-based), and employs a relational formalism to represent various kinds of knowledge sources and disambiguation examples, using Inductive Logic Programming. Several experiments have shown that the proposed approach outperforms alternative approaches to multilingual disambiguation and achieves results better than or comparable to the state of the art in monolingual disambiguation. Additionally, the approach was shown to effectively assist lexical choice in a statistical machine translation system.
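In an ILP formulation of disambiguation, each example and its knowledge sources are stated as facts, and a candidate rule for choosing a translation is scored by the positive and negative examples it covers. A minimal sketch under explicit assumptions: the English verb "come", its Portuguese translations, and the predicates `next_word` and `subject_type` are hypothetical, not the thesis's knowledge sources, and rule bodies are restricted to ground literals about a single example, a simplification of the first-order clauses an ILP system would actually search.

```python
# Hypothetical facts for occurrences of the English verb "come"; each example
# pairs its facts with the Portuguese translation chosen in a parallel corpus.
EXAMPLES = [
    ({("next_word", "back"), ("subject_type", "person")}, "voltar"),
    ({("next_word", "back"), ("subject_type", "object")}, "voltar"),
    ({("next_word", "in"), ("subject_type", "person")}, "entrar"),
    ({("next_word", "to"), ("subject_type", "person")}, "chegar"),
]

def coverage(rule_body, target_sense, examples):
    """Count the positive and negative examples covered by the rule
    'sense(E, target_sense) :- every literal in rule_body holds for E'."""
    pos = neg = 0
    for facts, sense in examples:
        if rule_body <= facts:  # all body literals are true for this example
            if sense == target_sense:
                pos += 1
            else:
                neg += 1
    return pos, neg

print(coverage({("next_word", "back")}, "voltar", EXAMPLES))       # (2, 0)
print(coverage({("subject_type", "person")}, "voltar", EXAMPLES))  # (1, 2)
```

A covering learner would keep the first rule, which covers both "voltar" examples and no others, and reject the second, which covers more negatives than positives.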
34

Deep Learning Black Box Problem

Hussain, Jabbar January 2019
The application of neural networks in deep learning is growing rapidly because of their ability to outperform other machine learning algorithms on many kinds of problems. However, a major disadvantage of deep neural networks is that the internal logic by which they reach a desired output or result cannot readily be understood or explained. This behavior of deep neural networks is known as the “black box” problem. This leads to the first research question: how prevalent is the black box problem in the research literature during a specific period of time? Black box problems are usually addressed by so-called rule extraction, which leads to the second research question: what rule extraction methods have been proposed to solve this kind of problem? To answer these research questions, a systematic literature review was conducted to collect data on the black box problem and rule extraction. Printed and online articles published in higher-ranked journals and conference proceedings were selected for investigation. The unit of analysis was the set of journal and conference-proceedings articles related to the black box problem and rule extraction. The results show that interest in the black box problem has gradually increased over time, mainly because of new technological developments. The thesis also provides an overview of the different methodological approaches used for rule extraction.
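Rule extraction methods are commonly grouped into decompositional approaches, which inspect a network's internal weights, and pedagogical approaches, which treat the trained model purely as an oracle to be queried. A minimal sketch of the pedagogical idea, assuming a hypothetical single-input "black box" (one sigmoid unit standing in for a trained network) and a rule language limited to one threshold test: query the model on sample inputs, then keep the rule that best reproduces its answers, measured by fidelity (agreement with the black box).

```python
import math

def black_box(x):
    """Hypothetical opaque model: a single sigmoid unit thresholded at 0.5."""
    activation = 1.0 / (1.0 + math.exp(-(3.0 * x - 6.0)))
    return 1 if activation >= 0.5 else 0

def extract_threshold_rule(samples, oracle):
    """Pedagogical rule extraction: query the oracle on the samples and return
    the threshold t of the rule 'IF x >= t THEN 1 ELSE 0' with the highest
    fidelity (fraction of samples on which rule and oracle agree)."""
    labels = [oracle(x) for x in samples]
    best_t, best_fidelity = None, -1.0
    for t in sorted(set(samples)):
        agree = sum((1 if x >= t else 0) == y for x, y in zip(samples, labels))
        fidelity = agree / len(samples)
        if fidelity > best_fidelity:
            best_t, best_fidelity = t, fidelity
    return best_t, best_fidelity

samples = [i * 0.5 for i in range(9)]  # 0.0, 0.5, ..., 4.0
print(extract_threshold_rule(samples, black_box))  # (2.0, 1.0)
```

The extracted rule "IF x >= 2.0 THEN 1" agrees with the hidden model on every sample. Pedagogical methods such as TREPAN generalize this query-and-fit loop to whole decision trees over many inputs.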
35

Επαγωγικός λογικός προγραμματισμός: μια διδακτική προσέγγιση / Inductive logic programming: a didactic approach

Καραμουτζογιάννη, Ζωή 31 May 2012
Inductive Logic Programming (ILP) is the research field of Artificial Intelligence that lies at the intersection of Machine Learning and Logic Programming. The term "inductive" expresses the idea of reasoning from the particular to the general. Through inductive machine learning, ILP pursues its goal: creating tools and developing techniques for deriving hypotheses from observations (examples) and for synthesizing new knowledge from empirical observations. Unlike most other approaches to inductive learning, ILP is concerned with the properties of rule-based inference, the convergence of algorithms and the computational complexity of procedures. ILP develops techniques and tools for relational data analysis. It is applied directly to multi-relational data to discover patterns. The patterns discovered by ILP systems are derived from known background knowledge together with positive and negative examples, and are expressed as logic programs. ILP has been used extensively on problems in molecular biology, biochemistry and chemistry. ILP differs from other forms of Machine Learning both in its use of an expressive representation language and in its ability to use background knowledge. Various implementations of ILP have been developed, the most recent of which is Progol, which is based on a Prolog interpreter combined with an Inverse Entailment algorithm. Progol constructs new clauses by generalizing the examples contained in the database it is given. ILP theory guarantees that Progol performs an admissible search through the space of generalizations, finding the minimal set of clauses from which all the examples can be derived. This thesis presents in detail the theory and rules of ILP, the kinds of problems solved with ILP, the methods followed, and the way ILP applications are developed. It also gives examples suited to an audience with a basic background in Logic and Logic Programming.

Inductive Logic Programming is a research area of Artificial Intelligence that operates at the intersection of Machine Learning and Logic Programming. Through inductive machine learning, the objective of Inductive Logic Programming is to create tools and develop techniques that extract new knowledge by combining background knowledge with empirical observations (examples). Several methods are employed, the best known being inverse resolution and inverse implication. Based on Inductive Logic Programming, several systems have been developed for knowledge production. The most widely used system is Progol, which takes as input examples and background knowledge, expressed in a syntax compatible with the Prolog programming language, and generates clauses in the same language that explain these examples. Other systems are FOIL, MOBAL, GOLEM and LINUS. There is also Cigol, a system based on inverse resolution. These systems are used in many applications. The most important are in pharmacology, such as predictive toxicology, the prediction of rheumatic diseases and the design of drugs for Alzheimer's disease. Applications can also be found in programming, linguistics and games such as chess.
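Generalizing from particular examples to a general clause, the core inductive step described above, can be made concrete with Plotkin's least general generalization (lgg) of two atoms. A relative version of it (rlgg) underlies GOLEM, one of the systems listed above; Progol uses inverse entailment instead, so this is not Progol's algorithm. A minimal sketch, assuming terms are encoded as Python tuples `(functor, arg1, ...)` with constants as lowercase strings; this encoding is an assumption of the sketch, not any system's input format.

```python
def lgg_terms(s, t, pairs):
    """Anti-unify two terms. Compound terms are tuples (functor, arg1, ...);
    constants are lowercase strings. `pairs` maps each pair of differing
    subterms to one variable, so a repeated pair reuses the same variable."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg_terms(a, b, pairs) for a, b in zip(s[1:], t[1:]))
    if (s, t) not in pairs:
        pairs[(s, t)] = f"X{len(pairs)}"  # fresh variables X0, X1, ...
    return pairs[(s, t)]

def lgg_atoms(a, b):
    """Least general generalization of two atoms, or None if the predicates differ."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    return lgg_terms(a, b, {})

print(lgg_atoms(("parent", "ann", "bob"), ("parent", "ann", "carl")))  # ('parent', 'ann', 'X0')
print(lgg_atoms(("p", "a", "a"), ("p", "b", "b")))                     # ('p', 'X0', 'X0')
```

The second call shows why the mapping of subterm pairs matters: both arguments differ in the same way (a versus b), so the generalization keeps them equal as p(X0, X0) instead of the more general p(X0, X1).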
38

Learning OWL Class Expressions

Lehmann, Jens 24 June 2010
With the advent of the Semantic Web and Semantic Technologies, ontologies have become one of the most prominent paradigms for knowledge representation and reasoning. The popular ontology language OWL, based on description logics, became a W3C recommendation in 2004 and a standard for modelling ontologies on the Web. In the meantime, many studies and applications using OWL have been reported in research and industrial environments, many of which go beyond Internet usage and employ the power of ontological modelling in other fields such as biology, medicine, software engineering, knowledge management, and cognitive systems. However, progress in the field is hampered by a lack of well-structured ontologies with large amounts of instance data, because engineering such ontologies requires a considerable investment of resources. Nowadays, knowledge bases often provide large volumes of data without sophisticated schemata. Hence, methods for automated schema acquisition and maintenance are sought. Schema acquisition is closely related to solving typical classification problems in machine learning, e.g. the detection of chemical compounds causing cancer. In this work, we investigate both the underlying machine learning techniques and their application to knowledge acquisition in the Semantic Web. To leverage machine learning approaches for these tasks, methods and tools must be developed for learning concepts in description logics or, equivalently, class expressions in OWL. In this thesis, it is shown that methods from Inductive Logic Programming (ILP) are applicable to learning in description logic knowledge bases. The results provide foundations for the semi-automatic creation and maintenance of OWL ontologies, in particular when extensional information (i.e. facts, instance data) is abundantly available, while the corresponding intensional information (schema) is missing or not expressive enough to allow powerful reasoning over the ontology. Such situations often occur when extracting knowledge from different sources, e.g. databases, or in collaborative knowledge engineering scenarios, e.g. using semantic wikis. It can be argued that being able to learn OWL class expressions is a step towards enriching OWL knowledge bases in order to enable powerful reasoning, consistency checking, and improved querying. In particular, plugins for OWL ontology editors based on learning methods are developed and evaluated in this work. The developed algorithms are not restricted to ontology engineering and can handle other learning problems. Indeed, they lend themselves to generic use in machine learning in the same way as ILP systems do. The main difference, however, is the employed knowledge representation paradigm: ILP traditionally uses logic programs for knowledge representation, whereas this work rests on description logics and OWL. This difference is crucial when considering Semantic Web applications as target use cases, as such applications depend centrally on the chosen knowledge representation format for knowledge interchange and integration. The work in this thesis can be understood as a broadening of the scope of research and applications of ILP methods. This goal is particularly important since the number of OWL-based systems is already increasing rapidly and can be expected to grow further in the future. The thesis starts by establishing the necessary theoretical basis and continues with the specification of algorithms. It also contains their evaluation and, finally, presents a number of application scenarios. The research contributions of this work are threefold: The first contribution is a complete analysis of desirable properties of refinement operators in description logics.
Refinement operators are used to traverse the target search space and are, therefore, a crucial element in many learning algorithms. Their properties (completeness, weak completeness, properness, redundancy, infinity, minimality) indicate whether a refinement operator is suitable for use in a learning algorithm. The key research question is which of those properties can be combined. It is shown that there is no ideal, i.e. complete, proper, and finite, refinement operator for expressive description logics, which indicates that learning in description logics is a challenging machine learning task. A number of other new results for different property combinations are also proven. The need for these investigations had already been expressed in several articles prior to this PhD work. The theoretical limitations shown by these investigations provide clear criteria for the design of refinement operators. In the analysis, as few assumptions as possible were made about the description language used. The second contribution is the development of two refinement operators. The first operator supports a wide range of concept constructors; it is shown to be complete and can be extended to a proper operator. It is the most expressive operator designed for a description language so far. The second operator uses the lightweight language EL and is weakly complete, proper, and finite. It is straightforward to extend it to an ideal operator, if required. It is the first published ideal refinement operator in description logics. While the two operators differ considerably in their technical details, they both use background knowledge efficiently. The third contribution consists of learning algorithms that use the introduced operators. New redundancy elimination and infinity-handling techniques are introduced in these algorithms.
According to the evaluation, the algorithms produce very readable solutions, while their accuracy is competitive with the state of the art in machine learning. Several optimisations for achieving scalability of the introduced algorithms are described, including a knowledge base fragment selection approach, a dedicated reasoning procedure, and a stochastic coverage computation approach. The research contributions are evaluated on benchmark problems and in use cases. Standard statistical methods such as cross-validation and significance tests show that the approaches are very competitive. Furthermore, the ontology engineering case study provides evidence that the described algorithms can solve the target problems in practice. A major outcome of the doctoral work is the DL-Learner framework. It makes the source code of all algorithms and examples available as open source and has been incorporated into other projects.
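The role of a refinement operator in such a learner can be illustrated on a deliberately small scale. The sketch below is not DL-Learner's operator: it assumes concepts are plain conjunctions of named classes (no existential or value restrictions), a hypothetical class tree and ABox, and a simple superclass closure in place of a description logic reasoner. Starting from Thing, it searches the refinement graph breadth-first and scores each candidate by accuracy on positive and negative examples.

```python
from collections import deque

# Hypothetical class tree (class -> direct subclasses); "Thing" is the top concept.
SUBCLASSES = {"Thing": ["Animal", "Person"], "Person": ["Female", "Male", "Parent"]}
SUPERCLASS = {sub: sup for sup, subs in SUBCLASSES.items() for sub in subs}
# Hypothetical ABox: the asserted types of each individual.
TYPES = {"alice": {"Female", "Parent"}, "carol": {"Female", "Parent"},
         "bob": {"Male", "Parent"}, "dana": {"Female"}, "rex": {"Animal"}}

def inferred_types(individual):
    """Asserted types plus all their superclasses (no DL reasoner involved)."""
    result = set()
    for cls in TYPES[individual]:
        while cls is not None:
            result.add(cls)
            cls = SUPERCLASS.get(cls)
    return result

def refine(concept):
    """Downward refinement of a conjunction of named classes (a frozenset; the
    empty set stands for Thing): specialize one conjunct to a direct subclass,
    or add a direct subclass of Thing as a new conjunct."""
    refinements = set()
    for cls in concept:
        for sub in SUBCLASSES.get(cls, []):
            refinements.add((concept - {cls}) | {sub})
    for top in SUBCLASSES["Thing"]:
        if top not in concept:
            refinements.add(concept | {top})
    return refinements

def accuracy(concept, positives, negatives):
    covered = {ind for ind in TYPES if concept <= inferred_types(ind)}
    correct = len(covered & positives) + len(negatives - covered)
    return correct / (len(positives) + len(negatives))

def learn(positives, negatives, max_depth=4):
    """Breadth-first search of the refinement graph from Thing, keeping the
    most accurate concept (shorter concepts win ties)."""
    start = frozenset()
    best, best_key = start, (accuracy(start, positives, negatives), 0)
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        concept, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for candidate in refine(concept):
            if candidate in seen:
                continue
            seen.add(candidate)
            key = (accuracy(candidate, positives, negatives), -len(candidate))
            if key > best_key:
                best, best_key = candidate, key
            frontier.append((candidate, depth + 1))
    return best, best_key[0]

concept, acc = learn({"alice", "carol"}, {"bob", "dana", "rex"})
print(sorted(concept), acc)  # ['Female', 'Parent'] 1.0
```

This toy operator is finite but not proper: refining Female by adding the conjunct Person yields a concept equivalent to Female, exactly the kind of redundancy that the abstract's property analysis and redundancy elimination techniques are concerned with.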
39

Ontoilper: an ontology- and inductive logic programming-based method to extract instances of entities and relations from texts

Lima, Rinaldo José de, Freitas, Frederico Luiz Gonçalves de 31 January 2014
Funding: CNPq, CAPES.

Information Extraction (IE) is the task of discovering and structuring information found in a semi-structured or unstructured textual corpus. Named Entity Recognition (NER) and Relation Extraction (RE) are two important subtasks in IE. The former aims at finding named entities, such as the names of people and locations, whereas the latter consists of detecting and characterizing relations involving such named entities in text. Since manually creating extraction rules for NER and RE is labor-intensive and time-consuming, researchers have turned their attention to how machine learning techniques can be applied to IE in order to make IE systems more adaptive to domain changes. As a result, a myriad of state-of-the-art methods for NER and RE relying on statistical machine learning techniques have been proposed in the literature. Such systems typically use a propositional hypothesis space for representing examples, i.e., an attribute-value representation.
In machine learning, the propositional representation of examples has some limitations, particularly for the extraction of binary relations, which demands not only contextual and relational information about the instances involved but also more expressive semantic resources as background knowledge. This thesis attempts to mitigate these limitations, based on the hypothesis that, to be efficient and more adaptable to domain changes, an IE system should exploit ontologies and semantic resources within an IE framework that automatically induces extraction rules using machine learning techniques. In this context, this thesis proposes a supervised method, based on Inductive Logic Programming (ILP), a symbolic machine learning technique, to extract both entity and relation instances from textual corpora. The proposed method, called OntoILPER, not only benefits from ontologies and semantic resources but also relies on a highly expressive relational hypothesis space, in the form of logical predicates, for representing examples whose structure is relevant to the IE task. OntoILPER automatically induces symbolic extraction rules that subsume examples of entity and relation instances from a tailored graph-based model of sentence representation, another contribution of this thesis. Moreover, this graph-based sentence model enables the exploitation of domain ontologies and additional background knowledge in the form of a condensed set of lexical, syntactic, semantic, and relational features. Unlike most IE methods (a comprehensive survey of which, including those that also apply ILP, is presented in this thesis), OntoILPER takes advantage of a rich text preprocessing stage that encompasses various shallow and deep natural language processing subtasks, including dependency parsing, coreference resolution, word sense disambiguation, and semantic role labeling.
Further mappings of nouns and verbs to (formal) semantic resources are also considered. The OntoILPER Framework, which implements the method, was experimentally evaluated on both NER and RE tasks. This thesis reports the results of several assessments conducted on six standard evaluation corpora from two distinct domains: news and biomedical. The results demonstrate the effectiveness of OntoILPER on both NER and RE tasks; in fact, the proposed framework outperforms some of the state-of-the-art IE systems compared in this thesis. / The field of Information Extraction (IE) aims to discover and structure information contained in semi-structured or unstructured documents. Named Entity Recognition (NER) and Relation Extraction (RE) are two important subtasks of IE. The former aims to find named entities, including the names of people and places, among others, whereas the latter consists of detecting and characterizing relations involving the named entities present in the text. Since manually creating extraction rules for NER and RE is very laborious and costly, researchers have turned their attention to investigating how machine learning techniques can be applied to IE in order to make IE systems more adaptable to domain changes. As a result, many state-of-the-art NER and RE methods based on statistical machine learning techniques have been proposed in the literature. Such systems typically employ a hypothesis space of propositional expressiveness to represent examples; that is, they rely on the traditional attribute-value representation.
In machine learning, the propositional representation has some limiting factors, mainly in the extraction of binary relations, which require not only contextual and structural (relational) information about the instances but also other ways of adding prior knowledge about the problem during the learning process. This thesis aims to mitigate the limitations mentioned above, with the working hypothesis that, to be efficient and more easily adaptable to domain changes, IE systems should exploit ontologies and semantic resources within an IE framework that enables the automatic induction of information extraction rules through machine learning techniques. In this context, this thesis proposes a supervised method capable of extracting instances of entities (or ontology classes) and relations from texts, relying on Inductive Logic Programming (ILP), a supervised machine learning technique capable of inducing symbolic classification rules. The proposed method, called OntoILPER, not only benefits from ontologies and semantic resources but also relies on an expressive hypothesis space, in the form of logical predicates, capable of representing examples whose structure is relevant to the IE tasks considered in this thesis. OntoILPER automatically induces symbolic rules to classify examples of entity and relation instances from a graph-based sentence representation model. This representation model is one of the contributions of this thesis. Furthermore, the graph-based model for representing sentences and examples (instances of classes and relations) facilitates the integration of prior knowledge about the problem in the form of a reduced set of lexical, syntactic, semantic, and structural features.
Unlike most IE methods (a comprehensive survey of which is presented in this thesis, including those that also apply ILP), OntoILPER makes use of several Natural Language Processing subtasks
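The difference between an attribute-value representation and a relational hypothesis space can be illustrated with a toy sketch. It assumes a hypothetical encoding of one dependency-parsed sentence as ground facts (the predicate names `token`, `pos`, `entity` and `dep`, and the `visit` relation, are illustrative and are not OntoILPER's actual vocabulary). It then checks which entity pairs a hand-written, ILP-style rule covers. An ILP system would search for such rules automatically rather than having them written by hand.

```python
from itertools import product

# Hypothetical relational encoding of the sentence "Obama visited Paris":
# each tuple is a ground fact; predicate names are illustrative only.
facts = {
    ("token", "t1", "Obama"),
    ("token", "t2", "visited"),
    ("token", "t3", "Paris"),
    ("pos", "t2", "VERB"),
    ("entity", "t1", "PER"),
    ("entity", "t3", "LOC"),
    ("dep", "t2", "t1", "nsubj"),  # dep(Head, Dependent, Label)
    ("dep", "t2", "t3", "obj"),
}


def holds(*atom):
    """True if the ground atom is one of the sentence's facts."""
    return atom in facts


def rule_visit(x, y):
    """Hypothetical hand-written rule, read as the Horn clause
        visit(X, Y) :- entity(X, PER), entity(Y, LOC), pos(V, VERB),
                       dep(V, X, nsubj), dep(V, Y, obj).
    The existential variable V ties both arguments to the same verb."""
    verbs = [f[1] for f in facts if f[0] == "pos" and f[2] == "VERB"]
    return (holds("entity", x, "PER")
            and holds("entity", y, "LOC")
            and any(holds("dep", v, x, "nsubj") and holds("dep", v, y, "obj")
                    for v in verbs))


# Enumerate ordered pairs of distinct tokens and keep those the rule covers.
tokens = sorted(f[1] for f in facts if f[0] == "token")
covered = [(x, y) for x, y in product(tokens, repeat=2)
           if x != y and rule_visit(x, y)]
print(covered)  # [('t1', 't3')]
```

The shared variable `V` is the point of the example. A single flat feature vector can only state this link once someone has engineered path features from the sentence graph. A relational rule expresses it directly.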

Apprentissage de connaissances structurelles à partir d’images satellitaires et de données exogènes pour la cartographie dynamique de l’environnement amazonien / Structural knowledge learning from satellite images and exogenous data for dynamic mapping of the Amazonian environment

Bayoudh, Meriam 06 December 2013
Classical methods for analyzing satellite images are ill-suited to the current volume of the data flow. Automating the interpretation of these images therefore becomes crucial for analyzing and managing phenomena that are observable by satellite and evolve in time and space. This work aims to automate the dynamic mapping of land cover from satellite images through expressive, easily interpretable mechanisms that take into account the structural aspects of geographic information. It falls within the framework of object-based image analysis. First, a supervised parameterization of an image segmentation algorithm is proposed. Second, a supervised classification method for geographic objects is presented, combining machine learning by inductive logic programming with classification by the multi-class rule set intersection approach. These approaches are applied to mapping the coastal strip of French Guiana. The results demonstrate the feasibility of parameterizing the segmentation, but also its variability depending on the classes of the reference map and on the input data. The results of the supervised classification show that it is possible to induce expressive classification rules that convey consistent, structural information in a given application context and lead to satisfactory accuracy and Kappa values (84.6% and 0.7, respectively). This thesis thus contributes to automating dynamic mapping from remote sensing images and opens original and promising perspectives. / Classical methods for satellite image analysis are inadequate for the current volume of data. Automating the interpretation of such images therefore becomes crucial for the analysis and management of phenomena that change in time and space and are observable by satellite.
This work thus aims at automating land cover mapping from satellite images through expressive and easily interpretable mechanisms that explicitly take into account the structural aspects of geographic information. It is part of the object-based image analysis framework and assumes that it is possible to extract useful contextual knowledge from maps. First, a supervised parameterization method for a segmentation algorithm is proposed. Second, a supervised classification method for geographic objects is presented. It combines machine learning by inductive logic programming with the multi-class rule set intersection approach. These approaches are applied to mapping the coastline of French Guiana. The results demonstrate the feasibility of the segmentation parameterization, but also its variability as a function of the reference map classes and of the input data. Nevertheless, the methodological developments make an operational implementation of such an approach conceivable. The results of the supervised object classification show that it is possible to induce expressive classification rules that convey consistent and structural information in a given application context and lead to reliable predictions, with an overall accuracy of 84.6% and a Kappa value of 0.7. In conclusion, this work contributes to automating dynamic mapping from remotely sensed images and opens original and promising perspectives.
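Overall accuracy and Kappa, the two figures reported above, are both derived from a confusion matrix of reference versus predicted classes. The sketch below computes them with the standard Cohen's kappa formula, κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement, and p_e is the agreement expected by chance from the class marginals. The 3×3 matrix is invented for illustration and is not data from the thesis.

```python
def accuracy_and_kappa(cm):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    cm[i][j] counts objects whose reference class is i and predicted class is j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    # Observed agreement: fraction of objects on the diagonal (= overall accuracy).
    p_o = sum(cm[i][i] for i in range(k)) / n
    # Chance agreement from reference (row) and predicted (column) marginals.
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[c] * col_tot[c] for c in range(k)) / (n * n)
    return p_o, (p_o - p_e) / (1 - p_e)


# Made-up 3-class matrix, for illustration only.
cm = [[50, 5, 5],
      [5, 20, 5],
      [0, 0, 10]]
acc, kappa = accuracy_and_kappa(cm)
print(f"accuracy={acc:.3f} kappa={kappa:.3f}")  # accuracy=0.800 kappa=0.652
```

Kappa is lower than accuracy here because part of the 80% agreement would be expected by chance given the class proportions. The formula is undefined when p_e = 1, for example when every reference object and every prediction fall into a single class.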
