Global ETD Search

31	Uma abordagem híbrida relacional para a desambiguação lexical de sentido na tradução automática / A hybrid relational approach for word sense disambiguation in machine translation Lucia Specia 28 September 2007 (has links) A comunicação multilíngue é uma tarefa cada vez mais imperativa no cenário atual de grande disseminação de informações em diversas línguas. Nesse contexto, são de grande relevância os sistemas de tradução automática, que auxiliam tal comunicação, automatizando-a. Apesar de ser uma área de pesquisa bastante antiga, a Tradução Automática ainda apresenta muitos problemas. Um dos principais problemas é a ambigüidade lexical, ou seja, a necessidade de escolha de uma palavra, na língua alvo, para traduzir uma palavra da língua fonte quando há várias opções de tradução. Esse problema se mostra ainda mais complexo quando são identificadas apenas variações de sentido nas opções de tradução. Ele é denominado, nesse caso, \"ambigüidade lexical de sentido\". Várias abordagens têm sido propostas para a desambiguação lexical de sentido, mas elas são, em geral, monolíngues (para o inglês) e independentes de aplicação. Além disso, apresentam limitações no que diz respeito às fontes de conhecimento que podem ser exploradas. Em se tratando da língua portuguesa, em especial, não há pesquisas significativas voltadas para a resolução desse problema. O objetivo deste trabalho é a proposta e desenvolvimento de uma nova abordagem de desambiguação lexical de sentido, voltada especificamente para a tradução automática, que segue uma metodologia híbrida (baseada em conhecimento e em córpus) e utiliza um formalismo relacional para a representação de vários tipos de conhecimentos e de exemplos de desambiguação, por meio da técnica de Programação Lógica Indutiva. Experimentos diversos mostraram que a abordagem proposta supera abordagens alternativas para a desambiguação multilíngue e apresenta desempenho superior ou comparável ao do estado da arte em desambiguação monolíngue. Adicionalmente, tal abordagem se mostrou efetiva como mecanismo auxiliar para a escolha lexical na tradução automática estatística / Crosslingual communication has become a very imperative task in the current scenario with the increasing amount of information dissemination in several languages. In this context, machine translation systems, which can facilitate such communication by providing automatic translations, are of great importance. Although research in Machine Translation dates back to the 1950\'s, the area still has many problems. One of the main problems is that of lexical ambiguity, that is, the need for lexical choice when translating a source language word that has several translation options in the target language. This problem is even more complex when only sense variations are found in the translation options, a problem named \"sense ambiguity\". Several approaches have been proposed for word sense disambiguation, but they are in general monolingual (for English) and application-independent. Moreover, they have limitations regarding the types of knowledge sources that can be exploited. Particularly, there is no significant research aiming to word sense disambiguation involving Portuguese. The goal of this PhD work is the proposal and development of a novel approach for word sense disambiguation which is specifically designed for machine translation, follows a hybrid methodology (knowledge and corpus-based), and employs a relational formalism to represent various kinds of knowledge sources and disambiguation examples, by using Inductive Logic Programming. Several experiments have shown that the proposed approach overcomes alternative approaches in multilingual disambiguation and achieves higher or comparable results to the state of the art in monolingual disambiguation. Additionally, the approach has shown to effectively assist lexical choice in a statistical machine translation system Ambigüidade Lexical de Sentido Desambiguação Lexical de Sentido Programação Lógica Indutiva Tradução Automática Inductive Logic Programming Lexical Semantic Ambiguity Machine Translation Word Sense Disambiguation
32	Learning OWL Class Expressions Lehmann, Jens 24 June 2010 (has links) (PDF) With the advent of the Semantic Web and Semantic Technologies, ontologies have become one of the most prominent paradigms for knowledge representation and reasoning. The popular ontology language OWL, based on description logics, became a W3C recommendation in 2004 and a standard for modelling ontologies on the Web. In the meantime, many studies and applications using OWL have been reported in research and industrial environments, many of which go beyond Internet usage and employ the power of ontological modelling in other fields such as biology, medicine, software engineering, knowledge management, and cognitive systems. However, recent progress in the field faces a lack of well-structured ontologies with large amounts of instance data due to the fact that engineering such ontologies requires a considerable investment of resources. Nowadays, knowledge bases often provide large volumes of data without sophisticated schemata. Hence, methods for automated schema acquisition and maintenance are sought. Schema acquisition is closely related to solving typical classification problems in machine learning, e.g. the detection of chemical compounds causing cancer. In this work, we investigate both, the underlying machine learning techniques and their application to knowledge acquisition in the Semantic Web. In order to leverage machine-learning approaches for solving these tasks, it is required to develop methods and tools for learning concepts in description logics or, equivalently, class expressions in OWL. In this thesis, it is shown that methods from Inductive Logic Programming (ILP) are applicable to learning in description logic knowledge bases. The results provide foundations for the semi-automatic creation and maintenance of OWL ontologies, in particular in cases when extensional information (i.e. facts, instance data) is abundantly available, while corresponding intensional information (schema) is missing or not expressive enough to allow powerful reasoning over the ontology in a useful way. Such situations often occur when extracting knowledge from different sources, e.g. databases, or in collaborative knowledge engineering scenarios, e.g. using semantic wikis. It can be argued that being able to learn OWL class expressions is a step towards enriching OWL knowledge bases in order to enable powerful reasoning, consistency checking, and improved querying possibilities. In particular, plugins for OWL ontology editors based on learning methods are developed and evaluated in this work. The developed algorithms are not restricted to ontology engineering and can handle other learning problems. Indeed, they lend themselves to generic use in machine learning in the same way as ILP systems do. The main difference, however, is the employed knowledge representation paradigm: ILP traditionally uses logic programs for knowledge representation, whereas this work rests on description logics and OWL. This difference is crucial when considering Semantic Web applications as target use cases, as such applications hinge centrally on the chosen knowledge representation format for knowledge interchange and integration. The work in this thesis can be understood as a broadening of the scope of research and applications of ILP methods. This goal is particularly important since the number of OWL-based systems is already increasing rapidly and can be expected to grow further in the future. The thesis starts by establishing the necessary theoretical basis and continues with the specification of algorithms. It also contains their evaluation and, finally, presents a number of application scenarios. The research contributions of this work are threefold: The first contribution is a complete analysis of desirable properties of refinement operators in description logics. Refinement operators are used to traverse the target search space and are, therefore, a crucial element in many learning algorithms. Their properties (completeness, weak completeness, properness, redundancy, infinity, minimality) indicate whether a refinement operator is suitable for being employed in a learning algorithm. The key research question is which of those properties can be combined. It is shown that there is no ideal, i.e. complete, proper, and finite, refinement operator for expressive description logics, which indicates that learning in description logics is a challenging machine learning task. A number of other new results for different property combinations are also proven. The need for these investigations has already been expressed in several articles prior to this PhD work. The theoretical limitations, which were shown as a result of these investigations, provide clear criteria for the design of refinement operators. In the analysis, as few assumptions as possible were made regarding the used description language. The second contribution is the development of two refinement operators. The first operator supports a wide range of concept constructors and it is shown that it is complete and can be extended to a proper operator. It is the most expressive operator designed for a description language so far. The second operator uses the light-weight language EL and is weakly complete, proper, and finite. It is straightforward to extend it to an ideal operator, if required. It is the first published ideal refinement operator in description logics. While the two operators differ a lot in their technical details, they both use background knowledge efficiently. The third contribution is the actual learning algorithms using the introduced operators. New redundancy elimination and infinity-handling techniques are introduced in these algorithms. According to the evaluation, the algorithms produce very readable solutions, while their accuracy is competitive with the state-of-the-art in machine learning. Several optimisations for achieving scalability of the introduced algorithms are described, including a knowledge base fragment selection approach, a dedicated reasoning procedure, and a stochastic coverage computation approach. The research contributions are evaluated on benchmark problems and in use cases. Standard statistical measurements such as cross validation and significance tests show that the approaches are very competitive. Furthermore, the ontology engineering case study provides evidence that the described algorithms can solve the target problems in practice. A major outcome of the doctoral work is the DL-Learner framework. It provides the source code for all algorithms and examples as open-source and has been incorporated in other projects. Machine Learning OWL Description Logics Inductive Logic Programming Supervised Learning Semantic Web RDF SPARQL symbolic learning refinement operators Maschinelles Lernen OWL Beschreibungslogiken überwachtes Lernen Semantisches Web RDF SPARQL symbolisches Lernen Refinementoperatoren ddc:000
33	Ontoilper: an ontology- and inductive logic programming-based method to extract instances of entities and relations from texts Lima, Rinaldo José de, Freitas, Frederico Luiz Gonçalves de 31 January 2014 (has links) Submitted by Nayara Passos (nayara.passos@ufpe.br) on 2015-03-13T12:33:46Z No. of bitstreams: 2 TESE Rinaldo José de Lima.pdf: 8678943 bytes, checksum: e88c290e414329ee00d2d6a35a466de0 (MD5) license_rdf: 1232 bytes, checksum: 66e71c371cc565284e70f40736c94386 (MD5) / Approved for entry into archive by Daniella Sodre (daniella.sodre@ufpe.br) on 2015-03-13T13:16:54Z (GMT) No. of bitstreams: 2 TESE Rinaldo José de Lima.pdf: 8678943 bytes, checksum: e88c290e414329ee00d2d6a35a466de0 (MD5) license_rdf: 1232 bytes, checksum: 66e71c371cc565284e70f40736c94386 (MD5) / Made available in DSpace on 2015-03-13T13:16:54Z (GMT). No. of bitstreams: 2 TESE Rinaldo José de Lima.pdf: 8678943 bytes, checksum: e88c290e414329ee00d2d6a35a466de0 (MD5) license_rdf: 1232 bytes, checksum: 66e71c371cc565284e70f40736c94386 (MD5) Previous issue date: 2014 / CNPq, CAPES. / Information Extraction (IE) consists in the task of discovering and structuring information found in a semi-structured or unstructured textual corpus. Named Entity Recognition (NER) and Relation Extraction (RE) are two important subtasks in IE. The former aims at finding named entities, including the name of people, locations, among others, whereas the latter consists in detecting and characterizing relations involving such named entities in text. Since the approach of manually creating extraction rules for performing NER and RE is an intensive and time-consuming task, researchers have turned their attention to how machine learning techniques can be applied to IE in order to make IE systems more adaptive to domain changes. As a result, a myriad of state-of-the-art methods for NER and RE relying on statistical machine learning techniques have been proposed in the literature. Such systems typically use a propositional hypothesis space for representing examples, i.e., an attribute-value representation. In machine learning, the propositional representation of examples presents some limitations, particularly in the extraction of binary relations, which mainly demands not only contextual and relational information about the involving instances, but also more expressive semantic resources as background knowledge. This thesis attempts to mitigate the aforementioned limitations based on the hypothesis that, to be efficient and more adaptable to domain changes, an IE system should exploit ontologies and semantic resources in a framework for IE that enables the automatic induction of extraction rules by employing machine learning techniques. In this context, this thesis proposes a supervised method to extract both entity and relation instances from textual corpora based on Inductive Logic Programming, a symbolic machine learning technique. The proposed method, called OntoILPER, benefits not only from ontologies and semantic resources, but also relies on a highly expressive relational hypothesis space, in the form of logical predicates, for representing examples whose structure is relevant to the information extraction task. OntoILPER automatically induces symbolic extraction rules that subsume examples of entity and relation instances from a tailored graph-based model of sentence representation, another contribution of this thesis. Moreover, this graph-based model for representing sentences also enables the exploitation of domain ontologies and additional background knowledge in the form of a condensed set of features including lexical, syntactic, semantic, and relational ones. Differently from most of the IE methods (a comprehensive survey is presented in this thesis, including the ones that also apply ILP), OntoILPER takes advantage of a rich text preprocessing stage which encompasses various shallow and deep natural language processing subtasks, including dependency parsing, coreference resolution, word sense disambiguation, and semantic role labeling. Further mappings of nouns and verbs to (formal) semantic resources are also considered. OntoILPER Framework, the OntoILPER implementation, was experimentally evaluated on both NER and RE tasks. This thesis reports the results of several assessments conducted using six standard evaluationcorpora from two distinct domains: news and biomedical. The obtained results demonstrated the effectiveness of OntoILPER on both NER and RE tasks. Actually, the proposed framework outperforms some of the state-of-the-art IE systems compared in this thesis. / A área de Extração de Informação (IE) visa descobrir e estruturar informações dispostas em documentos semi-estruturados ou desestruturados. O Reconhecimento de Entidades Nomeadas (REN) e a Extração de Relações (ER) são duas subtarefas importantes em EI. A primeira visa encontrar entidades nomeadas, incluindo nome de pessoas e lugares, entre outros; enquanto que a segunda, consiste na detecção e caracterização de relações que envolvem as entidades nomeadas presentes no texto. Como a tarefa de criar manualmente as regras de extração para realizar REN e ER é muito trabalhosa e onerosa, pesquisadores têm voltado suas atenções na investigação de como as técnicas de aprendizado de máquina podem ser aplicadas à EI a fim de tornar os sistemas de ER mais adaptáveis às mudanças de domínios. Como resultado, muitos métodos do estado-da-arte em REN e ER, baseados em técnicas estatísticas de aprendizado de máquina, têm sido propostos na literatura. Tais sistemas normalmente empregam um espaço de hipóteses com expressividade propositional para representar os exemplos, ou seja, eles são baseado na tradicional representação atributo-valor. Em aprendizado de máquina, a representação proposicional apresenta algums fatores limitantes, principalmente na extração de relações binárias que exigem não somente informações contextuais e estruturais (relacionais) sobre as instâncias, mas também outras formas de como adicionar conhecimento prévio do problema durante o processo de aprendizado. Esta tese visa atenuar as limitações acima mencionadas, tendo como hipótese de trabalho que, para ser eficiente e mais facilmente adaptável às mudanças de domínio, os sistemas de EI devem explorar ontologias e recursos semânticos no contexto de um arcabouço para EI que permita a indução automática de regras de extração de informação através do emprego de técnicas de aprendizado de máquina. Neste contexto, a presente tese propõe um método supervisionado capaz de extrair instâncias de entidades (ou classes de ontologias) e de relações a partir de textos apoiando-se na Programação em Lógica Indutiva (PLI), uma técnica de aprendizado de máquina supervisionada capaz de induzir regras simbólicas de classificação. O método proposto, chamado OntoILPER, não só se beneficia de ontologias e recursos semânticos, mas também se baseia em um expressivo espaço de hipóteses, sob a forma de predicados lógicos, capaz de representar exemplos cuja estrutura é relevante para a tarefa de EI consideradas nesta tese. OntoILPER automaticamente induz regras simbólicas para classificar exemplos de instâncias de entidades e relações a partir de um modelo de representação de frases baseado em grafos. Tal modelo de representação é uma das constribuições desta tese. Além disso, o modelo baseado em grafos para representação de frases e exemplos (instâncias de classes e relações) favorece a integração de conhecimento prévio do problema na forma de um conjunto reduzido de atributos léxicos, sintáticos, semânticos e estruturais. Diferentemente da maioria dos métodos de EI (uma pesquisa abrangente é apresentada nesta tese, incluindo aqueles que também se aplicam a PLI), OntoILPER faz uso de várias subtarefas do Processamento de Linguagem Named entity recognition Relation extraction Ontology population Ontologybased information extraction Inductive logic programming Reconhecimento de entidades nomeadas Extração de relação Povoamento de ontologias Programação em lógica indutiva
34	Apprentissage de connaissances structurelles à partir d’images satellitaires et de données exogènes pour la cartographie dynamique de l’environnement amazonien / Structurel Knowledge learning from satellite images and exogenous data for dynamic mapping of the amazonian environment Bayoudh, Meriam 06 December 2013 (has links) Les méthodes classiques d'analyse d'images satellites sont inadaptées au volume actuel du flux de données. L'automatisation de l'interprétation de ces images devient donc cruciale pour l'analyse et la gestion des phénomènes observables par satellite et évoluant dans le temps et l'espace. Ce travail vise à automatiser la cartographie dynamique de l'occupation du sol à partir d'images satellites, par des mécanismes expressifs, facilement interprétables en prenant en compte les aspects structurels de l'information géographique. Il s'inscrit dans le cadre de l'analyse d'images basée objet. Ainsi, un paramétrage supervisé d'un algorithme de segmentation d'images est proposé. Dans un deuxième temps, une méthode de classification supervisée d'objets géographiques est présentée combinant apprentissage automatique par programmation logique inductive et classement par l'approche multi-class rule set intersection. Ces approches sont appliquées à la cartographie de la bande côtière Guyanaise. Les résultats démontrent la faisabilité du paramétrage de la segmentation, mais également sa variabilité en fonction des classes de la carte de référence et des données d'entrée. Les résultats de la classification supervisée montrent qu'il est possible d'induire des règles de classification expressives, véhiculant des informations cohérentes et structurelles dans un contexte applicatif donnée et conduisant à des valeurs satisfaisantes de précision et de KAPPA (respectivement 84,6% et 0,7). Ce travail de thèse contribue ainsi à l'automatisation de la cartographie dynamique à partir d'images de télédétection et propose des perspectives originales et prometteuses. / Classical methods for satellite image analysis are inadequate for the current bulky data flow. Thus, automate the interpretation of such images becomes crucial for the analysis and management of phenomena changing in time and space, observable by satellite. Thus, this work aims at automating land cover cartography from satellite images, by expressive and easily interpretable mechanism, and by explicitly taking into account structural aspects of geographic information. It is part of the object-based image analysis framework, and assumes that it is possible to extract useful contextual knowledge from maps. Thus, a supervised parameterization methods of a segmentation algorithm is proposed. Secondly, a supervised classification of geographical objects is presented. It combines machine learning by inductive logic programming and the multi-class rule set intersection approach. These approaches are applied to the French Guiana coastline cartography. The results demonstrate the feasibility of the segmentation parameterization, but also its variability as a function of the reference map classes and of the input data. Yet, methodological developments allow to consider an operational implementation of such an approach. The results of the object supervised classification show that it is possible to induce expressive classification rules that convey consistent and structural information in a given application context and lead to reliable predictions, with overall accuracy and Kappa values equal to, respectively, 84,6% and 0,7. In conclusion, this work contributes to the automation of the dynamic cartography from remotely sensed images and proposes original and promising perpectives Télédétection Analyse d'image basée objet Segmentation Classification supervisée Apprentissage automatique Programmation logique inductive Cartes d'occupation Usage du sol Guyane française Remote sensing Object-based iage analysis Segmentation Supervised classification Machine learning Inductive logic programming Land cover Use maps French Guiana
35	Les systèmes cognitifs dans les réseaux autonomes : une méthode d'apprentissage distribué et collaboratif situé dans le plan de connaissance pour l'auto-adaptation / Cognitive systems in automatic networks : a distributed and collaborative learning method in knoledge plane for self-adapting function Mbaye, Maïssa 17 December 2009 (has links) L'un des défis majeurs pour les décennies à venir, dans le domaine des technologies de l'information et de la communication, est la réalisation du concept des réseaux autonomes. Ce paradigme a pour objectif de rendre les équipements réseaux capables de s'autogérer, c'est-à-dire qu'ils pourront s'auto-configurer, s'auto-optimiser, s'auto-protéger et s'auto-restaurer en respectant les objectifs de haut niveau de leurs concepteurs. Les architectures majeures de réseaux autonomes se basent principalement sur la notion de boucle de contrôle fermée permettant l'auto-adaptation (auto-configuration et auto-optimisation) de l'équipement réseau en fonction des événements qui surviennent sur leur environnement. Le plan de connaissance est une des approches, très mise en avant ces dernières années par le monde de la recherche, qui suggère l'utilisation des systèmes cognitifs (l'apprentissage et le raisonnement) pour fermer la boucle de contrôle. Cependant, bien que les architectures majeures de gestion autonomes intègrent des modules d'apprentissage sous forme de boite noire, peu de recherches s'intéressent véritablement au contenu de ces boites. C'est dans ce cadre que nous avons fait une étude sur l'apport potentiel de l'apprentissage et proposé une méthode d'apprentissage distribué et collaboratif. Nous proposons une formalisation du problème d'auto-adaptation sous forme d'un problème d'apprentissage d'état-actions. Cette formalisation nous permet de définir un apprentissage de stratégies d'auto-adaptation qui se base sur l'utilisation de l'historique des transitions et utilise la programmation logique inductive pour découvrir de nouvelles stratégies à partir de celles déjà découvertes. Nous définissons, aussi un algorithme de partage de la connaissance qui permet d'accélérer le processus d'apprentissage. Enfin, nous avons testé l'approche proposé dans le cadre d'un réseau DiffServ et montré sa transposition sur le contexte du transport de flux multimédia dans les réseaux sans-fil 802.11. / One of the major challenges for decades to come, in the field of information technologies and the communication, is realization of autonomic paradigm. It aims to enable network equipments to self-manage, enable them to self-configure, self-optimize, self-protect and self-heal according to high-level objectives of their designers. Major architectures of autonomic networking are based on closed control loop allowing self-adapting (self-configuring and self-optimizing) of the network equipment according to the events which arise on their environment. Knowledge plane is one approach, very emphasis these last years by researchers, which suggests the use of the cognitive systems (machine learning and the reasoning) to realize closed control loop. However, although the major autonomic architectures integrate machine learning modules as functional block, few researches are really interested in the contents of these blocks. It is in this context that we made a study on the potential contribution machine learning and proposed a method of distributed and collaborative machine learning. We propose a formalization self-adapting problem in term of learning configuration strategies (state-actions) problem. This formalization allows us to define a strategies machine learning method for self-adapting which is based on the history observed transitions and uses inductive logic programming to discover new strategies from those already discovered. We defined, also a knowledge sharing algorithm which makes network components collaborate to improve learning process. Finally, we tested our approach in DiffServ context and showed its transposition on multimedia streaming in 802.11 wireless networks. Réseaux autonomes Auto-adaptation Plan de connaissance Systèmes cognitifs Apprentissage artificiel Apprentissage distribué Apprentissage collaboratif Programmation logique inductive Distibuted machine learning Autonomic networking Self-adaptating Knowledge plane Cognitive systems Machine learning Collaborative machine learning Inductive logic programming
36	Learning OWL Class Expressions Lehmann, Jens 09 June 2010 (has links) With the advent of the Semantic Web and Semantic Technologies, ontologies have become one of the most prominent paradigms for knowledge representation and reasoning. The popular ontology language OWL, based on description logics, became a W3C recommendation in 2004 and a standard for modelling ontologies on the Web. In the meantime, many studies and applications using OWL have been reported in research and industrial environments, many of which go beyond Internet usage and employ the power of ontological modelling in other fields such as biology, medicine, software engineering, knowledge management, and cognitive systems. However, recent progress in the field faces a lack of well-structured ontologies with large amounts of instance data due to the fact that engineering such ontologies requires a considerable investment of resources. Nowadays, knowledge bases often provide large volumes of data without sophisticated schemata. Hence, methods for automated schema acquisition and maintenance are sought. Schema acquisition is closely related to solving typical classification problems in machine learning, e.g. the detection of chemical compounds causing cancer. In this work, we investigate both, the underlying machine learning techniques and their application to knowledge acquisition in the Semantic Web. In order to leverage machine-learning approaches for solving these tasks, it is required to develop methods and tools for learning concepts in description logics or, equivalently, class expressions in OWL. In this thesis, it is shown that methods from Inductive Logic Programming (ILP) are applicable to learning in description logic knowledge bases. The results provide foundations for the semi-automatic creation and maintenance of OWL ontologies, in particular in cases when extensional information (i.e. facts, instance data) is abundantly available, while corresponding intensional information (schema) is missing or not expressive enough to allow powerful reasoning over the ontology in a useful way. Such situations often occur when extracting knowledge from different sources, e.g. databases, or in collaborative knowledge engineering scenarios, e.g. using semantic wikis. It can be argued that being able to learn OWL class expressions is a step towards enriching OWL knowledge bases in order to enable powerful reasoning, consistency checking, and improved querying possibilities. In particular, plugins for OWL ontology editors based on learning methods are developed and evaluated in this work. The developed algorithms are not restricted to ontology engineering and can handle other learning problems. Indeed, they lend themselves to generic use in machine learning in the same way as ILP systems do. The main difference, however, is the employed knowledge representation paradigm: ILP traditionally uses logic programs for knowledge representation, whereas this work rests on description logics and OWL. This difference is crucial when considering Semantic Web applications as target use cases, as such applications hinge centrally on the chosen knowledge representation format for knowledge interchange and integration. The work in this thesis can be understood as a broadening of the scope of research and applications of ILP methods. This goal is particularly important since the number of OWL-based systems is already increasing rapidly and can be expected to grow further in the future. The thesis starts by establishing the necessary theoretical basis and continues with the specification of algorithms. It also contains their evaluation and, finally, presents a number of application scenarios. The research contributions of this work are threefold: The first contribution is a complete analysis of desirable properties of refinement operators in description logics. Refinement operators are used to traverse the target search space and are, therefore, a crucial element in many learning algorithms. Their properties (completeness, weak completeness, properness, redundancy, infinity, minimality) indicate whether a refinement operator is suitable for being employed in a learning algorithm. The key research question is which of those properties can be combined. It is shown that there is no ideal, i.e. complete, proper, and finite, refinement operator for expressive description logics, which indicates that learning in description logics is a challenging machine learning task. A number of other new results for different property combinations are also proven. The need for these investigations has already been expressed in several articles prior to this PhD work. The theoretical limitations, which were shown as a result of these investigations, provide clear criteria for the design of refinement operators. In the analysis, as few assumptions as possible were made regarding the used description language. The second contribution is the development of two refinement operators. The first operator supports a wide range of concept constructors and it is shown that it is complete and can be extended to a proper operator. It is the most expressive operator designed for a description language so far. The second operator uses the light-weight language EL and is weakly complete, proper, and finite. It is straightforward to extend it to an ideal operator, if required. It is the first published ideal refinement operator in description logics. While the two operators differ a lot in their technical details, they both use background knowledge efficiently. The third contribution is the actual learning algorithms using the introduced operators. New redundancy elimination and infinity-handling techniques are introduced in these algorithms. According to the evaluation, the algorithms produce very readable solutions, while their accuracy is competitive with the state-of-the-art in machine learning. Several optimisations for achieving scalability of the introduced algorithms are described, including a knowledge base fragment selection approach, a dedicated reasoning procedure, and a stochastic coverage computation approach. The research contributions are evaluated on benchmark problems and in use cases. Standard statistical measurements such as cross validation and significance tests show that the approaches are very competitive. Furthermore, the ontology engineering case study provides evidence that the described algorithms can solve the target problems in practice. A major outcome of the doctoral work is the DL-Learner framework. It provides the source code for all algorithms and examples as open-source and has been incorporated in other projects. info:eu-repo/classification/ddc/000 ddc:000
37	Representation of Compositional Relational Programs Paçacı, Görkem January 2017 (has links) Usability aspects of programming languages are often overlooked, yet have a substantial effect on programmer productivity. These issues are even more acute in the field of Inductive Synthesis, where programs are automatically generated from sample expected input and output data, and the programmer needs to be able to comprehend, and confirm or reject the suggested programs. A promising method of Inductive Synthesis, CombInduce, which is particularly suitable for synthesizing recursive programs, is a candidate for improvements in usability as the target language Combilog is not user-friendly. The method requires the target language to be strictly compositional, hence devoid of variables, yet have the expressiveness of definite clause programs. This sets up a challenging problem for establishing a user-friendly but equally expressive target language. Alternatives to Combilog, such as Quine's Predicate-functor Logic and Schönfinkel and Curry's Combinatory Logic also do not offer a practical notation: finding a more usable representation is imperative. This thesis presents two distinct approaches towards more convenient representations which still maintain compositionality. The first is Visual Combilog (VC), a system for visualizing Combilog programs. In this approach Combilog remains as the target language for synthesis, but programs can be read and modified by interacting with the equivalent diagrams instead. VC is implemented as a split-view editor that maintains the equivalent Combilog and VC representations on-the-fly, automatically transforming them as necessary. The second approach is Combilog with Name Projection (CNP), a textual iteration of Combilog that replaces numeric argument positions with argument names. The result is a language where argument names make the notation more readable, yet compositionality is preserved by avoiding variables. Compositionality is demonstrated by implementing CombInduce with CNP as the target language, revealing that programs with the same level of recursive complexity can be synthesized in CNP equally well, and establishing the underlying method of synthesis can also work with CNP. Our evaluations of the user-friendliness of both representations are supported by a range of methods from Information Visualization, Cognitive Modelling, and Human-Computer Interaction. The increased usability of both representations are confirmed by empirical user studies: an often neglected aspect of language design. Programming Syntax Logic Programming Combilog CombInduce Prolog Variable-free Point-free Tacit Compositional Relational Programming Combinatory Logic Predicate-Functor Logic Program Synthesis Meta-interpreters Meta-interpretative Synthesis Decompositional Synthesis Inductive Synthesis Inductive Logic Programming Usability Cognitive Dimensions of Notations Visual Variables Usability testing Programming Language usability Empirical evidence Information Systems Computer and Information Science Data- och informationsvetenskap

Page generated in 0.0877 seconds