Global ETD Search

1	Learning Comprehensible Theories from Structured Data Ng, Kee Siong, kee.siong@rsise.anu.edu.au January 2005 (has links) This thesis is concerned with the problem of learning comprehensible theories from structured data and covers primarily classification and regression learning. The basic knowledge representation language is set around a polymorphically-typed, higher-order logic. The general setup is closely related to the learning from propositionalized knowledge and learning from interpretations settings in Inductive Logic Programming. Individuals (also called instances) are represented as terms in the logic. A grammar-like construct called a predicate rewrite system is used to define features in the form of predicates that individuals may or may not satisfy. For learning, decision-tree algorithms of various kinds are adopted.¶ The scope of the thesis spans both theory and practice. On the theoretical side, I study in this thesis¶ 1. the representational power of different function classes and relationships between them;¶ 2. the sample complexity of some commonly-used predicate classes, particularly those involving sets and multisets;¶ 3. the computational complexity of various optimization problems associated with learning and algorithms for solving them; and¶ 4. the (efficient) learnability of different function classes in the PAC and agnostic PAC models.¶ On the practical side, the usefulness of the learning system developed is demontrated with applications in two important domains: bioinformatics and intelligent agents. Specifically, the following are covered in this thesis:¶ 1. a solution to a benchmark multiple-instance learning problem and some useful lessons that can be drawn from it;¶ 2. a successful attempt on a knowledge discovery problem in predictive toxicology, one that can serve as another proof-of-concept that real chemical knowledge can be obtained using symbolic learning;¶ 3. a reworking of an exercise in relational reinforcement learning and some new insights and techniques we learned for this interesting problem; and¶ 4. a general approach for personalizing user agents that takes full advantage of symbolic learning. machine learning logic higher-order logic comprehensible theories structured data
2	Enhancing the Usability of Complex Structured Data by Supporting Keyword Searches January 2011 (has links) abstract: As pointed out in the keynote speech by H. V. Jagadish in SIGMOD'07, and also commonly agreed in the database community, the usability of structured data by casual users is as important as the data management systems' functionalities. A major hardness of using structured data is the problem of easily retrieving information from them given a user's information needs. Learning and using a structured query language (e.g., SQL and XQuery) is overwhelmingly burdensome for most users, as not only are these languages sophisticated, but the users need to know the data schema. Keyword search provides us with opportunities to conveniently access structured data and potentially significantly enhances the usability of structured data. However, processing keyword search on structured data is challenging due to various types of ambiguities such as structural ambiguity (keyword queries have no structure), keyword ambiguity (the keywords may not be accurate), user preference ambiguity (the user may have implicit preferences that are not indicated in the query), as well as the efficiency challenges due to large search space. This dissertation performs an expansive study on keyword search processing techniques as a gateway for users to access structured data and retrieve desired information. The key issues addressed include: (1) Resolving structural ambiguities in keyword queries by generating meaningful query results, which involves identifying relevant keyword matches, identifying return information, composing query results based on relevant matches and return information. (2) Resolving structural, keyword and user preference ambiguities through result analysis, including snippet generation, result differentiation, result clustering, result summarization/query expansion, etc. (3) Resolving the efficiency challenge in processing keyword search on structured data by utilizing and efficiently maintaining materialized views. These works deliver significant technical contributions towards building a full-fledged search engine for structured data. / Dissertation/Thesis / Ph.D. Computer Science 2011 Computer Science Database Keyword Search Structured Data Usability XML
3	BINDING HASH TECHNIQUE FOR XML QUERY OPTIMIZATION BRANT, MICHAEL J. 20 July 2006 (has links) No description available. XML Query Processing XML Query Optimization Semi-structured data XPath
4	An artefact to analyse unstructured document data stores / by André Romeo Botes Botes, André Romeo January 2014 (has links) Structured data stores have been the dominating technologies for the past few decades. Although dominating, structured data stores lack the functionality to handle the ‘Big Data’ phenomenon. A new technology has recently emerged which stores unstructured data and can handle the ‘Big Data’ phenomenon. This study describes the development of an artefact to aid in the analysis of NoSQL document data stores in terms of relational database model constructs. Design science research (DSR) is the methodology implemented in the study and it is used to assist in the understanding, design and development of the problem, artefact and solution. This study explores the existing literature on DSR, in addition to structured and unstructured data stores. The literature review formulates the descriptive and prescriptive knowledge used in the development of the artefact. The artefact is developed using a series of six activities derived from two DSR approaches. The problem domain is derived from the existing literature and a real application environment (RAE). The reviewed literature provided a general problem statement. A representative from NFM (the RAE) is interviewed for a situation analysis providing a specific problem statement. An objective is formulated for the development of the artefact and suggestions are made to address the problem domain, assisting the artefact’s objective. The artefact is designed and developed using the descriptive knowledge of structured and unstructured data stores, combined with prescriptive knowledge of algorithms, pseudo code, continuous design and object-oriented design. The artefact evolves through multiple design cycles into a final product that analyses document data stores in terms of relational database model constructs. The artefact is evaluated for acceptability and utility. This provides credibility and rigour to the research in the DSR paradigm. Acceptability is demonstrated through simulation and the utility is evaluated using a real application environment (RAE). A representative from NFM is interviewed for the evaluation of the artefact. Finally, the study is communicated by describing its findings, summarising the artefact and looking into future possibilities for research and application. / MSc (Computer Science), North-West University, Vaal Triangle Campus, 2014 Design science research Structured data stores Unstructured data stores Artefact NoSQL Big data
5	Proposta de uma ferramenta de anotação semântica para publicação de dados estruturados na Web Calegari, Newton Juniano 02 April 2016 (has links) Submitted by Filipe dos Santos (fsantos@pucsp.br) on 2016-09-02T14:31:38Z No. of bitstreams: 1 Newton Juniano Calegari.pdf: 2853517 bytes, checksum: e1eda2a1325986c6284a5054d724a19f (MD5) / Made available in DSpace on 2016-09-02T14:31:38Z (GMT). No. of bitstreams: 1 Newton Juniano Calegari.pdf: 2853517 bytes, checksum: e1eda2a1325986c6284a5054d724a19f (MD5) Previous issue date: 2016-04-02 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Pontifícia Universidade Católica de São Paulo / The tool proposed in this research aims at bringing together the Semantic Web technologies and content publishers, this way enabling the latter to contribute to creating structured data and metadata about texts and information they may make available on the Web. The general goal is to investigate the technical feasibility of developing a semantic annotation tool that enables content publishers to contribute to the Semantic Web ecosystem. Based on (BERNERS-LEE et al., 2001; ALESSO; SMITH, 2006; RODRÍGUEZ-ROCHA et al., 2015; GUIZZARDI, 2005; ISOTANI; BITTENCOURT, 2015), the Semantic Web is presented according to its technological stack. Considering the importance of the ontologies and vocabularies used to create Semantic Web applications, the essential subjects of the conceptual modelling and the ontology language used on the Web are presented. In order to provide the necessary concepts to use semantic annotations, this dissertation presents both the way annotations are used (manual, semi-automatic, and automatic) as well as the way these annotations are integrated with resources available on the Web. The state-of-the-art chapter describes recent projects and related work on the use of Semantic Web within Web-content publishing context. The methodology adopted by this research is based on (SANTAELLA; VIEIRA, 2008; GIL, 2002), in compliance with the exploratory approach for research. This research presents the proposal and the architecture of the semantic annotation tool, which uses shared vocabulary in order to create structured data based on textual content. In conclusion, this dissertation addresses the possibilities of future work, both in terms of the implementation of the tool in a real use case as well as in new scientific research / A proposta apresentada nesta pesquisa busca aproximar as tecnologias de Web Semântica dos usuários publicadores de conteúdo na Web, permitindo que estes contribuam com a geração de dados estruturados e metadados sobre textos e informações que venham disponibilizar na Web. O objetivo geral deste trabalho é investigar a viabilidade técnica de desenvolvimento de uma ferramenta de anotação semântica que permita aos usuários publicadores de conteúdo contribuírem para o ecossistema de Web Semântica. Com suporte de (BERNERS-LEE et al., 2001; ALESSO; SMITH, 2006; RODRÍGUEZ-ROCHA et al., 2015; GUIZZARDI, 2005; ISOTANI; BITTENCOURT, 2015) apresenta-se o tópico de Web Semântica de acordo com a pilha tecnológica que mostra o conjunto de tecnologias proposto para a sua realização. Considerando a importância de ontologias e vocabulários para a construção de aplicações de Web Semântica, são apresentados então os tópicos fundamentais de modelagem conceitual e a linguagem de ontologias para Web. Para fornecer a base necessária para a utilização de anotações semânticas são apresentados, além da definição, os modos de uso de anotações (manual, semi-automático e automático) e as formas de integrar essas anotações com recursos disponíveis nas tecnologias da Web Semântica. O estado da arte contempla trabalhos e projetos recentes sobre o uso de Web Semântica no contexto de publicação de conteúdo na Web. A metodologia é baseada na proposta apresentada por SANTAELLA; VIEIRA (2008), seguindo uma abordagem exploratória para a condução da pesquisa. É apresentada a proposta e os componentes de uma ferramenta de anotação semântica que utiliza vocabulários compartilhados para geração de dados estruturados a partir de conteúdo textual. Concluindo o trabalho, são apresentadas as possibilidades futuras, tanto da implementação da ferramenta em um cenário real, atestando sua viabilidade técnica, quanto novos trabalhos encaminhados a partir desta pesquisa Web semântica Dados estruturados Anotação semântica Semantic web Structured data Semantic annotation CNPQ::ENGENHARIAS
6	Adaptive Semi-structured Information Extraction Arpteg, Anders January 2003 (has links) <p>The number of domains and tasks where information extraction tools can be used needs to be increased. One way to reach this goal is to construct user-driven information extraction systems where novice users are able to adapt them to new domains and tasks. To accomplish this goal, the systems need to become more intelligent and able to learn to extract information without need of expert skills or time-consuming work from the user.</p><p>The type of information extraction system that is in focus for this thesis is semistructural information extraction. The term semi-structural refers to documents that not only contain natural language text but also additional structural information. The typical application is information extraction from World Wide Web hypertext documents. By making effective use of not only the link structure but also the structural information within each such document, user-driven extraction systems with high performance can be built.</p><p>The extraction process contains several steps where different types of techniques are used. Examples of such types of techniques are those that take advantage of structural, pure syntactic, linguistic, and semantic information. The first step that is in focus for this thesis is the navigation step that takes advantage of the structural information. It is only one part of a complete extraction system, but it is an important part. The use of reinforcement learning algorithms for the navigation step can make the adaptation of the system to new tasks and domains more user-driven. The advantage of using reinforcement learning techniques is that the extraction agent can efficiently learn from its own experience without need for intensive user interactions.</p><p>An agent-oriented system was designed to evaluate the approach suggested in this thesis. Initial experiments showed that the training of the navigation step and the approach of the system was promising. However, additional components need to be included in the system before it becomes a fully-fledged user-driven system.</p> / Report code: LiU-Tek-Lic-2002:73. Information extraction Artificial intelligence Semi-structured data Reinforced learning Knowledge management Computer science Datavetenskap
7	Graph-based learning for information systems Li, Xin January 2009 (has links) The advance of information technologies (IT) makes it possible to collect a massive amount of data in business applications and information systems. The increasing data volumes require more effective knowledge discovery techniques to make the best use of the data. This dissertation focuses on knowledge discovery on graph-structured data, i.e., graph-based learning. Graph-structured data refers to data instances with relational information indicating their interactions in this study. Graph-structured data exist in a variety of application areas related to information systems, such as business intelligence, knowledge management, e-commerce, medical informatics, etc. Developing knowledge discovery techniques on graph-structured data is critical to decision making and the reuse of knowledge in business applications.In this dissertation, I propose a graph-based learning framework and identify four major knowledge discovery tasks using graph-structured data: topology description, node classification, link prediction, and community detection. I present a series of studies to illustrate the knowledge discovery tasks and propose solutions for these example applications. As to the topology description task, in Chapter 2 I examine the global characteristics of relations extracted from documents. Such relations are extracted using different information processing techniques and aggregated to different analytical unit levels. As to the node classification task, Chapter 3 and Chapter 4 study the patent classification problem and the gene function prediction problem, respectively. In Chapter 3, I model knowledge diffusion and evolution with patent citation networks for patent classification. In Chapter 4, I extend the context assumption in previous research and model context graphs in gene interaction networks for gene function prediction. As to the link prediction task, Chapter 5 presents an example application in recommendation systems. I frame the recommendation problem as link prediction on user-item interaction graphs, and propose capturing graph-related features to tackle this problem. Chapter 6 examines the community detection task in the context of online interactions. In this study, I propose to take advantage of the sentiments (agreements and disagreements) expressed in users' interactions to improve community detection effectiveness. All these examples show that the graph representation allows the graph structure and node/link information to be more effectively utilized in addressing the four knowledge discovery tasks.In general, the graph-based learning framework contributes to the domain of information systems by categorizing related knowledge discovery tasks, promoting the further use of the graph representation, and suggesting approaches for knowledge discovery on graph-structured data. In practice, the proposed graph-based learning framework can be used to develop a variety of IT artifacts that address critical problems in business applications. Data mining Graph-based learning Graph-structured data Information systems Knowledge discovery Knowledge managment
8	An artefact to analyse unstructured document data stores / by André Romeo Botes Botes, André Romeo January 2014 (has links) Structured data stores have been the dominating technologies for the past few decades. Although dominating, structured data stores lack the functionality to handle the ‘Big Data’ phenomenon. A new technology has recently emerged which stores unstructured data and can handle the ‘Big Data’ phenomenon. This study describes the development of an artefact to aid in the analysis of NoSQL document data stores in terms of relational database model constructs. Design science research (DSR) is the methodology implemented in the study and it is used to assist in the understanding, design and development of the problem, artefact and solution. This study explores the existing literature on DSR, in addition to structured and unstructured data stores. The literature review formulates the descriptive and prescriptive knowledge used in the development of the artefact. The artefact is developed using a series of six activities derived from two DSR approaches. The problem domain is derived from the existing literature and a real application environment (RAE). The reviewed literature provided a general problem statement. A representative from NFM (the RAE) is interviewed for a situation analysis providing a specific problem statement. An objective is formulated for the development of the artefact and suggestions are made to address the problem domain, assisting the artefact’s objective. The artefact is designed and developed using the descriptive knowledge of structured and unstructured data stores, combined with prescriptive knowledge of algorithms, pseudo code, continuous design and object-oriented design. The artefact evolves through multiple design cycles into a final product that analyses document data stores in terms of relational database model constructs. The artefact is evaluated for acceptability and utility. This provides credibility and rigour to the research in the DSR paradigm. Acceptability is demonstrated through simulation and the utility is evaluated using a real application environment (RAE). A representative from NFM is interviewed for the evaluation of the artefact. Finally, the study is communicated by describing its findings, summarising the artefact and looking into future possibilities for research and application. / MSc (Computer Science), North-West University, Vaal Triangle Campus, 2014 Design science research Structured data stores Unstructured data stores Artefact NoSQL Big data
9	TagLine: Information Extraction for Semi-Structured Text Elements In Medical Progress Notes Finch, Dezon K. 01 January 2012 (has links) Text analysis has become an important research activity in the Department of Veterans Affairs (VA). Statistical text mining and natural language processing have been shown to be very effective for extracting useful information from medical documents. However, neither of these techniques is effective at extracting the information stored in semi-structure text elements. A prototype system (TagLine) was developed as a method for extracting information from the semi-structured portions of text using machine learning. Features for the learning machine were suggested by prior work, as well as by examining the text, and selecting those attributes that help distinguish the various classes of text lines. The classes were derived empirically from the text and guided by an ontology developed by the Consortium for Health Informatics Research (CHIR), a nationwide research initiative focused on medical informatics. Decision trees and Levenshtein approximate string matching techniques were tested and compared on 5,055 unseen lines of text. The performance of the decision tree method was found to be superior to the fuzzy string match method on this task. Decision trees achieved an overall accuracy of 98.5 percent, while the string match method only achieved an accuracy of 87 percent. Overall, the results for line classification were very encouraging. The labels applied to the lines were used to evaluate TagLines' performance for identifying the semi-structures text elements, including tables, slots and fillers. Results for slots and fillers were impressive while the results for tables were also acceptable. Information Extraction Machine Learning Natural Language Processing Semi-structured data Computer Sciences Library and Information Science
10	Query Languages for Semi-structured Data Maksimovic, Gordana January 2003 (has links) Semi-structured data is defined as irregular data with structure that may change rapidly or unpredictably. An example of such data can be found inside the World-Wide Web. Since the data is irregular, the user may not know the complete structure of the database. Thus, querying such data becomes a difficult issue. In order to write meaningful queries on semi-structured data, there is a need for a query language that will support the features that are presented by this data. Standard query languages, such as SQL for relational databases and OQL for object databases, are too constraining for querying semi-structured data, because they require data to conform to a fixed schema before any data is stored into the database. This paper introduces Lorel, a query language developed particularly for querying semi-structured data. Furthermore, it investigates if the standardised query languages support any of the criteria presented for semi-structured data. The result is an evaluation of three query languages, SQL, OQL and Lorel against these criteria. Semi-structured data unstructured data data on the Web database management Lorel Computer Sciences Datavetenskap (datalogi)

Search results