Global ETD Search

1	Mining Biomedical Data for Hidden Relationship Discovery Dharmavaram, Sirisha 08 1900
With an ever-growing number of publications in the biomedical domain, it becomes likely that important implicit connections between individual concepts of biomedical knowledge are overlooked. Literature-based discovery (LBD) has been in practice for many years to identify plausible associations between previously unrelated concepts. In this paper, we present a new, completely automatic and interactive system that creates a graph-based knowledge base to capture multifaceted complex associations among biomedical concepts. For a given pair of input concepts, our system auto-generates a list of ranked subgraphs uncovering possible previously unnoticed associations based on context information. To rank these subgraphs, we implement a novel ranking method using the context information obtained by performing random walks on the graph. In addition, we enhance the system by training a neural network classifier to output the likelihood that the two concepts are related, which provides better insights to the end user.
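As a rough illustration of the random-walk idea mentioned in the abstract above (not the author's implementation), the sketch below collects a visit-count "context" for a concept by running short random walks over a toy graph, then compares two concepts by the cosine similarity of their contexts. The graph, the function names `random_walk_context` and `cosine`, and all parameter values are hypothetical.

```python
import random
from collections import Counter
from math import sqrt

# Hypothetical toy concept graph: each concept maps to its neighbors.
GRAPH = {
    "concept_a": ["link_1", "link_2"],
    "link_1": ["concept_a", "concept_c"],
    "link_2": ["concept_a", "concept_c"],
    "concept_c": ["link_1", "link_2"],
}


def random_walk_context(graph, start, walks=200, length=4, seed=0):
    """Count how often each node is visited by short random walks from `start`."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    visits = Counter()
    for _ in range(walks):
        node = start
        for _ in range(length):
            neighbors = graph[node]
            if not neighbors:  # dead end: stop this walk early
                break
            node = rng.choice(neighbors)
            visits[node] += 1
    return visits


def cosine(a, b):
    """Cosine similarity between two visit-count contexts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


context_a = random_walk_context(GRAPH, "concept_a")
context_c = random_walk_context(GRAPH, "concept_c")
print(cosine(context_a, context_c))
```

A score of this kind could be one input to a ranking of candidate subgraphs. The abstract does not say how its system combines context information, so the cosine comparison here is only an assumption.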
2	Semantic text classification for cancer text mining Baker, Simon January 2018
Cancer researchers and oncologists benefit greatly from text mining major knowledge sources in biomedicine such as PubMed. Fundamentally, text mining depends on accurate text classification. In conventional natural language processing (NLP), this requires experts to annotate scientific text, which is costly and time-consuming, resulting in small labelled datasets. This leads to extensive feature engineering and handcrafting in order to fully utilise small labelled datasets, which is again time-consuming and not portable between tasks and domains. In this work, we explore emerging neural network methods to reduce the burden of feature engineering while outperforming the accuracy of conventional pipeline NLP techniques. We focus specifically on the cancer domain in terms of applications, where we introduce two NLP classification tasks and datasets: the first task is that of semantic text classification according to the Hallmarks of Cancer (HoC), which enables text mining of scientific literature assisted by a taxonomy that explains the processes by which cancer starts and spreads in the body. The second task is that of classifying the exposure routes of chemicals into the body that may lead to exposure to carcinogens. We present several novel contributions. We introduce two new semantic classification tasks (the hallmarks, and exposure routes) at both sentence and document levels along with accompanying datasets, and implement and investigate a conventional pipeline NLP classification approach for both tasks, performing both intrinsic and extrinsic evaluation. We propose a new approach to classification using multilevel embeddings and apply this approach to several tasks; we subsequently apply deep learning methods to the task of hallmark classification and evaluate the outcome.
Utilising our text classification methods, we develop two novel text mining tools targeting real-world cancer researchers. The first tool is a cancer hallmark text mining tool that identifies associations between a search query and cancer hallmarks; the second tool is a new literature-based discovery (LBD) system designed for the cancer domain. We evaluate both tools with end users (cancer researchers) and find they demonstrate good accuracy and promising potential for cancer research.
3	Stepping Stones and Pathways: Improving Retrieval by Chains of Relationships between Documents Das Neves, Fernando Adrian 08 December 2004
The information retrieval (IR) field has been successful in developing techniques to address many types of information needs. However, there are cases in which traditional approaches to IR are not able to produce adequate results. Examples include: when a small set of (2-3) documents is needed as an answer rather than a single document, or when "query splitting" is required to satisfactorily explore the document space. We explore an alternative model of building and presenting retrieval results for such cases. In particular, we research effective methods for handling information needs that may: 1. Include multiple topics: A typical query is interpreted by current IR systems as a request to retrieve documents that each discuss all topics included in that query. We propose an alternative interpretation based on query splitting. It allows queries to be interpreted as requests to retrieve sets of documents rather than individual documents, with meaningful relationships among the members of each such set. 2. Be interpreted as parts in a chain of relationships: Suppose a query concerns topics t1 and tm. Is there a relation between topics t1 and tm that involves t2 and possibly other topics as in {t1, t2, …, tm}? Thus, we propose an alternative interpretation of user queries and presentation of the results. Our interpretation has the potential to improve retrieval results whenever there is a mismatch between the user's understanding of the collection and the actual collection content. We define and refine a retrieval scheme that enhances retrieval through a framework that combines multiple sources of evidence. Query results in our interpretation are networks of document groups representing topics, each group relating to and connecting to other groups in the network that partially answer the user's information need.
We devise new and more effective representations and techniques to visualize results, and incorporate the user as part of the retrieval process. We also evaluate the improvement of the query results based on multiple measures. In particular, we verify the validity of our approach through a study involving a collection of Operating Systems research papers that was specially built for this dissertation. / Ph. D. Information retrieval Literature-based discovery Combination of sources of evidence Indexing of scientific literature
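The "chain of relationships" reading of a query in the abstract above can be pictured as a path search over a graph of related topics. The following is a minimal sketch under that assumption, using breadth-first search to find a shortest chain from t1 to tm. The `LINKS` data and the `find_chain` helper are hypothetical and are not taken from the dissertation.

```python
from collections import deque

# Hypothetical topic graph: topic -> topics it is directly related to.
LINKS = {
    "t1": ["t2", "t3"],
    "t2": ["t4"],
    "t3": ["t4"],
    "t4": [],
}


def find_chain(links, source, target):
    """Return a shortest chain of topics from source to target, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in links.get(path[-1], ()):
            if nxt not in seen:  # visit each topic at most once
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain connects the two topics


print(find_chain(LINKS, "t1", "t4"))
```

In a retrieval setting, each intermediate topic in the returned chain would presumably correspond to a group of documents. How those groups are built and scored is outside the scope of this sketch.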
4	Indirect Relatedness, Evaluation, and Visualization for Literature Based Discovery Henry, Sam 01 January 2019
The exponential growth of scientific literature is creating an increased need for systems to process and assimilate knowledge contained within text. Literature Based Discovery (LBD) is a well-established field that seeks to synthesize new knowledge from existing literature, but it has remained primarily in the theoretical realm rather than in real-world application. This lack of real-world adoption is due in part to the difficulty of LBD, but also due to several solvable problems present in LBD today. Of these problems, the ones in most critical need of improvement are: (1) the over-generation of knowledge by LBD systems, (2) a lack of meaningful evaluation standards, and (3) the difficulty of interpreting LBD output. We address each of these problems by: (1) developing indirect relatedness measures for ranking and filtering LBD hypotheses; (2) developing a representative evaluation dataset and applying meaningful evaluation methods to individual components of LBD; (3) developing an interactive visualization system that allows a user to explore LBD output in its entirety. In addressing these problems, we make several contributions, most importantly: (1) state-of-the-art results for estimating direct semantic relatedness, (2) development of set association measures, (3) development of indirect association measures, (4) development of a standard LBD evaluation dataset, (5) division of LBD into discrete components with well-defined evaluation methods, (6) development of automatic functional group discovery, and (7) integration of indirect relatedness measures and automatic functional group discovery into a comprehensive LBD visualization system. Our results inform future development of LBD systems, and contribute to creating more effective LBD systems.
Literature Based Discovery Semantic Association Semantic Relatedness Natural Language Processing Data Mining Text Processing Text Mining Other Computer Sciences
5	New Computational Methods for Literature-Based Discovery Ding, Juncheng 05 1900
In this work, we leverage recent developments in computer science to address several of the challenges in current literature-based discovery (LBD) solutions. First, current LBD solutions either cannot use semantics or are too computationally complex. To solve these problems, we propose OverlapLDA, a generative model based on topic modeling, which has been shown to be both effective and efficient in extracting semantics from a corpus. We also introduce an inference method for OverlapLDA. We conduct extensive experiments to show the effectiveness and efficiency of OverlapLDA in LBD. Second, we expand LBD to a more complex and realistic setting, in which more than one concept can connect the input concepts and the connectivity pattern between concepts can be more complex than a chain. Current LBD solutions can hardly complete the LBD task in this new setting. We simplify the hypotheses as concept sets and propose LBDSetNet, based on graph neural networks, to solve this problem. We also introduce different training schemes based on self-supervised learning to train LBDSetNet without relying on comprehensive labeled hypotheses, which are extremely costly to obtain. Our comprehensive experiments show that LBDSetNet outperforms strong baselines on simple hypotheses and addresses complex hypotheses.
Literature-Based Discovery Text Mining Data Mining Topic Modeling Graph Neural Network Self-Supervised Learning Computer Science
6	Citationally Enhanced Semantic Literature Based Discovery Fleig, John David 01 January 2019
We are living in the age of information. The ever-increasing flow of data and publications poses a monumental bottleneck to scientific progress because, despite its amazing abilities, the human mind is woefully inadequate at processing such a vast quantity of multidimensional information. The small bits of flotsam and jetsam that we leverage belie the amount of useful information beneath the surface. It is imperative that automated tools exist to better search, retrieve, and summarize this content. Combinations of document indexing and search engines can quickly find a document whose content best matches your query - if the information is all contained within a single document. But they do not draw connections, make hypotheses, or find knowledge hidden across multiple documents. Literature-based discovery is an approach that can uncover hidden interrelationships between topics by extracting information from existing published scientific literature. The proposed study utilizes a semantic-based approach that builds a graph of related concepts between two user-specified sets of topics using semantic predications. In addition, the study includes properties of bibliographically related documents and statistical properties of concepts to further enhance the quality of the proposed intermediate terms. Our results show an improvement in precision-recall when incorporating citations.
data mining graph literature-based discovery (LBD) predication semantic Bioinformatics Computer Sciences Library and Information Science Life Sciences Social and Behavioral Sciences
7	A Context-Driven Subgraph Model for Literature-Based Discovery Cameron, Delroy Huborn 18 December 2014
No description available.
Computer Science Biomedical Research Information Systems Semantic Predications Graph mining Path clustering Semantic relatedness Literature-based discovery
8	Finding conflicting statements in the biomedical literature Sarafraz, Farzaneh January 2012
The main archive of life sciences literature currently contains more than 18,000,000 references, and it is virtually impossible for any human to stay up-to-date with this large number of papers, even in a specific sub-domain. Not every fact that is reported in the literature is novel and distinct. Scientists report repeat experiments, or refer to previous findings. Given the large number of publications, it is not surprising that information on certain topics is repeated over a number of publications. From consensus to contradiction, there are all shades of agreement between the claimed facts in the literature, and considering the volume of the corpus, conflicting findings are not unlikely. Finding such claims is particularly interesting for scientists, as they can present opportunities for knowledge consolidation and future investigations. In this thesis we present a method to extract and contextualise statements about molecular events as expressed in the biomedical literature, and to find those that potentially conflict with each other. The approach uses a system that detects event negations and speculation, and combines those with contextual features (e.g. type of event, species, and anatomical location) to build a representational model for establishing relations between different biological events, including relations concerning conflicts. In the detection of negations and speculations, rich lexical, syntactic, and semantic features have been exploited, including the syntactic command relation. Different parts of the proposed method have been evaluated in the context of the BioNLP'09 challenge. The average F-measures for event negation and speculation detection were 63% (with precision of 88%) and 48% (with precision of 64%) respectively.
An analysis of a set of 50 extracted event pairs identified as potentially conflicting revealed that 32 of them showed some degree of conflict (64%); 10 event pairs (20%) needed a more complex biological interpretation to decide whether there was a conflict. We also provide an open source integrated text mining framework for extracting events and their context on a large-scale basis using a pipeline of tools that are available or have been developed as part of this research, along with 72,314 potentially conflicting molecular event pairs that have been generated by mining the entire body of accessible biomedical literature. We conclude that, whilst automated conflict mining would need more comprehensive context extraction, it is feasible to provide a support environment for biologists to browse potential conflicting statements and facilitate data and knowledge consolidation. 006.312
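The abstract above reports F-measure and precision but not recall. As a back-of-the-envelope check, and assuming the F-measure is the balanced F1 score (the harmonic mean of precision and recall), recall can be recovered by rearranging F1 = 2PR / (P + R) into R = F1 * P / (2P - F1). The helper name `recall_from_f1` is hypothetical. Because the reported figures are averages, the derived recall values are approximate and are not results stated in the thesis.

```python
def recall_from_f1(f1, precision):
    """Solve F1 = 2PR / (P + R) for recall R, given F1 and precision P."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("F1 and precision are inconsistent (need F1 < 2 * precision)")
    return f1 * precision / denominator


# Figures quoted in the abstract: (F-measure, precision).
negation_recall = recall_from_f1(0.63, 0.88)     # roughly 0.49
speculation_recall = recall_from_f1(0.48, 0.64)  # roughly 0.38

print(f"negation recall ~ {negation_recall:.2f}")
print(f"speculation recall ~ {speculation_recall:.2f}")
```

Under these assumptions, both detectors appear to favour precision over recall, which matches the reported gap between precision and F-measure.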
