721 |
Augmenting Dynamic Query Expansion in Microblog Texts
Khandpur, Rupinder P. 17 August 2018
Dynamic query expansion is a method of automatically identifying terms relevant to a target domain based on an incomplete query input. With the explosive growth of online media, such tools are essential for efficiently refining search results to track emerging themes in noisy, unstructured text streams. They are crucial for large-scale predictive analytics and decision-making systems, which use open source indicators to find meaningful information rapidly and accurately. The problems of information overload and semantic mismatch are systemic in the Information Retrieval (IR) tasks undertaken by such systems.
In this dissertation, we develop approaches to dynamic query expansion that can help improve the efficacy of such systems using only a small set of seed queries, and that require no training or labeled samples. We primarily investigate four significant problems related to the retrieval and assessment of event-related information, viz. (1) How can we adapt the query expansion process to support rank-based analysis when tracking a fixed set of entities? A scalable framework is essential to allow relative assessment of emerging themes such as airport threats. (2) What visual knowledge discovery framework can we adopt to incorporate users' feedback into the search result refinement process? This is a crucial step toward efficiently integrating real-time ‘situational awareness’ when monitoring specific themes using open source indicators. (3) How can we contextualize query expansions? We focus on capturing semantic relatedness between a query and reference text so that the expansion can quickly adapt to different target domains. (4) How can we synchronously perform knowledge discovery and characterization (unstructured to structured) during the retrieval process? We mainly aim to model high-order, relational aspects of event-related information from microblog texts. / Ph. D. / Real-time analysis of social media can provide critical insights into ongoing societal events. The consequences and implications of specific events include monetary losses, threats to critical infrastructure and national security, disruptions to daily life, and a potential to cause loss of life and physical property. Developing adequate data-driven information systems requires good ‘ground truth’, i.e., an authoritative record of events reported in the media cataloged alongside important dimensions.
Availability of high-quality ground truth events can support various analytic efforts, e.g., identifying precursors of attacks, developing predictive indicators using surrogate data sources, and tracking the progression of events over space and time. Dynamic search result refinement is useful for expanding a general set of user queries into a more relevant collection. The challenges of information overload and misalignment of context between the user query and retrieved results can overwhelm both humans and machines. In this dissertation, we focus our efforts on these specific challenges.
With the ever-increasing volume of user-generated data, large-scale analysis is a tedious task. Our first focus is to develop a scalable model that dynamically tracks and ranks evolving topics as they appear in social media. Then, to simplify the cognitive tasks involved in making sense of evolving themes, we take a visual approach to effectively retrieve situationally critical and emergent information. This visual analytics approach learns from users’ interactions during the exploratory process and then generates a better representation of the data, thus improving the situational understanding and usability of the underlying data models. Such features are crucial for big-data-based decision and support systems.
To make the event-focused retrieval process more robust, we developed a context-rich procedure that adds new relevant key terms to the user’s original query by utilizing the linguistic structures in text. This context-awareness allows the algorithm to retrieve those relevant characteristics that can help users gain adequate information from social media about real-world events. Online social commentary about events is very informal and can be incomplete. To get the complete picture and adequately describe these events, we develop an approach that models the underlying relatedness of information and iteratively extracts meaning and denotations from event-related texts. We learn how to express the high-order relationships between events and entities and group them to identify those attributes that best explain the events the user is trying to uncover.
In all the augmentations we develop, our strategy is to allow only minimal human supervision, using just a small set of seed event triggers and requiring no training or labeled samples. We show a comprehensive evaluation of these augmentations on real-world domains: threats on airports, cyber attacks, and protests. We also demonstrate their applicability to real-time analysis that provides vital event characteristics and contextually consistent information, which can be a beneficial aid for emergency responders.
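One round of the seed-driven expansion described in this abstract can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the dissertation's algorithm: the function name `expand_query`, the whitespace tokenization, the document-level co-occurrence scoring, and the sample posts are all hypothetical.

```python
from collections import Counter


def expand_query(seed_terms, documents, top_k=3):
    """Return the top_k non-seed terms that co-occur most often with the seeds.

    Hypothetical helper: a document "matches" if it shares at least one
    token with the seed set, and every other token in it gets one vote.
    """
    seeds = {term.lower() for term in seed_terms}
    counts = Counter()
    for doc in documents:
        tokens = set(doc.lower().split())
        if tokens & seeds:  # document mentions at least one seed term
            counts.update(tokens - seeds)
    # Sort by count (descending), then alphabetically for a deterministic order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_k]]


# Made-up microblog posts, for illustration only.
posts = [
    "protest march downtown today",
    "march blocked downtown traffic",
    "new cafe opened downtown",
    "protest organizers plan march",
]
expanded = expand_query(["protest"], posts, top_k=2)
print(expanded)
```

A dynamic variant would presumably add the returned terms to the seed set and repeat until the set stops changing; no training data or labels are involved, which matches the unsupervised setting the abstract describes.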
|
722 |
Product Defect Discovery and Summarization from Online User Reviews
Zhang, Xuan 29 October 2018
Product defects concern various groups of people, such as customers, manufacturers, government officials, etc. Thus, defect-related knowledge and information are essential. In keeping with the growth of social media, online forums, and Internet commerce, people post a vast amount of feedback on products, which forms a good source for the automatic acquisition of knowledge about defects. However, considering the vast volume of online reviews, how to automatically identify critical product defects and summarize the related information from the huge number of user reviews is challenging, even when we target only the negative reviews. As a kind of opinion mining research, existing defect discovery methods mainly focus on how to classify the type of product issues, which is not enough for users. People expect to see defect information in multiple facets, such as product model, component, and symptom, which are necessary to understand the defects and quantify their influence. In addition, people are eager to seek problem resolutions once they spot defects. These challenges cannot be solved by existing aspect-oriented opinion mining models, which seldom consider the defect entities mentioned above. Furthermore, users also want to better capture the semantics of review text, and to summarize product defects more accurately in the form of natural language sentences. However, existing text summarization models including neural networks can hardly generalize to user review summarization due to the lack of labeled data.
In this research, we explore topic models and neural network models for product defect discovery and summarization from user reviews. Firstly, a generative Probabilistic Defect Model (PDM) is proposed, which models the generation process of user reviews from key defect entities including product Model, Component, Symptom, and Incident Date. Using the joint topics in these aspects, which are produced by PDM, people can discover defects which are represented by those entities. Secondly, we devise a Product Defect Latent Dirichlet Allocation (PDLDA) model, which describes how negative reviews are generated from defect elements like Component, Symptom, and Resolution. The interdependency between these entities is modeled by PDLDA as well. PDLDA answers not only what the defects look like, but also how to address them using the crowd wisdom hidden in user reviews. Finally, the problem of how to summarize user reviews more accurately, and better capture the semantics in them, is studied using deep neural networks, especially Hierarchical Encoder-Decoder Models.
For each of the research topics, comprehensive evaluations are conducted to justify the effectiveness and accuracy of the proposed models, on heterogeneous datasets. Further, on the theoretical side, this research contributes to the research stream on product defect discovery, opinion mining, probabilistic graphical models, and deep neural network models. Regarding impact, these techniques will benefit related users such as customers, manufacturers, and government officials. / Ph. D. / Product defects concern various groups of people, such as customers, manufacturers, and government officials. Thus, defect-related knowledge and information are essential. In keeping with the growth of social media, online forums, and Internet commerce, people post a vast amount of feedback on products, which forms a good source for the automatic acquisition of knowledge about defects. However, considering the vast volume of online reviews, how to automatically identify critical product defects and summarize the related information from the huge number of user reviews is challenging, even when we target only the negative reviews. People expect to see defect information in multiple facets, such as product model, component, and symptom, which are necessary to understand the defects and quantify their influence. In addition, people are eager to seek problem resolutions once they spot defects. Furthermore, users also want product defects summarized more accurately in the form of natural language sentences. These requirements cannot be satisfied by existing methods, which seldom consider the defect entities mentioned above, or hardly generalize to user review summarization. In this research, we develop novel Machine Learning (ML) algorithms for product defect discovery and summarization. Firstly, we study how to identify product defects and their related attributes, such as Product Model, Component, Symptom, and Incident Date.
Secondly, we devise a novel algorithm, which can discover product defects and the related Component, Symptom, and Resolution, from online user reviews. This method tells not only what the defects look like, but also how to address them using the crowd wisdom hidden in user reviews. Finally, we address the problem of how to summarize user reviews in the form of natural language sentences using a paraphrase-style method. On the theoretical side, this research contributes to multiple research areas in Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning. Regarding impact, these techniques will benefit related users such as customers, manufacturers, and government officials.
|
723 |
Probing Orthologue and Isoform Specific Inhibition of Kinases using In Silico Strategies: Perspectives for Improved Drug Design
Sharp, Amanda Kristine 18 May 2020
Kinases are involved in a multitude of signaling pathways, such as cellular growth, proliferation, and apoptosis, and have been discovered to be important in numerous diseases including cancer, Alzheimer's disease, cardiovascular health, rheumatoid arthritis, and fibrosis. Due to the involvement in a wide variety of disease types, kinases have been studied for exploitation and use as targets for therapeutics. There are many limitations with developing kinase target therapeutics due to the high similarity of kinase active site composition, making the utilization of new techniques to determine kinase exploitability for therapeutic design with high specificity essential for the advancement of novel drug strategies. In silico approaches have become increasingly prevalent for providing useful insight into protein structure-function relationships, offering new information to researchers about drug discovery strategies. This work utilizes streamlined computational techniques on an atomistic level to aid in the identification of orthologue and isoform exploitability, identifying new features to be utilized for future inhibitor design. By exploring two separate kinases and kinase targeting domains, we found that orthologues and isoforms contain distinct features, likely responsible for their biological roles, which can be utilized and exploited for selective drug development. In this work, we identified new exploitable features between kinase orthologues for treatment in Human African Trypanosomiasis and structural morphology differences between two kinase isoforms that can potentially be exploited for cancer therapeutic design. / Master of Science in Life Sciences / Numerous diseases such as cancer, Alzheimer's disease, cardiovascular disease, rheumatoid arthritis, and fibrosis have been attributed to different cell growth and survival pathways. Many of these pathways are controlled by a class of enzymes called kinases. 
Kinases are involved in almost every metabolic pathway in human cells and can act as molecular switches to turn on and off disease progression. Due to the involvement of these kinases in a wide variety of disease types, kinases have been continually studied for the development of new drugs. Developing effective drugs for kinases requires an extensive understanding of the structural characteristics due to the high structural similarity across all kinases. In silico, or computational, techniques are useful strategies for drug development practices, offering new insight into protein structure-function relationships, which in turn can be utilized in drug discovery advancements. Utilizing computational methods to explore structural features can help identify specific protein structural features, thus providing new strategies for protein specific inhibitor design. In this work, we identified new exploitable features between kinase orthologues for treatment in Human African Trypanosomiasis and structural morphology differences between two kinase isoforms that can potentially be exploited for cancer therapeutic design.
|
724 |
Defining Novel Clusters of PPAR gamma Partial Agonists for Virtual Screening
Collins, Erin Taylor 03 June 2022
Peroxisome proliferator-activated receptor γ (PPARγ) is associated with a wide range of diseases, including type 2 diabetes mellitus (T2D). Thiazolidinediones (TZDs) are agonists of PPARγ which have an insulin sensitizing effect, and are therefore used as a treatment for T2D. However, TZDs cause negative side effects in patients, such as weight gain, edema, and increased risk of bone fracture. Partial agonists could be an alternative to TZD-based drugs with fewer side effects. However, there is a lack of understanding of the types of PPARγ partial agonists and how they differ from full agonists. In silico techniques, like virtual screening, molecular docking, and pharmacophore modeling, allow us to determine and characterize markers of varying levels of agonism. An extensive search of the RCSB Protein Data Bank found 62 structures of PPARγ resolved with partial agonists. Cross-docking was performed and found that two PDB structures, 3TY0 and 5TWO, would be effective as receptor structures for virtual screening. By clustering known partial agonists by common pharmacophore features, we found several distinct groups of partial agonists. Interaction and pharmacophore models were created for each group of partial agonists. Virtual screening of FDA-approved compounds showed that the models were able to predict potential partial agonists of PPARγ. This study provides additional insight into the different binding modes of partial agonists of PPARγ and their characteristics. These models can be used to assist drug discovery efforts for intelligently designing novel therapeutics for T2D which have fewer negative side effects. / Master of Science in Life Sciences / The peroxisome proliferator-activated receptor γ (PPARγ) protein is associated with a wide range of diseases, including type 2 diabetes mellitus (T2D). Thiazolidinediones (TZDs) are compounds that activate PPARγ, and increase insulin sensitivity in patients with T2D. 
However, TZDs cause negative side effects in patients, such as weight gain, increased fluid retention, and increased risk of bone fracture. Partial agonists could be an alternative to TZD-based drugs with fewer side effects. However, there is a lack of understanding of the types of PPARγ partial agonists and how they differ from full agonists. Computational techniques allow us to investigate common features between known partial agonists. An extensive search of the RCSB Protein Data Bank found 62 structures of PPARγ which contained partial agonists. Each known partial agonist was docked into twelve complete PPARγ structures, and it was found that two structure models would be effective as receptor structures for virtual screening. A set of known partial agonists were grouped based on common chemical features, and three distinct groups of partial agonists were found. Binding criteria for each of these three groups were developed. A library of FDA-approved compounds was screened using the criteria for binding to identify potential novel partial agonists. Three potential novel partial agonists were found in the screening. This study provides additional insight into how different compounds activate PPARγ. These methods can be used to assist drug discovery efforts for intelligently designing novel therapeutics for T2D which have fewer negative side effects.
|
725 |
Exploring Protein Folding Intermediates Across Physiology and Therapy
Bonaldo, Valerio 08 July 2024
In recent years, advancements in computational methodologies have shed light on the complex process that makes proteins fold into their three-dimensional shapes. These new tools have helped us understand the steps proteins take to achieve these structures, revealing the presence of metastable intermediates along the folding pathways. This newfound understanding has led to the development of a novel drug discovery strategy known as Pharmacological Protein Inactivation by Folding Intermediate Targeting (PPI-FIT). This approach specifically targets folding intermediates to modulate protein expression levels, thus opening new opportunities for pharmacological intervention. This approach could be particularly relevant for diseases linked to targets that were previously considered "undruggable." A promising outcome of the PPI-FIT strategy is the identification of SM875, a compound that has been shown to lower prion protein (PrP) levels, positioning it as a potential therapeutic candidate for prion diseases. This study describes the initial phase of optimization of the SM875 scaffold. It encompasses the chemical diversification of SM875, followed by systematic evaluations of its biological activity and toxicity, with the aim of establishing structure-activity relationships (SAR). This knowledge is instrumental in guiding the synthesis of analogs with enhanced properties, advancing them through the development pipeline toward clinical application. Furthermore, this work investigates the potential regulatory function of folding intermediates in physiological processes, hypothesizing that they may serve as substrates for post translational modifications (PTMs). This hypothesis proposes an expansion of the current paradigm, suggesting that folding intermediates could constitute an additional layer of regulation within the complex network of proteostasis.
|
726 |
Contrastive Filtering And Dual-Objective Supervised Learning For Novel Class Discovery In Document-Level Relation Extraction
Hansen, Nicholas 01 June 2024
Relation extraction (RE) is a task within natural language processing focused on the classification of relationships between entities in a given text. Primary applications of RE can be seen in various contexts such as knowledge graph construction and question answering systems. Traditional approaches to RE tend towards the prediction of relationships between exactly two entity mentions in small text snippets. However, with the introduction of datasets such as DocRED, research in this niche has progressed into examining RE at the document-level. Document-level relation extraction (DocRE) disrupts conventional approaches as it inherently introduces the possibility of multiple mentions of each unique entity throughout the document along with a significantly higher probability of multiple relationships between entity pairs.
There have been many effective approaches to document-level RE in recent years utilizing various architectures, such as transformers and graph neural networks. However, all of these approaches focus on the classification of a fixed number of known relationships. As a result of the large quantity of possible unique relationships in a given corpus, it is unlikely that all interesting and valuable relationship types are labeled beforehand. Furthermore, traditional naive approaches to clustering on unlabeled data to discover novel classes are not effective as a result of the unique problem of large true negative presence. Therefore, in this work we propose a multi-step filter-and-train approach leveraging the notion of contrastive representation learning to discover novel relationships at the document level. Additionally, we propose the use of an alternative pretrained encoder in an existing DocRE solution architecture to improve F1 performance in base multi-label classification on the DocRED dataset by 0.46.
To the best of our knowledge, this is the first exploration of novel class discovery applied to the document-level RE task. Based upon our holdout evaluation method, we increase novel class instance representation in the clustering solution by 5.5 times compared to the naive approach and increase the purity of novel class clusters by nearly 4 times. We then further enable the retrieval of both novel and known classes at test time, provided human labeling of cluster propositions, achieving a macro F1 score of 0.292 for novel classes. Finally, we note only a slight macro F1 decrease on previously known classes, from 0.402 with fully supervised training to 0.391 with our novel class discovery training approach.
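The two metrics this abstract reports, macro F1 and cluster purity, can be sketched as follows. This is an illustrative single-label version using the standard textbook definitions; the thesis works in a multi-label setting and may define purity of novel class clusters differently. The function names and toy labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); treated as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def purity(cluster_ids, class_labels):
    """Fraction of items that belong to the majority class of their cluster."""
    by_cluster = {}
    for cluster, label in zip(cluster_ids, class_labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(class_labels)


# Toy data, for illustration only.
f1 = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
pur = purity([0, 0, 0, 1, 1], ["x", "x", "y", "y", "y"])
print(round(f1, 4), pur)
```

Because macro F1 averages classes with equal weight, rare relation types count as much as frequent ones, which is presumably why it is reported separately for novel and known classes.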
|
727 |
Topic Model-based Mass Spectrometric Data Analysis in Cancer Biomarker Discovery Studies
Wang, Minkun 14 June 2017
Identification of disease-related alterations in molecular and cellular mechanisms may reveal useful biomarkers for human diseases including cancers. High-throughput omic technologies for identifying and quantifying multi-level biological molecules (e.g., proteins, glycans, and metabolites) have facilitated the advances in biological research in recent years. Liquid (or gas) chromatography coupled with mass spectrometry (LC/GC-MS) has become an essential tool in such large-scale omic studies. Appropriate LC/GC-MS data preprocessing pipelines are needed to detect true differences between biological groups. Challenges exist in several aspects of MS data analysis. Specifically for biomarker discovery, one fundamental challenge in quantitation of biomolecules is owing to the heterogeneous nature of human biospecimens. Although this issue has been a subject of discussion in cancer genomic studies, it has not yet been rigorously investigated in mass spectrometry-based omic studies. Purification of mass spectrometric data is highly desired prior to subsequent differential analysis.
In this research dissertation, we mainly target the purification problem through probabilistic modeling. We propose an intensity-level purification model (IPM) to computationally purify LC/GC-MS based cancerous data in biomarker discovery studies. We further extend IPM to a scan-level purification model (SPM) by considering information from the extracted ion chromatogram (EIC, a scan-level feature). Both IPM and SPM are topic modeling approaches, which aim to identify the underlying "topics" (sources) and their mixture proportions in composing the heterogeneous data. Additionally, a denoise deconvolution model (DDM) is proposed to capture the noise signals in samples based on purified profiles. Variational expectation-maximization (VEM) and Markov chain Monte Carlo (MCMC) methods are used to draw inference on the latent variables and estimate the model parameters. Before we come to purification, other research topics related to mass spectrometric data analysis for cancer biomarker discovery are also investigated in this dissertation.
Chapter 3 discusses the methods developed for the differential analysis of LC/GC-MS based omic data, specifically for the preprocessing of data from LC-MS profiled glycans. Chapter 4 presents the assumptions and inference details of IPM, SPM, and DDM. A latent Dirichlet allocation (LDA) core is used to model the heterogeneous cancerous data as mixtures of topics consisting of a sample-specific pure cancerous source and non-cancerous contaminants. We evaluated the capability of the proposed models in capturing mixture proportions of contaminants and cancer profiles on LC-MS based serum and tissue proteomic and GC-MS based tissue metabolomic datasets acquired from patients with hepatocellular carcinoma (HCC) and liver cirrhosis. Chapter 5 elaborates on these applications in cancer biomarker discovery, where typical single omic and integrative analysis of multi-omic studies are included. / Ph. D. / This dissertation documents the methodology and outputs for computational deconvolution of heterogeneous omics data generated from biospecimens of interest. These omics data convey qualitative and quantitative information about biomolecules (e.g., glycans, proteins, metabolites, etc.) that are profiled by liquid (or gas) chromatography coupled with mass spectrometry (LC/GC-MS). In biomarker discovery, we aim to find significant differences in the intensities of biomolecules between two specific phenotype groups so that the biomarkers can be used as clinical indicators for early-stage diagnosis. However, the purity of collected samples constitutes the fundamental challenge to the process of differential analysis. Instead of using experimental methods that are costly and time-consuming, we treat the purification task as a topic modeling procedure, where we assume each observed biomolecular profile is a mixture of a hidden pure source together with unwanted contaminants.
The developed models output the estimated mixture proportion as well as the underlying “topics”. With purification applied at different levels, improved discrimination power of candidate biomarkers and more biologically meaningful pathways were discovered in LC/GC-MS based multi-omic studies for liver cancer. This research work originates from a broader scope of probabilistic generative modeling, where rational assumptions are made to characterize the generation process of the observations. Therefore, the models developed in this dissertation have great potential in applications other than the heterogeneous data purification discussed here. A good example is uncovering the relationship of the human gut microbiome with the host’s phenotypes of interest (e.g., diseases like type-II diabetes). Similar challenges exist in how to infer the underlying intestinal flora distribution and estimate their mixture proportions.
This dissertation also covers topics of related data preprocessing and integration, but with the consistent goal of improving the performance of biomarker discovery. In summary, the research helps address the sample heterogeneity issue observed in LC/GC-MS based cancer biomarker discovery studies and sheds light on computational deconvolution of the mixtures, which can be generalized to other domains of interest.
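The mixture assumption in this abstract (each observed profile is a pure source plus contaminants, combined in some proportion) can be illustrated with a much simpler sketch than the LDA-based models it describes. Here both source profiles are assumed to be known and noise-free, and the proportion is recovered by closed-form least squares; the dissertation instead infers the sources and proportions jointly with VEM and MCMC. The function name and the intensity values are hypothetical.

```python
def estimate_mixing_proportion(observed, pure, contaminant):
    """Least-squares estimate of p in: observed = p * pure + (1 - p) * contaminant.

    Rearranging gives (observed - contaminant) = p * (pure - contaminant),
    so p is the projection coefficient of one difference vector on the other.
    """
    diff_source = [s - c for s, c in zip(pure, contaminant)]
    diff_observed = [x - c for x, c in zip(observed, contaminant)]
    denom = sum(d * d for d in diff_source)
    if denom == 0:
        raise ValueError("pure and contaminant profiles are identical")
    p = sum(a * b for a, b in zip(diff_observed, diff_source)) / denom
    # A proportion must lie in [0, 1]; clip in case noise pushes it outside.
    return min(1.0, max(0.0, p))


# Made-up intensity profiles over three features, for illustration only.
pure = [10.0, 2.0, 6.0]
contaminant = [2.0, 8.0, 4.0]
observed = [0.75 * s + 0.25 * c for s, c in zip(pure, contaminant)]
estimate = estimate_mixing_proportion(observed, pure, contaminant)
print(estimate)
```

In practice neither source profile is known in advance, which is what motivates a latent-variable treatment rather than a direct projection like this one.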
|
728 |
The changing landscape of cancer drug discovery: a challenge to the medicinal chemist of tomorrow
Pors, Klaus, Goldberg, F.W., Leamon, C.P., Rigby, A.C., Snyder, S.A., Falconer, Robert A. 11 1900
No / Since the development of the first cytotoxic agents, synthetic organic chemistry has advanced enormously. The synthetic and medicinal chemists of today are at the centre of drug development and are involved in most, if not all, processes of drug discovery. Recent decreases in government funding and reformed educational policies could, however, seriously impact on drug discovery initiatives worldwide. Not only could these changes result in fewer scientific breakthroughs, but they could also negatively affect the training of our next generation of medicinal chemists.
|
729 |
Aldehyde dehydrogenases in cancer: an opportunity for biomarker and drug development?
Pors, Klaus, Moreb, J.S. 12 1900
No / Aldehyde dehydrogenases (ALDHs) belong to a superfamily of 19 isozymes that are known to participate in many physiologically important biosynthetic processes including detoxification of specific endogenous and exogenous aldehyde substrates. The high expression levels of an emerging number of ALDHs in various cancer tissues suggest that these enzymes have pivotal roles in cancer cell survival and progression. Mapping out the heterogeneity of tumours and their cancer stem cell (CSC) component will be key to successful design of strategies involving therapeutics that are targeted against specific ALDH isozymes. This review summarises recent progress in ALDH-focused cancer research and discovery of small-molecule-based inhibitors.
|
730 |
Candidate Treponema pallidum biomarkers uncovered in urine from individuals with syphilis using mass spectrometry
Osbak, K.K., Van Raemdonck, G.A., Dom, M., Cameron, C.E., Meehan, Conor J., Deforce, D., Van Ostade, X., Kenyon, C.R., Dhaenens, M. 05 November 2019
No / Aim: A diagnostic test that could detect Treponema pallidum antigens in urine would facilitate the prompt diagnosis of syphilis. Materials & methods: Urine samples from 54 individuals with various clinical stages of syphilis and 6 controls were pooled according to disease stage and interrogated with complementary mass spectrometry techniques to uncover potential syphilis biomarkers. Results & conclusion: In total, 26 unique peptides were uncovered corresponding to four unique T. pallidum proteins that have low genetic sequence similarity to other prokaryotes and human proteins. This is the first account of direct T. pallidum protein detection in human clinical samples using mass spectrometry. The implications of these findings for future diagnostic test development are discussed. Data are available via ProteomeXchange with identifier PXD009707.
|