
Context-aware Learning from Partial Observations

The Big Data revolution brought an increasing availability of data sets of unprecedented scale, enabling researchers in the machine learning and data mining communities to scale up learning from such data and to provide data-driven insights, decisions, and predictions. Along the way, however, they face numerous challenges, including handling missing observations while learning from such data and making predictions on previously unobserved or rare ("tail") examples. These challenges arise in a wide range of domains, including climate, medical, social network, consumer, and computational advertising domains. In this thesis, we address this problem and propose tools for handling partially observed or completely unobserved data by exploiting information from their context. We assume that the context is available in the form of a network or sequence structure, or as additional information to point-informative data examples. First, we propose two structured regression methods for handling missing values in partially observed temporal attributed graphs. Both are based on the Gaussian Conditional Random Fields (GCRF) model and draw power from the network/graph structure (context) of the unobserved instances. The Marginalized Gaussian Conditional Random Fields (m-GCRF) model is designed for missing response variable values (labels) in graph nodes, whereas Deep Feature Learning GCRF handles missing values in explanatory variables by learning feature representations jointly with the complex interactions among nodes in a graph, under the overall GCRF objective.
Next, we consider unsupervised and supervised shallow and deep neural models for monetizing web search. We focus on two sponsored search tasks: (i) query-to-ad matching, for which we propose worLd2vec, a novel shallow neural embedding model with improved utilization of local query context (location), and (ii) click-through-rate prediction for ads and queries, for which we introduce the Deeply Supervised Semantic Match model, which addresses click-through-rate prediction for unobserved and tail queries by jointly learning the semantic embeddings of a query and an ad together with their corresponding click-through rate. Finally, we propose a deep learning approach for ranking investigators by their expected enrollment performance on new clinical trials. The approach learns from heterogeneous (structured and free-text) data sources related to both investigators and trials. It is applicable to matching investigators to new trials from partial observations and to recruiting both experienced investigators and new investigators with no previous experience in enrolling patients in clinical trials. Experimental evaluation on a number of synthetic and diverse real-world data sets shows that the proposed methods outperform their alternatives. / Computer and Information Science

Identifier oai:union.ndltd.org:TEMPLE/oai:scholarshare.temple.edu:20.500.12613/2927
Date January 2018
Creators Gligorijevic, Jelena
Contributors Obradovic, Zoran; Vucetic, Slobodan; Dragut, Eduard Constantin; Zhao, Zhigen
Publisher Temple University. Libraries
Source Sets Temple University
Language English
Detected Language English
Type Thesis/Dissertation, Text
Format 167 pages
Rights IN COPYRIGHT - This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available. http://rightsstatements.org/vocab/InC/1.0/
Relation http://dx.doi.org/10.34944/dspace/2909, Theses and Dissertations
