Global ETD Search

1	Domain-based Frameworks and Embeddings for Dynamics over Networks Adhikari, Bijaya 01 June 2020 Broadly, this thesis studies network and time-series mining problems pertaining to dynamics over networks in various domains. Which locations and staff should we monitor in order to detect C. difficile outbreaks in hospitals? How do we predict the peak intensity of influenza incidence in an interpretable fashion? How do we infer the states of all nodes in a critical infrastructure network where failures have occurred? Leveraging domain-based information should make it possible to answer these questions. However, several new challenges arise: (a) More complex dynamics. The dynamics over networks that we consider are complex. For example, C. difficile spreads via both people-to-people and surface-to-people interactions, and correlations between failures in critical infrastructures go beyond the network structure and also depend on geography. Traditional approaches either rely on models such as Susceptible-Infectious (SI) and Independent Cascade (IC), which are too restrictive because they focus on a single pathway, or do not incorporate a model at all, which leads to suboptimal results. (b) Data sparsity. Data in this space remain sparse. Specifically, it is difficult to collect the exact state of each node in the network, because that state is high-dimensional and difficult to sample directly. (c) Mismatch between data and process. In many situations, the underlying dynamical process is unknown or depends on a mixture of several models. In such cases, there is a mismatch between the data collected and the model representing the dynamics. 
For example, the weighted influenza-like illness (wILI) count released by the CDC, which is meant to represent the raw fraction of the total population infected by influenza, actually depends on multiple factors, such as the number of health-care providers reporting and the public's tendency to seek medical advice. In such cases, methods that generalize well to unobserved (or unknown) models are required. Current approaches often fail to tackle these challenges because they rely on restrictive models, require large volumes of data, and/or work only for predefined models. In this thesis, we propose to leverage domain-based frameworks, which include novel models and analysis techniques, and domain-based low-dimensional representation learning to tackle the challenges mentioned above for network and time-series mining tasks. By developing novel frameworks, we can capture the complex dynamics accurately and analyze them more efficiently. For example, to detect C. difficile outbreaks in a hospital setting, we use a two-mode disease model to capture multiple pathways of outbreaks and a discrete lattice-based optimization framework. Similarly, we propose an information-theoretic framework that incorporates geographically correlated failures in critical infrastructure networks to infer the status of the network components. Moreover, because we use more realistic frameworks to capture and analyze the mechanistic processes themselves, our approaches remain effective even with sparse data. At the same time, low-dimensional domain-aware embeddings capture domain-specific properties (such as incidence-based similarity between historical influenza seasons) more efficiently from sparse data, which is useful for subsequent tasks. Similarly, since the domain-aware embeddings capture the model information directly from the data without any modeling assumptions, they generalize better to new models. 
Our domain-aware frameworks and embeddings enable many applications in critical domains. For example, our domain-aware framework for C. difficile allows different monitoring rates for people and locations and detects more than 95% of outbreaks. Similarly, our framework for product recommendation in e-commerce for queries with sparse engagement data yielded a 34% improvement over the current Walmart.com search engine. Our novel framework also leads to near-optimal algorithms, with an additive approximation guarantee, for inferring network states given a partial observation of the failures in networks. Additionally, by exploiting domain-aware embeddings, we outperform non-trivial competitors by up to 40% in influenza forecasting. Similarly, domain-aware representations of subgraphs helped us outperform non-trivial baselines by up to 68% in the graph classification task. We believe our techniques will be useful for a variety of other applications in areas such as social networks and urban computing. / Doctor of Philosophy / Which locations and staff should we monitor to detect pathogen outbreaks in hospitals? How do we predict the peak intensity of influenza incidence? How do we infer the failures in water distribution networks? These are some of the questions on dynamics over networks discussed in this thesis. Here, we leverage domain knowledge to answer these questions. Specifically, we propose (a) novel optimization frameworks, in which we exploit domain knowledge to obtain tractable formulations and near-optimal algorithms, and (b) low-dimensional representation learning, in which we design novel neural architectures inspired by domain knowledge. Our frameworks capture the complex dynamics accurately and help analyze them more efficiently. At the same time, our low-dimensional embeddings capture domain-specific properties more efficiently from sparse data, which is useful for subsequent tasks. 
Moreover, our domain-aware embeddings are inferred directly from the data without any modeling assumptions, so they generalize better. The frameworks and embeddings we develop enable many applications in several domains. For example, our domain-aware framework for outbreak detection in hospitals has more than 95% accuracy. Similarly, our framework for product recommendation in e-commerce for queries with sparse data yielded a 34% improvement over a state-of-the-art e-commerce search engine. Additionally, our approach outperforms non-trivial competitors by up to 40% in influenza forecasting. data mining networks time-series domain-based learning graph summarization network state inference data driven epidemiology
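The first abstract contrasts its two-mode disease model with the Susceptible-Infectious (SI) and Independent Cascade (IC) models, which follow a single transmission pathway. As a point of reference, here is a minimal Python sketch of one IC simulation run. It is not the thesis's method; the adjacency-list graph, the single uniform activation probability `p` (IC is often defined with per-edge probabilities), and the function name `independent_cascade` are illustrative assumptions.

```python
import random
from typing import Dict, List, Set


def independent_cascade(graph: Dict[str, List[str]], seeds: Set[str],
                        p: float, rng: random.Random) -> Set[str]:
    """Simulate one run of the Independent Cascade (IC) model.

    Each newly activated node gets exactly one chance to activate each
    inactive out-neighbor, succeeding independently with probability p.
    Assumption: one uniform p for every edge.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                # Each edge (u, v) is tried at most once, since u appears
                # in a frontier only in the round it was activated.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


# Usage: a small directed contact graph; "d" points at "a" but is never reached.
contacts = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
print(sorted(independent_cascade(contacts, {"a"}, 0.5, random.Random(42))))
```

Because every edge is tried with the same `p` and contagion flows only along graph edges, a model like this has no way to express a second channel such as surface-to-people transmission, which is the single-pathway restriction the abstract points to.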
2	A Domain Based Approach to Crawl the Hidden Web Pandya, Milan 04 December 2006 Much research is being done on indexing the Web, and increasingly sophisticated Web crawlers are being designed to search and index the Web faster. However, all these traditional crawlers crawl only the part of the Web we call the “Surface Web”; they are unable to crawl the hidden portion of the Web. These traditional crawlers retrieve content only from surface Web pages, which are simply a set of Web pages linked by hyperlinks, and thereby ignore the tremendous amount of information hidden behind the search forms in Web pages. Most of the published research has focused on detecting such searchable forms and searching over them systematically. Our approach is based on a Web crawler that analyzes search forms and fills them with appropriate content to retrieve the maximum amount of relevant information from the database. web crawler search spider web bot best first crawler focused web crawler web page domain based Computer Sciences
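The second abstract describes a crawler that analyzes search forms and fills them with appropriate content. A minimal sketch of that idea, using only Python's standard library, might look like the following. The class and function names, the restriction to GET forms with text or search inputs, and the `domain_values` mapping from field names to candidate query terms (a stand-in for the domain knowledge the thesis relies on) are all hypothetical; the thesis's actual form analysis is not described in this abstract.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urlencode, urljoin


class SearchFormParser(HTMLParser):
    """Collect the action, method and named text inputs of each <form>."""

    def __init__(self) -> None:
        super().__init__()
        self.forms: List[dict] = []
        self._current: Optional[dict] = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # Start recording a new form; HTML forms default to GET.
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
        elif tag == "input" and self._current is not None:
            # Keep only free-text inputs that have a name to submit under.
            input_type = (attrs.get("type") or "text").lower()
            if input_type in ("text", "search") and attrs.get("name"):
                self._current["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def build_form_queries(page_url: str, html: str,
                       domain_values: Dict[str, List[str]]) -> List[str]:
    """Return one GET URL per (text field, candidate value) pair."""
    parser = SearchFormParser()
    parser.feed(html)
    queries = []
    for form in parser.forms:
        if form["method"] != "get":
            continue  # POST forms would need a request body; skipped here
        target = urljoin(page_url, form["action"])
        for field in form["fields"]:
            for value in domain_values.get(field, []):
                queries.append(target + "?" + urlencode({field: value}))
    return queries
```

For example, given a page at `http://example.com/books/index.html` containing `<form action="/search" method="get"><input type="text" name="title"></form>` and the mapping `{"title": ["networks"]}`, the function returns `["http://example.com/search?title=networks"]`; a crawler would then fetch and index each of those result pages, reaching content that link-following alone would miss.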