1

Knowledge-fused Identification of Condition-specific Rewiring of Dependencies in Biological Networks

Tian, Ye 30 September 2014
Gene network modeling is one of the major goals of systems biology research. It targets the middle layer of active biological systems that orchestrate the activities of genes and proteins, and it can provide critical information to bridge the gap between causes and effects, which is essential to explaining the mechanisms underlying disease. Among the network construction tasks, the rewiring of relevant network structure plays a critical role in determining the behavior of diseases. To systematically characterize the selectively activated regulatory components and mechanisms, modeling tools must be able to effectively distinguish significant rewiring from random background fluctuations. While differential dependency networks cannot be constructed from existing knowledge alone, effective incorporation of prior knowledge into data-driven approaches can improve the robustness and biological relevance of network inference. Existing studies on protein-protein interactions and biological pathways provide a rich, steadily accumulating body of domain knowledge. Although incorporating biological prior knowledge into network learning algorithms can effectively leverage this domain knowledge, such knowledge is neither condition-specific nor error-free; it serves only as an aggregated source of partially validated evidence gathered under diverse experimental conditions. Hence, direct incorporation of imperfect and non-specific prior knowledge into specific problems is prone to errors and theoretically problematic. To address this challenge, we propose a novel mathematical formulation that enables incorporation of prior knowledge into structural learning of biological networks as Gaussian graphical models, utilizing the strengths of both measurement data and prior knowledge.
We propose a novel strategy to estimate and control the impact of unavoidable false positives in the prior knowledge; it fully exploits the evidence from data while obtaining a "second opinion" through efficient consultation of the prior knowledge. By proposing a significance assessment scheme to detect statistically significant rewiring of the learned differential dependency network, our method can assign edge-specific p-values and specify edge types to indicate one of six biological scenarios. The gene networks jointly inferred from data and knowledge are relatively simple to interpret, yet still convey considerable biological information. Experiments on extensive simulation data and comparison with peer methods demonstrate the effectiveness of the knowledge-fused differential dependency network (KDDN) in revealing statistically significant rewiring in biological networks, leveraging data-driven evidence and existing biological knowledge while remaining robust to false positive edges in the prior knowledge. We also made significant efforts to disseminate the developed tools to the research community. We developed an accompanying R package and Cytoscape plugin to provide both batch processing and a user-friendly graphical interface. With these software tools, we apply our method to several practically important biological problems: studying how yeast responds to stress, finding the origin of ovarian cancer, evaluating the effectiveness of drug treatment, and other broader biological questions. In the yeast stress response study, our findings corroborated the existing literature. A network distance measure defined on the basis of KDDN provided a novel hypothesis on the origin of high-grade serous ovarian cancer. KDDN is also used in a novel integrated study of network biology and imaging to evaluate drug treatment of brain tumors. Applications to many other problems also yielded promising biological results. / Ph. D.
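
To make the idea of an edge-specific p-value for rewiring concrete, the following is a minimal sketch of a permutation test for a single gene pair measured under two conditions. It is not the KDDN algorithm: it uses the plain Pearson correlation instead of the conditional dependencies of a Gaussian graphical model, and it ignores prior knowledge entirely. The function names `pearson` and `rewiring_p_value` and all parameters are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rewiring_p_value(cond1, cond2, n_perm=2000, seed=0):
    """Permutation p-value for a change in correlation between two conditions.

    cond1, cond2: lists of (gene_a, gene_b) expression pairs, one per sample.
    Assumption: samples are exchangeable across conditions under the null.
    """
    rng = random.Random(seed)

    def stat(group1, group2):
        # Absolute difference between the two within-condition correlations.
        return abs(pearson(*zip(*group1)) - pearson(*zip(*group2)))

    observed = stat(cond1, cond2)
    pooled = list(cond1) + list(cond2)
    n1 = len(cond1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign condition labels at random
        if stat(pooled[:n1], pooled[n1:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A small p-value suggests that the dependency between the two genes differs between conditions by more than random relabeling of samples would explain. A full network analysis would repeat such a test per edge and correct for multiple testing.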
2

Supervised Learning for Sequential and Uncertain Decision Making Problems - Application to Short-Term Electric Power Generation Scheduling

Cornélusse, Bertrand 21 December 2010
Our work is driven by a class of practical problems of sequential decision making in the context of electric power generation under uncertainties. These problems are usually treated as receding horizon deterministic optimization problems, and/or as scenario-based stochastic programs. Stochastic programming makes it possible to compute a first-stage decision that is hedged against the possible futures and -- if a possibility of recourse exists -- this decision can then be particularized to possible future scenarios thanks to the information gathered until the recourse opportunity. Although many decomposition techniques exist, stochastic programming is currently not tractable in the context of day-ahead electric power generation and furthermore does not provide an explicit recourse strategy. The latter observation also makes this approach cumbersome when one wants to evaluate its value on independent scenarios. We propose a supervised learning methodology to learn an explicit recourse strategy for a given generation schedule, from optimal adjustments of the system under simulated perturbed conditions. This methodology may thus be complementary to a stochastic programming based approach. Compared with receding horizon optimization, it has the advantage of moving the heavy computation offline, while providing the ability to quickly infer decisions during online operation of the generation system. Furthermore, the learned strategy can be validated offline on an independent set of scenarios. On a realistic instance of the intra-day electricity generation rescheduling problem, we explain how to generate disturbance scenarios, how to compute adjusted schedules, how to formulate the supervised learning problem to obtain a recourse strategy, how to restore feasibility of the predicted adjustments and how to evaluate the recourse strategy on independent scenarios.
We analyze different settings, namely either predicting the detailed adjustment of all the generation units, or predicting more qualitative variables that make it possible to speed up the adjustment computation procedure by simplifying the "classical" optimization problem. Our approach is intrinsically scalable to large-scale generation management problems, and may in principle handle all kinds of uncertainties and practical constraints. Our results show the feasibility of the approach and are also promising in terms of economic efficiency of the resulting strategies. The solutions of the optimization problem of generation (re)scheduling must satisfy many constraints. However, a classical learning algorithm that is (by nature) unaware of the constraints the data is subject to may indeed successfully capture the sensitivity of the solution to the model parameters. This has nevertheless drawn our attention to one particular aspect of the relation between machine learning algorithms and optimization algorithms. When we apply a supervised learning algorithm to search in a hypothesis space based on data that satisfies a known set of constraints, can we guarantee that the hypothesis that we select will make predictions that satisfy the constraints? Can we at least benefit from our knowledge of the constraints to eliminate some hypotheses while learning and thus hope that the selected hypothesis has a better generalization error? In the second part of this thesis, where we try to answer these questions, we propose a generic extension of tree-based ensemble methods that allows incorporating not only incomplete data but also prior knowledge about the problem. The framework is based on a convex optimization problem that makes it possible to regularize a tree-based ensemble model by adjusting either (or both) the labels attached to the leaves of an ensemble of regression trees or the outputs of the observations of the training sample.
It allows incorporating weak additional information in the form of partial information about output labels (as in censored data or semi-supervised learning) or -- more generally -- coping with observations of varying degrees of precision, or incorporating strong priors in the form of structural knowledge about the sought model. In addition to enhancing the precision by exploiting information that cannot be used by classical supervised learning algorithms, the proposed approach may be used to produce models that naturally comply with feasibility constraints that must be satisfied in many practical decision making problems, especially in contexts where the output space is high-dimensional and/or structured by invariances, symmetries and other kinds of constraints.
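
The abstract mentions restoring feasibility of predicted adjustments without saying how it is done. As one generic illustration, and not the method of the thesis, the sketch below projects a predicted dispatch onto an assumed constraint set: each unit must stay within its own lower and upper limits, and total output must equal a given demand. The function name `restore_feasibility` and its parameters are hypothetical, and real scheduling problems have many more constraints (ramping, minimum up-times, reserves).

```python
def restore_feasibility(pred, lo, hi, demand, tol=1e-9):
    """Euclidean projection of `pred` onto {lo <= x <= hi, sum(x) == demand}.

    The projection has the form clip(pred + shift, lo, hi) for a single
    scalar shift; total output is nondecreasing in that shift, so the
    right value can be found by bisection.
    """
    if not (sum(lo) - tol <= demand <= sum(hi) + tol):
        raise ValueError("demand is outside the total capacity range")

    def clipped(shift):
        return [min(max(p + shift, l), h) for p, l, h in zip(pred, lo, hi)]

    a = min(l - p for p, l in zip(pred, lo))  # every unit at its lower limit
    b = max(h - p for p, h in zip(pred, hi))  # every unit at its upper limit
    for _ in range(200):
        mid = (a + b) / 2
        if sum(clipped(mid)) < demand:
            a = mid
        else:
            b = mid
    return clipped(b)
```

For example, with predictions `[50, 30, 40]`, limits `[0, 0, 0]` to `[60, 35, 45]`, and a demand of 130, every unit is raised by the same amount until the total matches the demand; units that hit a limit stay there and the remaining units absorb the difference.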
