501 |
Identification of a phospho-hnRNP E1 Nucleic Acid Consensus Sequence Mediating Epithelial to Mesenchymal Transition / Brown, Andrew S., 27 July 2015
No description available.
|
502 |
Characterization of the evolution of satellite DNA across Passeriformes / Martins Borges, Inês, January 2022
Satellite DNA (satDNA) is among the fastest-evolving elements in the genome and is highly abundant in some eukaryotic genomes. Its highly repetitive nature makes it challenging to assemble, so it is underrepresented in most assemblies and often understudied as a result. Birds are an ideal model for the study of satDNA and its evolution: the large number of sequenced genomes available for this clade allows dense sampling across various evolutionary timescales, and the low number of satDNA families within their satellitomes facilitates their study and comparison between species. Here, we characterize satDNA and its evolution across Passeriformes, an avian clade containing two-thirds of all bird species and spanning ~50 million years of evolution. To this end, we use both short-read data and long-read assemblies of species representing over 30 passerine families to shed light on the evolution of the passerine satellitome. We focus on the phylogenetic relationships between satellites common to most species, and on characterizing satellite array structure and location in genome assemblies. We also analyse satellite abundance in each genome, focusing on differences in satellite content between male and female individuals to look for satellites present on the female-specific W sex chromosome and on the germline-restricted chromosome. We found seven satDNA families shared by a quarter of the species, which were likely present in an ancestor common to most, if not all, Passeriformes. We observed that satDNA evolution is complex and does not follow the species phylogeny, and that satellite arrays generally have a simple head-to-tail conformation, with evidence of higher-order repeats in four of the sampled species. We also found two satDNA families with fairly consistent monomer length and conserved regions that we hypothesise might be functional.
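The sex-based abundance comparison described above can be illustrated with a minimal sketch (not code from the thesis; satellite names and values are invented), assuming per-individual satellite abundances have already been quantified, e.g. as genome proportions estimated from short reads:

```python
import pandas as pd

# Hypothetical per-individual satellite abundances (fraction of the genome),
# e.g. derived from short-read mapping or RepeatExplorer-style clustering.
abundance = pd.DataFrame(
    {"satA": [0.012, 0.013, 0.024, 0.025],
     "satB": [0.004, 0.005, 0.004, 0.005],
     "satC": [0.0002, 0.0001, 0.0060, 0.0055]},
    index=["male_1", "male_2", "female_1", "female_2"],
)
sex = pd.Series(["M", "M", "F", "F"], index=abundance.index)

# Mean abundance per sex and the female/male enrichment ratio.
mean_by_sex = abundance.groupby(sex).mean()
ratio = mean_by_sex.loc["F"] / mean_by_sex.loc["M"]

# Satellites strongly over-represented in females are candidates for the
# W chromosome or the germline-restricted chromosome.
candidates = ratio[ratio > 2].sort_values(ascending=False)
print(candidates)
```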
|
503 |
Pipeline for Next Generation Sequencing data of phage displayed libraries to support affinity ligand discovery / Schleimann-Jensen, Ella, January 2022
Affinity ligands are important molecules used in affinity chromatography for the purification of significant substances from complex mixtures. Finding affinity ligands specific to important target molecules can be a challenging process. Cytiva uses the powerful phage display technique to find new promising affinity ligands. The phage display technique is run in several enrichment cycles. When developing new affinity ligands, a protein scaffold library with a diversity of up to 10^10-10^11 different protein scaffold variants is run through the enrichment cycles. The output of the phage display rounds is screened for target-molecule binding and then sequenced, usually with one of the conventional screening methods, ELISA or Biacore, followed by Sanger sequencing. However, the throughput of these analyses is unfortunately very low, often covering only a few hundred screened clones. Therefore, Next Generation Sequencing (NGS), which generates millions of sequences from each phage display round, has become an increasingly popular screening method for phage display libraries. This creates a need for a robust data analysis pipeline to interpret the large amounts of data. In this project, a pipeline for the analysis of NGS data from phage displayed libraries has been developed at Cytiva. Cytiva uses NGS as one of its screening methods for phage displayed protein libraries because of its high throughput compared to the conventional screening methods. The purpose is to find new affinity ligands for the purification of essential substances used in drugs. The pipeline has been created using the object-oriented programming language R and consists of several analyses covering the most important steps needed to find promising results in the NGS data. With the developed pipeline, the user can analyze the data at both the DNA and protein sequence level, obtain a per-position residue breakdown, and filter the data on specific amino acids and positions. This gives a robust and thorough analysis which can lead to promising results for the development of novel affinity ligands for future purification products.
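The Cytiva pipeline itself is written in R and is not reproduced here; the Python sketch below (with invented example reads) only illustrates two of the analysis steps mentioned above, the per-position residue breakdown and filtering on a specific amino acid at a specific position:

```python
from collections import Counter

import pandas as pd
from Bio.Seq import Seq  # Biopython, used here for translation

# Hypothetical in-frame reads from one phage display selection round.
reads = ["GCTGAAAAAGTT", "GCTCAAAAAGTT", "GCTGAACGTGTT", "GCTGAAAAAATT"]

# DNA level -> protein level.
peptides = [str(Seq(r).translate()) for r in reads]

# Per-position residue breakdown: amino acid counts and frequencies per position.
length = len(peptides[0])
counts = pd.DataFrame(
    {pos: Counter(p[pos] for p in peptides) for pos in range(length)}
).fillna(0).astype(int)
freqs = counts / counts.sum()
print(freqs)

# Filtering on a specific amino acid at a specific position, e.g. keep clones
# carrying lysine (K) at position 3 (0-based index 2).
selected = [p for p in peptides if p[2] == "K"]
print(selected)
```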
|
504 |
USING GENE EXPRESSION ANALYSIS TO GUIDE AND IDENTIFY TREATMENTS FOR BREAST CANCER PATIENTS / Hallett, Robin M., 10 1900
Based on breast cancer clinical trial data accumulated over the last several decades, it is obvious that standard breast cancer therapeutics extend survival in breast cancer patients. However, only a minority of patients within these trials derive benefit from treatment. In a population of breast cancer patients treated with adjuvant therapy after surgery, many patients are over-treated, as they would never experience relapse even without receiving adjuvant therapies. Among the remaining patients, some achieve durable remission from therapy, whereas others relapse despite therapy. Hence, there is an obvious need to develop biomarkers that can identify these three populations of patients, so that only patients who are likely to benefit from available therapies are treated with them, as well as to develop new therapies for patients who are not afforded durable remission by approved treatments. Here, we present the identification of biomarkers to identify low-risk breast cancer patients who experience excellent long-term survival even without adjuvant therapy. Conversely, high-risk patients represent those patients most likely to benefit from intervention with aggressive treatment regimens. We also report on the identification of biomarkers which can predict the likelihood of response to approved chemotherapy regimens, and which could be used to further stratify high-risk patients into responders and non-responders. Finally, for high-risk patients unlikely to be afforded durable remission by available therapies, we report on the identification of agents that target breast tumor-initiating cells and may be effective for the treatment of these patients. / Doctor of Philosophy (PhD)
|
505 |
A statistical framework to detect gene-environment interactions influencing complex traits / Deng, Wei Q., 27 August 2014
Advancements in human genomic technology have helped to improve our understanding of how genetic variation plays a central role in the mechanism of disease susceptibility. However, the very high dimensional nature of the data generated by large-scale genetic association studies has limited our ability to thoroughly examine genetic interactions. A prioritization scheme, Variance Prioritization (VP), has been developed to select genetic variants based on differences in the quantitative trait variance between the possible genotypes using Levene's test (Pare et al., 2010). Genetic variants with Levene's test p-values lower than a pre-determined level of significance are selected to test for interactions using linear regression models. Under a variety of scenarios, VP has increased power to detect interactions over an exhaustive search as a result of the reduced search space. Nevertheless, the use of Levene's test does not take into account that the variance will either monotonically increase or decrease with the number of minor alleles when interactions are present. To address this issue, I propose a maximum likelihood approach to test for trends in variance between the genotypes, and derive a closed-form representation of the likelihood ratio test (LRT) statistic. Using simulations, I examine the performance of the LRT in assessing the inequality of quantitative trait variance stratified by genotype, and subsequently in identifying potentially interacting genetic variants. The LRT is also used in an empirical dataset of 2,161 individuals to prioritize genetic variants for gene-environment interactions. The interaction p-values of the prioritized genetic variants are consistently lower than expected by chance compared to the non-prioritized variants, suggesting improved statistical power to detect interactions in the set of prioritized genetic variants. This new statistical test is expected to complement the existing VP framework and accelerate the process of genetic interaction discovery in future genome-wide studies and meta-analyses. / Master of Health Sciences (MSc)
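The two-stage prioritize-then-test idea can be sketched compactly on simulated data (illustrative only, not the thesis code, and the trend-based LRT itself is not reproduced): Levene's test flags variants whose trait variance differs between genotype groups, and only those variants are carried forward to an explicit interaction test in a linear regression:

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000

# Simulated data: genotype g (minor-allele count 0/1/2), environment e, and a
# quantitative trait y with a true gene-environment interaction.
g = rng.binomial(2, 0.3, n)
e = rng.normal(size=n)
y = 0.2 * g + 0.3 * e + 0.25 * g * e + rng.normal(size=n)
df = pd.DataFrame({"g": g, "e": e, "y": y})

# Stage 1 (prioritization): Levene's test of variance heterogeneity of y across
# genotype groups; an interaction changes the variance with the allele count.
groups = [df.loc[df.g == k, "y"] for k in (0, 1, 2)]
levene_stat, levene_p = stats.levene(*groups)
print(f"Levene p-value: {levene_p:.3g}")

# Stage 2 (testing): only prioritized variants are tested for interaction.
if levene_p < 0.05:
    fit = smf.ols("y ~ g * e", data=df).fit()
    print(f"interaction p-value: {fit.pvalues['g:e']:.3g}")
```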
|
506 |
Statistical Methods Development for the Multiomic Systems Biology / Ugidos Guerrero, Manuel, 28 April 2023
Systems Biology research has expanded over the last years together with the development of omic technologies. The combination and simultaneous analysis of different kinds of omic data allows the study of the connections and relationships between different cellular layers. Indeed, multiomic integration strategies provide a key source of knowledge about the cell as a system. The present Ph.D. thesis aims to study, develop and apply multiomic integration approaches to the field of systems biology.
The still high cost of omics technologies makes it difficult for most laboratories to afford a complete multiomic study. However, the wide availability of omic data in public repositories allows the use of these already generated data. Unfortunately, combining omic data from different sources introduces unwanted noise into the data, known as the batch effect. The batch effect impairs the correct integrative analysis of the data, so the use of so-called Batch Effect Correction Algorithms is necessary. There is now a large number of such algorithms, based on different statistical models and methods, that correct the batch effect as part of the data pre-processing steps. However, existing methods are not intended for multiomic designs, as they only allow the correction of a single type of omic data that must have been measured across all batches. For this reason, we developed the MultiBaC algorithm, which removes the batch effect in multiomic designs and allows the correction of data that were not measured across all batches. MultiBaC is based on PLS regression and ANOVA-SCA models and was validated and evaluated on different datasets. We also present MultiBaC as an R package to facilitate the use of this tool.
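MultiBaC is distributed as an R package; the Python sketch below (toy data and dimensions, not the actual MultiBaC implementation) only illustrates the underlying idea of using PLS regression to handle an omic that is missing from one batch, followed by a deliberately simplified batch correction step:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)

# Toy setting: batch 1 measured both transcriptomics (X1) and proteomics (Y1);
# batch 2 measured transcriptomics only (X2). Dimensions are illustrative.
n1, n2, n_genes, n_prot = 30, 25, 200, 50
X1 = rng.normal(size=(n1, n_genes))
Y1 = 0.8 * X1[:, :n_prot] + rng.normal(scale=0.3, size=(n1, n_prot))
X2 = rng.normal(loc=0.5, size=(n2, n_genes))  # mean shift mimics a batch effect

# Relate the shared omic to the missing one with a low-dimensional PLS model
# and impute proteomic profiles for the batch that lacks them.
pls = PLSRegression(n_components=5).fit(X1, Y1)
Y2_hat = pls.predict(X2)

# With all batches represented in both omics, an ANOVA-SCA-style decomposition
# could estimate and remove the between-batch component; shown here only in
# its simplest form as per-feature batch mean-centering.
X1_corrected = X1 - X1.mean(axis=0)
X2_corrected = X2 - X2.mean(axis=0)
```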
Most existing multiomic integration approaches are multivariate methods based on latent space analysis. These methods are known as data-driven as they are based on the search for correlations to determine the relationships between the different variables. Data-driven methods require a large number of observations or samples to find robust and/or significant correlations among features. Unfortunately, in the molecular biology field, data sets with a large number of samples are not very common, again due to the high cost of generating omic data. As an alternative to data-driven methods, some multiomic integration strategies are based on model-driven approaches. These methods can be fitted with a smaller number of observations and are very useful for finding mechanistic relationships between different cellular components. However, model-driven methods require a priori information, which is usually a metabolic model of the organism under study. Currently, only transcriptomics and quantitative metabolomics have been successfully integrated using model-driven methods. Nonetheless, quantitative metabolomics is not very widespread and most laboratories generate non-quantitative or semi-quantitative metabolomics, which cannot be integrated with current methods. To address this issue, we developed MAMBA, a model-driven multiomic integration method that relies on mathematical optimization problems and is able to jointly analyze non-quantitative or semi-quantitative metabolomics with other types of gene-centric omic data, such as transcriptomics. MAMBA was compared to other existing methods in terms of metabolite prediction accuracy and was applied to a multiomic dataset generated within the PROMETEO project, in which this thesis is framed. MAMBA proved to capture the known biology of our experimental design and was useful for deriving new findings and biological hypotheses.
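MAMBA itself is formulated as a mixed-integer optimization over a genome-scale metabolic model and is not reproduced here; purely as a small illustration of the model-driven principle, the sketch below runs flux balance analysis on an invented three-reaction network with scipy, tightening the flux bound of a reaction whose associated gene is lowly expressed:

```python
import numpy as np
from scipy.optimize import linprog

# Invented toy network: R1 (uptake -> A), R2 (A -> B), R3 (B -> biomass).
# Stoichiometric matrix S: rows are metabolites A and B, columns are R1-R3.
S = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])

# The model-driven step: the gene catalysing R2 is lowly expressed in the omic
# data, so its upper flux bound is reduced (here from 10 to 2, arbitrary units).
r2_gene_lowly_expressed = True
upper = [10.0, 2.0 if r2_gene_lowly_expressed else 10.0, 10.0]
bounds = [(0.0, u) for u in upper]

# Maximize the biomass flux v3 subject to the steady-state constraint S v = 0
# (linprog minimizes, hence the negated objective).
c = [0.0, 0.0, -1.0]
res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
print("optimal flux distribution:", res.x)  # expected: [2, 2, 2]
```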
Altogether, this thesis presents useful tools for the field of systems biology, covering both the pre-processing of multiomic datasets and their subsequent statistical integrative analysis. / Ugidos Guerrero, M. (2023). Statistical Methods Development for the Multiomic Systems Biology [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/193031
|
507 |
Enhancing carbon fixation in Rubisco through generative modelling / Mot en förbättring av kolfixering av Rubisco genom generativ AI / Shute, Ellen, January 2024
Carbon capture, the removal of carbon dioxide (CO2) from the atmosphere, has gained attention as a method to mitigate the effects of global warming. Plants and phototrophic microorganisms have the inherent ability to capture carbon through the fixation of CO2 to produce biomass. However, native carbon fixing pathways are limited by key enzymes with low catalytic activity resulting in low energy efficiency. Rubisco is one such key enzyme, notorious for its poor performance. Past research has been unsuccessful at enhancing carbon fixation in Rubisco through conventional methods. Generative modelling has emerged as an innovative approach to enzyme engineering, taking advantage of different neural network architectures to propose novel variants with desired characteristics. Here, a variational autoencoder (VAE) trained on the Rubisco sequence space was applied to the challenge of Rubisco engineering. Two models were trained and, using the dimensionality reduction property of VAEs, the fitness landscape of Rubisco was explored. Sequences were labelled with catalytically relevant data and a regression model was built with the aim of predicting those sequences with enhanced catalytic activity. Novel Rubisco sequences were generated following systematic interrogation of the low-dimensional space. The use of generative modelling here provides a fresh perspective on Rubisco engineering.
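A minimal sketch of the kind of VAE described above (PyTorch; the alignment length, alphabet size, layer widths and the training data are invented placeholders, not the models trained in this work):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

SEQ_LEN, N_AA, LATENT = 480, 21, 16  # aligned length, 20 amino acids + gap, latent size

class SeqVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(SEQ_LEN * N_AA, 512), nn.ReLU())
        self.mu = nn.Linear(512, LATENT)
        self.logvar = nn.Linear(512, LATENT)
        self.dec = nn.Sequential(nn.Linear(LATENT, 512), nn.ReLU(),
                                 nn.Linear(512, SEQ_LEN * N_AA))

    def forward(self, x):  # x: (batch, SEQ_LEN, N_AA) one-hot sequences
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        logits = self.dec(z).view(-1, SEQ_LEN, N_AA)
        return logits, mu, logvar

def vae_loss(logits, x, mu, logvar):
    # Per-position categorical reconstruction loss plus the KL regularizer.
    recon = F.cross_entropy(logits.permute(0, 2, 1), x.argmax(-1), reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# One illustrative training step on random stand-in "sequences".
model = SeqVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = F.one_hot(torch.randint(0, N_AA, (8, SEQ_LEN)), N_AA).float()
logits, mu, logvar = model(x)
loss = vae_loss(logits, x, mu, logvar)
loss.backward()
opt.step()
```

The latent means produced by the encoder give the low-dimensional representation on which a regression model against catalytic labels, as described above, could then be fitted.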
|
508 |
Data Deconvolution for Drug Prediction / Menacher, Lisa Maria, January 2024
Treating cancer is difficult as the disease is complex and drug responses often depend on the patient's characteristics. Precision medicine aims to solve this by selecting individualized treatments. Since this involves the analysis of large datasets, machine learning can be used to make the drug selection process more efficient. Traditionally, such models utilize bulk gene expression data. However, this potentially masks information from small cell populations and fails to address tumor heterogeneity. Therefore, this thesis applies data deconvolution methods to bulk gene expression data and estimates the corresponding cell type-specific gene expression profiles. This "increases" the resolution of the input data for the drug response prediction. A hold-out dataset, LODOCV and LOCOCV were used to evaluate this approach. Furthermore, all results were compared against a baseline model trained on bulk data. Overall, the accuracy of the cell type-specific model did not improve compared to the bulk model. It also prioritizes information from bulk samples, which makes the additional data unnecessary. The robustness of the cell type-specific model is slightly lower than that of the bulk model. Note that these outcomes are not necessarily due to a flaw in the underlying concept; they may instead be connected to poor deconvolution results, as the same reference matrix was used for the deconvolution of all bulk samples regardless of cancer type or disease.
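A minimal sketch of the deconvolution step (illustrative only, not the thesis code): given a reference signature matrix of cell type-specific expression, non-negative least squares estimates the cell-type composition of each bulk sample, which is the usual starting point for deriving cell type-specific expression profiles:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Reference signature matrix (genes x cell types); values are illustrative.
n_genes, n_types, n_samples = 500, 4, 12
signature = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_types))

# Simulated bulk samples as noisy mixtures of the reference profiles.
true_props = rng.dirichlet(np.ones(n_types), size=n_samples)  # samples x types
bulk = true_props @ signature.T + rng.normal(scale=0.1, size=(n_samples, n_genes))

# Deconvolution: per-sample non-negative least squares, then renormalization.
estimated = np.array([nnls(signature, sample)[0] for sample in bulk])
estimated /= estimated.sum(axis=1, keepdims=True)
print(np.round(estimated[0], 2), np.round(true_props[0], 2))
```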
|
509 |
Cardiac mechanical model personalisation and its clinical applications / Xi, Jiahe, January 2013
An increasingly important research area within the field of cardiac modelling is the development and study of methods for model-based parameter estimation from clinical measurements of cardiac function. This provides a powerful approach to the quantification of cardiac function, with the potential to ultimately lead to improved stratification and treatment of individuals with pathological myocardial mechanics. In particular, the diastolic function (i.e., blood filling) of the left ventricle (LV) is affected by its capacity for relaxation, or the decay in residual active tension (AT), whose inhibition limits the relaxation of the LV chamber, which in turn affects its compliance (or its reciprocal, stiffness). The clinical determination of these two factors, corresponding to the diastolic residual AT and the passive constitutive parameters (stiffness) in the cardiac mechanical model, is thus essential for assessing LV diastolic function. However, these parameters are difficult to assess in vivo, and the traditional criterion used to diagnose diastolic dysfunction is subject to many limitations and controversies. In this context, the objective of this study is to develop clinically applicable, model-based methodologies to estimate in vivo, from 4D imaging measurements and LV cavity pressure recordings, these clinically relevant parameters (passive stiffness and active diastolic residual tension) in computational cardiac mechanical models, enabling the quantification of key clinical indices characterising cardiac diastolic dysfunction. Firstly, a sequential data assimilation framework covering various types of existing Kalman filters has been developed, as outlined in Chapter 3. Based on these developments, Chapter 4 demonstrates that the novel reduced-order unscented Kalman filter can accurately retrieve homogeneous and regionally varying constitutive parameters from synthetic noisy motion measurements. This work has been published in Xi et al. 2011a. Secondly, this thesis has investigated the development of methods that can be applied in clinical practice, which has, in turn, introduced additional difficulties and opportunities. This thesis presents what is, to the best of our knowledge, the first study in the literature to estimate human constitutive parameters using clinical data, and demonstrates, for the first time, that while an end-diastolic MR measurement does not constrain the mechanical parameters uniquely, it does provide a potentially robust indicator of myocardial stiffness. This work has been published in Xi et al. 2011b. However, an unresolved issue in patients with diastolic dysfunction is that the estimation of myocardial stiffness cannot be decoupled from the diastolic residual AT because of the impaired ventricular relaxation during diastole. To further address this problem, Chapter 6 presents the first study to estimate diastolic parameters of the left ventricle (LV) from cine and tagged MRI measurements and LV cavity pressure recordings, separating the passive myocardial constitutive properties and the diastolic residual AT. We apply this framework to three clinical cases, and the results show that the estimated constitutive parameters and residual active tension appear to be promising candidates for delineating healthy and pathological cases. This work has been published in Xi et al. 2012a. Nevertheless, the need to invasively acquire LV pressure measurements limits the wide application of this approach.
Chapter 7 addresses this issue by analysing the feasibility of using two kinds of non-invasively available pressure measurements for the purpose of inverse parameter estimation. The work has been submitted for publication in Xi et al. 2012b.
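The reduced-order unscented Kalman filtering used in this work is tied to the cardiac mechanics model and the imaging data; purely as a generic illustration of joint state and parameter estimation, the sketch below uses the filterpy library on an invented one-dimensional relaxation model, treating the unknown decay parameter as an extra state:

```python
import numpy as np
from filterpy.kalman import MerweScaledSigmaPoints, UnscentedKalmanFilter

dt, true_k = 0.05, 1.8  # time step and "true" decay parameter (both invented)

def fx(s, dt):  # augmented state s = [x, k]: x relaxes, k is (nearly) constant
    x, k = s
    return np.array([x - k * x * dt, k])

def hx(s):      # only the relaxing quantity x is observed
    return s[:1]

# Synthetic noisy observations of the relaxing quantity.
rng = np.random.default_rng(0)
xs = [1.0]
for _ in range(120):
    xs.append(xs[-1] - true_k * xs[-1] * dt)
zs = np.array(xs[1:]) + rng.normal(scale=0.02, size=120)

points = MerweScaledSigmaPoints(n=2, alpha=1e-3, beta=2.0, kappa=0.0)
ukf = UnscentedKalmanFilter(dim_x=2, dim_z=1, dt=dt, fx=fx, hx=hx, points=points)
ukf.x = np.array([1.0, 0.5])      # initial guess; the parameter starts wrong
ukf.P = np.diag([0.1, 1.0])
ukf.Q = np.diag([1e-6, 1e-4])     # small process noise lets the parameter adapt
ukf.R = np.array([[0.02 ** 2]])

for z in zs:
    ukf.predict()
    ukf.update(np.array([z]))

print("estimated decay parameter:", ukf.x[1])  # should move towards true_k
```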
|
510 |
Cumulative Distribution Networks: Inference, Estimation and Applications of Graphical Models for Cumulative Distribution Functions / Huang, Jim C., 01 March 2010
This thesis presents a class of graphical models for directly representing the joint cumulative distribution function (CDF) of many random variables, called cumulative distribution networks (CDNs). Unlike graphical models for probability density and mass functions, in a CDN, the marginal probabilities for any subset of variables are obtained by computing limits of functions in the model. We will show that the conditional independence properties in a CDN are distinct from the conditional independence properties of directed, undirected and factor graph models, but include the conditional independence properties of bidirected graphical models. As a result, CDNs are a parameterization for bidirected models that allows us to represent complex statistical dependence relationships between observable variables. We will provide a method for constructing a factor graph model with additional latent variables for which graph separation of variables in the corresponding CDN implies conditional independence of the separated variables in both the CDN and in the factor graph with the latent variables marginalized out. This will then allow us to construct multivariate extreme value distributions for which both a CDN and a corresponding factor graph representation exist.
In order to perform inference in such graphs, we describe the `derivative-sum-product' (DSP) message-passing algorithm where messages correspond to derivatives of the joint cumulative distribution function. We will then apply CDNs to the problem of learning to rank, or estimating parametric models for ranking, where CDNs provide a natural means with which to model multivariate probabilities over ordinal variables such as pairwise preferences. We will show that many previous probability models for rank data, such as the Bradley-Terry and Plackett-Luce models, can be viewed as particular types of CDN. Applications of CDNs will be described for the problems of ranking players in multiplayer team-based games, document retrieval and discovering regulatory sequences in computational biology using the above methods for inference and estimation of CDNs.
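A small worked example of the central objects in a CDN (the factors and numbers are invented, not taken from the thesis): a joint CDF written as a product of local CDF factors, marginals obtained as limits, and the joint density obtained by differentiation, which is what the derivative-sum-product messages compute in general graphs:

```python
import sympy as sp

x, y, z = sp.symbols("x y z", real=True)

# Two local CDF factors over the overlapping scopes {x, y} and {y, z}; each is
# a bivariate logistic (Gumbel) extreme-value CDF, so their product defines a
# CDN over (x, y, z).
phi1 = sp.exp(-sp.sqrt(sp.exp(-2 * x) + sp.exp(-2 * y)))
phi2 = sp.exp(-sp.sqrt(sp.exp(-2 * y) + sp.exp(-2 * z)))
F = phi1 * phi2  # joint CDF F(x, y, z)

# Marginal CDFs are obtained by taking limits, not by summing or integrating.
F_xy = sp.limit(F, z, sp.oo)    # marginal CDF of (x, y)
F_x = sp.limit(F_xy, y, sp.oo)  # marginal CDF of x alone (a Gumbel CDF)
print(sp.simplify(F_x))

# The joint density is the mixed derivative of F, the quantity tracked by the
# derivative-sum-product messages; it is non-negative for a valid CDF.
f = sp.diff(F, x, y, z)
print(float(f.subs({x: 0.3, y: -0.1, z: 0.5})))
```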
|