Phenotyping with Partially Labeled, Partially Observed Data

Identifying a group of individuals who share a common set of characteristics is conceptually simple but often difficult in practice. Such phenotyping problems emerge in various settings, including the analysis of clinical data. In this setting, phenotyping is often stymied by persistent data quality issues. These include a lack of reliable labels to indicate the presence or absence of characteristics of interest, and significant missingness in observed variables.

This dissertation introduces methods for learning phenotypes when the data contain missing values (partially observed) and labels are scarce (partially labeled). Aim 1 utilizes an unsupervised probabilistic graphical model to learn phenotypes from partially observed data. Aim 2 introduces a related semi-supervised probabilistic graphical model for learning phenotypes from partially labeled clinical data. Finally, Aim 3 describes a method for training deep generative models when the training data contain missing values. The algorithm is then applied in a semi-supervised setting where it accounts for partially labeled data as well.

https://doi.org/10.7916/30x4-3h16

Bioinformatics

Phenotype

Identifier	oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/30x4-3h16
Date	January 2023
Creators	Rodriguez, Victor Alfonso
Source Sets	Columbia University
Language	English
Detected Language	English
Type	Theses
