<p>Ancestry is a complex and layered concept, but it must be operationalized before it can be used objectively in genetic studies. Critical decisions in research analyses, clinical practice, and forensic investigations are based on genetic ancestry inference. For example, in genetic association studies for clinical and applied research, investigators may need to isolate one population of interest from a worldwide dataset to avoid false-positive results; in human identification, ancestry inference can help identify the source of unknown DNA evidence by narrowing down a suspect list. Many studies seek to improve ancestry inference for these reasons. The research presented here offers resources for exploring and improving genetic ancestry inference and the intelligence it provides toward identity.</p>
<p>First, analysis of ‘big data’ in genomics is a resource-intensive task that requires optimization. This research therefore introduces <em>Iliad</em>, a suite of automated Snakemake workflows developed to give the research community an easy-to-learn, hands-off computational tool for processing genomic data in multiple formats. <em>Iliad</em> can be installed and run on a Google Cloud Platform remote server instance in less than 20 minutes using the installation code provided in its ReadTheDocs documentation. The workflows support raw data processing for several genetic data types, including microarray, sequence, and compressed alignment data, and also provide micro-workflows that merge variant call format (VCF) files or lift over variant positions. Compared with a similar workflow, <em>Iliad</em> processed one sample’s raw paired-end sequence reads into a human-legible VCF file in 7.6 hours, three times faster than the other workflow. This suite of workflows is an important step toward building reference population panels from human whole-genome sequence (WGS) data, which are useful in many research applications, including imputation, ancestry estimation, and ancestry informative marker (AIM) discovery.</p>
<p>Second, there are persistent challenges in ancestry inference for individuals from the Middle East, especially when AIMs are used. This research presents a population genomics study of the Middle East that includes novel population data from Lebanon (n=190) and an unsupervised genetic clustering approach applied to WGS data from the 1000 Genomes Project and the Human Genome Diversity Project. These AIM discovery efforts identified two single nucleotide polymorphisms (SNPs) with high allele frequency differences between the Middle East and other populations in Eurasia, namely Europe and South/Central Asia. These candidate AIMs were evaluated alongside the most current and comprehensive AIM panel to date, the VISAGE Enhanced Tool (ET), using an external validation set of Middle Eastern WGS data (n=137). Instead of relying on pre-defined biogeographic ancestry labels to confirm the accuracy of ancestry inference for the validation samples, this research performed a deep, unsupervised ADMIXTURE analysis of 3,469 worldwide WGS samples with nearly 2 million independent SNPs (r<sup>2</sup> &lt; 0.1), which provided a genetic “ground truth”. As a result, 136 of 137 validation samples were classified as Middle Eastern, and the analysis provided valuable insights into how reference samples with varying co-ancestries ultimately affect the classification of admixed individuals. Novel deep learning methods, specifically variational autoencoders, were introduced as an alternative to principal component analysis (PCA) for visualizing one hundred percent of the genetic variance captured by these AIMs. The resulting ancestry space presents distinct population clusters and remains static for the projection of unknown samples, which aids ancestry inference and human identification.</p>
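<p>The idea of selecting candidate AIMs by allele frequency difference can be illustrated with a minimal sketch. The abstract does not state the exact statistic or threshold used, so the absolute frequency difference, the <code>min_delta</code> cutoff, the SNP identifiers, and every count and frequency below are assumptions made up for illustration, not data or methods from this research.</p>

```python
from typing import Dict, List, Tuple


def allele_frequency(alt_allele_count: int, n_individuals: int) -> float:
    """Alternate-allele frequency at a biallelic SNP in diploid individuals."""
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    return alt_allele_count / (2 * n_individuals)


def rank_candidate_aims(
    target: Dict[str, float],
    comparisons: Dict[str, Dict[str, float]],
    min_delta: float,
) -> List[Tuple[str, float]]:
    """Keep SNPs whose frequency in the target population differs from
    EVERY comparison population by at least min_delta, ranked by the
    smallest of those differences (largest first)."""
    ranked = []
    for snp, target_freq in target.items():
        smallest = min(
            abs(target_freq - freqs[snp]) for freqs in comparisons.values()
        )
        if smallest >= min_delta:
            ranked.append((snp, smallest))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical frequencies; none of these values come from the study.
middle_east = {
    "snp_a": allele_frequency(80, 100),  # 80 alt alleles in 100 people
    "snp_b": allele_frequency(50, 100),
}
others = {
    "europe": {"snp_a": 0.05, "snp_b": 0.20},
    "south_central_asia": {"snp_a": 0.10, "snp_b": 0.30},
}

candidates = rank_candidate_aims(middle_east, others, min_delta=0.25)
for snp, delta in candidates:
    print(f"{snp}: smallest frequency difference = {delta:.2f}")
```

<p>Ranking by the smallest difference across comparison populations, rather than the average, favors markers that separate the target population from all of its neighbors at once; this is a design choice of the sketch, not a description of the study's procedure.</p>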
<p>Third, this research presents a craniofacial study that improves key intelligence information about physical identity by exploring the relationship between dentition and facial morphology, pairing an advanced phenotyping approach with robust dental parameters used in clinical practice. Cone-beam computed tomography (CBCT) imagery was used to analyze the hard and soft tissue of the face at the same time. Low-to-moderate partial correlations were observed in several comparisons of dentition and soft tissue segments. These included partial correlations between i) inter-molar width and soft tissue segments nearest the nasal aperture, the lower maxillary sinuses, and a portion of the upper cheek, and ii) lower incisor inclination and soft tissue segments overlapping the mentolabial fold. These results indicate that helpful intelligence information, potentially leading toward identity in forensic investigations, may be present where hard tissue structures are manifested observably as a soft tissue phenotype. This preliminary study paves the way for combining facial hard tissue structures with external soft tissue phenotypes to advance fundamental facial genetic research. CBCT scans thus add considerably to the facial imagery currently available for craniofacial research and provide hard and soft tissue data, each with measurable morphological variation among individuals. When paired with genetic association studies and functional biological experiments, these data can lead to a greater understanding of the intricate coordination that takes place in facial morphogenesis and, in turn, guide clinical orthodontists toward better treatment modalities with an emphasis on personalized medicine. Lastly, this work aids intelligence methodologies when applied within the field of forensic anthropology.</p>
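<p>For readers unfamiliar with partial correlation, the following minimal sketch computes a first-order partial correlation between two measurements while controlling for a single covariate. The abstract does not name the covariates or the software used, so the single covariate, the function names, and all values below are hypothetical.</p>

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("samples must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(
    x: Sequence[float], y: Sequence[float], z: Sequence[float]
) -> float:
    """Correlation between x and y after removing the linear effect of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Made-up values standing in for a dental parameter, a soft tissue
# measurement, and one covariate; they are not data from the study.
dental_parameter = [1.0, 2.0, 3.0, 4.0, 5.0]
soft_tissue_value = [2.1, 1.9, 3.5, 3.9, 5.2]
covariate = [2.0, 1.0, 4.0, 3.0, 5.0]

r_partial = partial_correlation(dental_parameter, soft_tissue_value, covariate)
print(f"partial correlation = {r_partial:.3f}")
```

<p>With more than one covariate, the same quantity is usually obtained by correlating the residuals of two linear regressions rather than by this closed-form expression.</p>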
Identifier | oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/23750643 |
Date | 02 October 2023 |
Creators | Noah C Herrick (16649334) |
Source Sets | Purdue University |
Detected Language | English |
Type | Text, Thesis |
Rights | CC BY 4.0 |
Relation | https://figshare.com/articles/thesis/EXPLORING_THE_EFFECTS_OF_ANCESTRY_ON_INFERENCE_AND_IDENTITY_USING_BIOINFORMATICS/23750643 |