
Bioinformatic Applications in Protein Low Complexity Regions and Targeted Metagenomics

Part I: Low complexity regions (LCRs) are common motifs in eukaryotic proteins, even though
they are mutationally unstable. For LCRs to be so widely used and tolerated, there must be
regulatory mechanisms that compensate for their presence. I set out to characterize the
relationships and co-evolution of LCRs with the abundance of the proteins that host them and of
the transcripts that encode them. Because the abundance of a gene product is ultimately
responsible for its associated phenotype, any such relationships have implications for the many
neurodegenerative diseases associated with LCR expansion. Such relationships do exist. LCRs are
more strongly associated with low-abundance proteins, but the opposite holds at the RNA level:
LCR-encoding transcripts have higher abundance. Investigating the co-evolution of LCRs and
transcript abundance (TAb) revealed that, on short evolutionary timescales, indels in LCRs
influence the selective pressures on TAb. Viewing LCRs through the previously unexplored lens
of abundance has generated new results which, together with explorations of information flow
and low complexity in untranslated regions, expand our knowledge of the functional impacts of
LCR evolution.
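The abstract does not say how LCRs were identified. As a hedged illustration of what "low complexity" means, the sketch below flags protein windows whose residue composition has low Shannon entropy, loosely in the spirit of windowed tools such as SEG. It is not the thesis's method: the function names, the 12-residue window and the 2.2-bit threshold are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (in bits) of the residue composition of seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(protein: str, window: int = 12, threshold: float = 2.2):
    """Return merged (start, end) spans, 0-based and half-open, of windows whose
    entropy is below threshold.

    Illustrative only: window and threshold are assumed values, not the
    thesis's LCR definition.
    """
    regions = []
    for start in range(len(protein) - window + 1):
        if shannon_entropy(protein[start:start + window]) < threshold:
            end = start + window
            # Merge with the previous region if the windows overlap or touch.
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Toy protein: a 20-residue polyglutamine tract between two diverse flanks.
flank = "MSTKLVAEWRPDGHI"
protein = flank + "Q" * 20 + flank
print(low_complexity_windows(protein))  # [(10, 40)]
```

The tract occupies indices 15 to 34, yet the reported region is (10, 40): windows that are mostly glutamine also cover a few flanking residues, so a window-based measure slightly overshoots the repeat boundaries.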
Part II: A common problem in DNA sequencing arises when the DNA of interest makes up only a
small proportion of the DNA in a sample. This challenge can be compounded when the DNA of
interest may come from many different organisms. Targeted metagenomics is a set of techniques
that aim to bias sequencing results towards the DNA of interest. Many of these techniques rely
on carefully designed probes that are specific to the targets of interest. I have developed a
bioinformatic tool, HUBDesign, that designs oligonucleotide probes to capture identifying
sequences from a given set of targets of interest. Using HUBDesign and other methods, I have
contributed to projects in contexts ranging from clinical diagnostics to ancient DNA.

Thesis / Doctor of Science (PhD)

This thesis describes research in two fields: repetitive protein sequences and methods for
sequencing the portions of a sample in which one is most interested. In the first part I describe
the general properties of repetitive proteins, establish a connection between the presence of
repeats in a protein and the amount of that protein a cell maintains, and show that these two
quantities evolve together. This informs our understanding of evolution and regulation, with
implications for repeat-related diseases and further evolutionary research. In the second part I
describe a method for selecting short nucleotide sequences that can be used to specifically
capture the DNA of organisms of interest, as well as applications of this and other methods.
These contributions are widely applicable because targeted sequencing is useful in fields as far
apart as clinical sepsis diagnosis and determining the colour of ancient animals.
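HUBDesign's actual algorithm is not described here. The sketch below is a deliberately simplified, hypothetical illustration of the general idea of selecting target-specific probes: keep length-k substrings that occur in one target but, as exact matches on either strand, in no other target and no background sequence, then filter by GC content. The names `select_probes`, `k`, `gc_range`, `max_per_target` and the toy sequences are all assumptions.

```python
from typing import Dict, List, Set, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def both_strands(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq and of its reverse complement."""
    rc = revcomp(seq)
    return {s[i:i + k] for s in (seq, rc) for i in range(len(s) - k + 1)}


def gc_fraction(probe: str) -> float:
    return (probe.count("G") + probe.count("C")) / len(probe)


def select_probes(targets: Dict[str, str], background: List[str], k: int = 20,
                  gc_range: Tuple[float, float] = (0.4, 0.6),
                  max_per_target: int = 3) -> Dict[str, List[str]]:
    """Hypothetical sketch, not HUBDesign's algorithm: exact-match uniqueness
    plus a GC filter, scanning each target left to right."""
    target_kmers = {name: both_strands(seq, k) for name, seq in targets.items()}
    background_kmers: Set[str] = set()
    for seq in background:
        background_kmers |= both_strands(seq, k)
    probes: Dict[str, List[str]] = {}
    for name, seq in targets.items():
        # Sequences this target's probes must not match (either strand).
        off_target = set(background_kmers)
        for other, kset in target_kmers.items():
            if other != name:
                off_target |= kset
        chosen: List[str] = []
        for i in range(len(seq) - k + 1):
            probe = seq[i:i + k]
            if probe in off_target or probe in chosen:
                continue
            if gc_range[0] <= gc_fraction(probe) <= gc_range[1]:
                chosen.append(probe)
                if len(chosen) == max_per_target:
                    break
        probes[name] = chosen
    return probes


# Toy example; "background" stands in for host DNA.
targets = {
    "taxon_A": "ATGCGTACGTTAGCCGATCGATGCTAGCTAGGCTAACGT",
    "taxon_B": "TTGACCGGTAAGCTTGCAATCGGATCCAGTGACTGACCA",
}
background = ["ATGCGTACGTTAGCCGATCGTTTTTTTTTT"]
print(select_probes(targets, background, k=12))
```

Exact matching is the weakest part of this sketch: hybridization capture tolerates mismatches, so practical probe design also has to screen out near matches to non-target DNA.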

Identifier: oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/29248
Date: January 2023
Creators: Dickson, Zachery
Contributors: Golding, G Brian, Biology
Source Sets: McMaster University
Language: English
Type: Thesis
