
Networks and multivariate statistics as applied to biological datasets and wine-related omics / Netwerke en meerveranderlike statistiek toegepas op biologiese datastelle en wyn-verwante omika

Thesis (PhD)--Stellenbosch University, 2013. / ENGLISH ABSTRACT: Introduction: Wine production is a complex biotechnological process aiming
at productively coordinating the interactions and outputs of several biological
systems, including grapevine and many microorganisms such as wine yeast
and wine bacteria. High-throughput data-generating tools in the fields of
genomics, transcriptomics, proteomics, metabolomics and microbiomics are
being applied both locally and globally in order to better understand complex
biological systems. As such, the datasets available for analysis and mining
include de novo datasets created by collaborators as well as publicly available
datasets which one can use to get further insight into the systems under study.
In order to model the complexity inherent in and across these datasets it is
necessary to develop methods and approaches based on network theory and
multivariate data analysis as well as to explore the intersections between these
two approaches to data modelling, mining and interpretation.
Networks: The traditional reductionist paradigm of analysing single components
of a biological system has not provided tools with which to adequately
analyse data sets that are attempting to capture systems-level information.
Network theory has recently emerged as a new discipline with which to model
and analyse complex systems and has arisen from the study of real and often
quite large networks derived empirically from the large volumes of data
that have been collected from communications, internet, financial and biological
systems. This is in stark contrast to previous theoretical approaches to understanding
complex systems such as complexity theory, synergetics, chaos
theory, self-organised criticality, and fractals which were all sweeping theoretical
constructs based on small toy models which proved unable to address the
complexity of real world systems.
Multivariate Data Analysis: Principal component analysis (PCA) and
Partial Least Squares (PLS) regression are commonly used to reduce the
dimensionality of a matrix (and amongst matrices in the case of PLS) in which
there is a considerable number of potentially related variables. PCA and PLS
are variance-focused approaches where components are ranked by the amount
of variance they each explain. Components are, by definition, orthogonal to
one another and, as such, uncorrelated.
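The PCA properties described above (components ranked by explained variance, mutually orthogonal, with uncorrelated scores) can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `pca`, the toy data and the use of NumPy's singular value decomposition are assumptions made purely for illustration.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the column-centred data matrix (illustrative sketch)."""
    Xc = X - X.mean(axis=0)                      # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance = s**2 / (X.shape[0] - 1)  # already sorted, largest first
    loadings = Vt[:n_components]                 # rows are orthonormal directions
    scores = Xc @ loadings.T                     # samples projected onto components
    return scores, loadings, explained_variance[:n_components]

# Hypothetical toy data: 50 samples, 6 correlated variables from 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(50, 6))

scores, loadings, var = pca(X, n_components=3)

# Components are ranked by the variance they explain
print("explained variance:", var)
# Loadings are orthogonal to one another
print("orthonormal loadings:", np.allclose(loadings @ loadings.T, np.eye(3)))
# Scores are uncorrelated: their covariance matrix is diagonal
print("diagonal score covariance:",
      np.allclose(np.cov(scores, rowvar=False), np.diag(var)))
```

Because the toy data are generated from two latent factors, most of the variance should fall on the first two components, which is the sense in which PCA reduces the dimensionality of a matrix of related variables.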
Aims: This thesis explores the development of Computational Biology tools
that are essential to fully exploit the large data sets that are being generated
by systems-based approaches in order to gain a better understanding of wine-related
organisms such as grapevine (and tobacco as a laboratory-based plant
model), plant pathogens, microbes and their interactions. The broad aim of
this thesis is therefore to develop computational methods that can be used in
an integrated systems-based approach to model and describe different aspects
of the wine making process from a biological perspective. To achieve this
aim, computational methods have been developed and applied in the areas of
transcriptomics, phylogenomics, chemiomics and microbiomics.
Summary: The primary approaches taken in this thesis have been the use of
networks and multivariate data analysis methods to analyse highly dimensional
data sets. Furthermore, several of the approaches have started to explore the
intersection between networks and multivariate data analysis. This would seem
to be a logical progression as both networks and multivariate data analysis are
focused on matrix-based data modelling and therefore have many of their roots
in linear algebra. / AFRIKAANS ABSTRACT (translated): Introduction: Wine production is a complex biotechnological process that aims at
the productive coordination of the interactions and outputs of several
biological systems. These systems include the grapevine, which is of particular
importance, as well as wine yeast and wine bacteria. High-throughput data
generation is currently applied both globally and locally in the fields of
genomics, transcriptomics, proteomics, metabolomics and microbiomics. As such,
datasets of this type are available for analysis, mining and exploration. The
datasets can be generated de novo, with the help of collaborators, or they can
be obtained from public databases, where such datasets are often made available
so that further insight can be gained into the system under study. The
high-throughput datasets under discussion contain a high degree of inherent
complexity, both within themselves and across several datasets. In order to
model these datasets and their inherent complexity, it is necessary to develop
methods and approaches rooted in network theory and multivariate statistics.
Furthermore, it is also necessary to explore the intersections between network
theory and multivariate statistics in order to improve the modelling, mining,
exploration and interpretation of data.
Networks: The traditional reductionist paradigm, in which single components
of a biological system are analysed, has so far not delivered adequate methods
and tools with which to analyse datasets that strive to capture systems-level
information. Network theory has emerged as a new discipline that can be applied
to the modelling and analysis of complex systems. It stems from the study of
real, often large networks derived empirically from the large volumes of data
currently emerging from communications, internet, financial and biological
systems. This is in stark contrast to previous theoretical approaches that
sought to understand complex systems with concepts such as complexity theory,
synergetics, chaos theory, self-organised criticality and fractals. All of the
above are broad theoretical constructs, based on relatively small-scale models,
that were unable to offer solutions for the complexity of real-world systems.
Multivariate Data Analysis: Principal component analysis (PCA) and Partial
Least Squares (PLS) regression are often used to reduce the dimensionality
of a matrix (and between matrices in the case of PLS). These matrices often
contain a considerably large number of potentially related variables. PCA and
PLS are variance-driven methods in which components are ranked by the amount
of variance that each component explains. Components are by definition
orthogonal to one another and as such uncorrelated.
Aims: This thesis explores the development of various Computational Biology
methods that are essential to fully exploit the large-scale datasets currently
being generated by systems-based approaches. The goal is to gain a better
understanding and knowledge of wine-related organisms; these organisms include
the grapevine (with tobacco as a laboratory-based plant model), plant pathogens
and microbes, as well as their interactions.
The broad aim of this thesis is therefore to develop computational methods
that can be used in an integrated systems-based approach to the modelling and
description of different aspects of the winemaking process from a biological
standpoint. To achieve this aim, computational methods have been developed and
applied in the fields of transcriptomics, phylogenomics, chemiomics and
microbiomics.
Summary: The primary approach taken in this thesis is the use of networks
and multivariate data analysis methods to analyse high-dimensional datasets.
Furthermore, several of the methods begin to explore the common ground between
networks and multivariate data analysis. This appears to be a logical
progression, since both networks and multivariate data analysis are focused on
matrix-based data modelling and are therefore rooted in linear algebra.

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:sun/oai:scholar.sun.ac.za:10019.1/85630
Date: 12 1900
Creators: Jacobson, Daniel A.
Contributors: Vivier, M., Stellenbosch University. Faculty of AgriSciences. Dept. of Viticulture and Oenology. Institute for Wine Biotechnology.
Publisher: Stellenbosch : Stellenbosch University
Source Sets: South African National ETD Portal
Language: en_ZA
Detected Language: Unknown
Type: Thesis
Format: 241 p. : ill.
Rights: Stellenbosch University
