1 |
From blood to data: an ethnographic account of the construction of the Generation Scotland population genetic database / Marsden, Wendy / January 2008
This thesis is an examination of a population genetic database as both a social and scientific entity. Science and social science usually operate in a dichotomy; this thesis is a synergy of the two. The thesis examines practices and processes, and reveals how the formation of the Generation Scotland assemblage produces multiple disconnections and connections layered in the science, technology, objects, people and places. The story is based on a multi-sited ethnography that moves from the medical setting of blood sample and data collection, through the practices and processes of the laboratory, to end up in the much more diffuse settings of computer analysis. The blood sample is transformed into digital genetic data, and then connected to diverse other data for research. The thesis traces the transformation and aggregation of heterogeneous elements which will become fixed in the population genetic database through scientific ordering and relationships which will be rendered immutable by the technology. In the processes described here, people’s bodies, and information about them, are explicitly rendered as research ‘resources’. The thesis contributes to the growing knowledge of population genetic databases, and it is a response to calls from social science to understand better the science and technology that are currently changing the shape of the social world. Disconnections and connections are creating a framework of new referents between health and illness, identity and relationships in a way that rearticulates the body and the population.
|
2 |
Characterisation of the Schizosaccharomyces pombe sum1+ gene / Dunand-Sauthier, Isabelle / January 2001
No description available.
|
3 |
Hypothesis Testing for High-Dimensional Regression Under Extreme Phenotype Sampling of Continuous Traits / January 2018
Extreme phenotype sampling (EPS) is a widely used design to identify candidate genetic factors contributing to the variation of quantitative traits. By enriching the signals in the extreme phenotypic samples within the top and bottom percentiles, EPS can boost the study power compared with random sampling at the same sample size. The existing statistical methods for EPS data test the variants/regions individually. However, many disorders are caused by multiple genetic factors. Therefore, it is critical to model the effects of genetic factors simultaneously, which may increase the power of current genetic studies and identify novel disease-associated genetic factors in EPS. The challenge of the simultaneous analysis of genetic data is that the number of genetic factors (p ~10,000) is typically greater than the sample size (n ~1,000) in a single study. The standard linear model would be inappropriate for this p>n problem due to the rank deficiency of the design matrix. An alternative solution is to apply a penalized regression method, the least absolute shrinkage and selection operator (LASSO).
LASSO can deal with this high-dimensional (p>n) problem by forcing certain regression coefficients to be zero. Although the application of LASSO in genetic studies under random sampling has been widely studied, its statistical inference and testing under EPS remain unknown. We propose a novel sparse model (EPS-LASSO) with a hypothesis test for high-dimensional regression under EPS, based on a decorrelated score function, to investigate genetic associations, including gene expression and rare variant analyses. Comprehensive simulations show that EPS-LASSO outperforms existing methods, with superior power when the effects are large and with stable type I error and false discovery rate (FDR) control. Together with the real data analysis of a genetic study of obesity, our results indicate that EPS-LASSO is an effective method for EPS data analysis that can account for correlated predictors. / Chao Xu
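To make the two ingredients named in this abstract more concrete, the following is a minimal Python sketch, not the EPS-LASSO method itself and without its decorrelated score test. It assumes a plain LASSO objective, (1/2n)·||y − Xβ||² + λ·||β||₁, solved by coordinate descent; the function names `extreme_phenotype_sample`, `soft_threshold` and `lasso_coordinate_descent` are hypothetical and chosen only for illustration.

```python
def extreme_phenotype_sample(phenotypes, fraction):
    """Return (low, high): indices of individuals whose trait value lies in
    the bottom and top `fraction` of the sample (hypothetical helper)."""
    order = sorted(range(len(phenotypes)), key=lambda i: phenotypes[i])
    k = int(len(phenotypes) * fraction)
    if k == 0:
        return [], []
    return order[:k], order[-k:]


def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become 0.
    This operator is what sets LASSO coefficients exactly to zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 one coefficient at a time.
    X is a list of n rows with p columns; p may exceed n."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of column j with the partial residual
            norm = 0.0  # squared norm of column j
            for i in range(n):
                fitted_without_j = sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - fitted_without_j)
                norm += X[i][j] ** 2
            if norm > 0:
                beta[j] = soft_threshold(rho / n, lam) / (norm / n)
            else:
                beta[j] = 0.0  # an all-zero column carries no information
    return beta
```

In this sketch the penalty `lam` controls sparsity: a larger value zeroes more coefficients, which is how a model with more predictors than samples remains estimable. Testing the selected coefficients under EPS, the actual contribution described above, is not covered here.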
|
4 |
Genetic variation and population structure within the Gudgeon genus Hypseleotris (Pisces-Eleotridae) in Southeastern Australia / Syaifullah / University of Western Sydney, Hawkesbury, Faculty of Science and Technology / January 1999
This study investigated the causes of the high level of intra- and inter-population variation known to occur in the morphology of fish in the genus Hypseleotris (Eleotridae) in southern Australia, particularly within the Murray-Darling river system. The three major objectives of the study were to identify the number and distribution of species, to determine the genetic structure of the populations, and to analyse relationships between species and consider the process of speciation in this species complex. The investigation of morphological variation in Hypseleotris confirmed the presence of two well-known species, H. compressa and H. galli, in the coastal rivers, and also of the inland species H. klunzingeri. Populations of Hypseleotris klunzingeri sensu lato in inland rivers were found to be highly variable, and analysis using discriminant functions and principal component analysis showed the widespread presence of three forms (A, B1 and B2). The analysis was complicated by the presence of north/south clines and upstream/downstream variation in the characteristics of each form. After these factors were removed, there was still a great deal of variation in each population. The presence of hybrids between each pair of inland species, identified by both morphological and genetic data, further complicated the analysis and makes identification of all specimens to species in the field difficult. Examination of type material of H. klunzingeri showed that it belonged to form B2. The other forms can be related to the undescribed species Midgley's carp gudgeon and Lake's carp gudgeon. Keys to the species in the complex in southeastern Australia are given. The morphological and genetic data show that H. compressa and H. klunzingeri are sister species, primarily separated by the eastern uplands. Similarly, the coastal species H. galli is related to form B1 and, more distantly, to form A. Possible scenarios for the complex are given. / Doctor of Philosophy (PhD)
|
5 |
Application of Bayesian Hierarchical Models in Genetic Data Analysis / Zhang, Lin / 14 March 2013
Genetic data analysis has attracted considerable attention as a means of understanding the mechanisms behind the development and progression of diseases such as cancer, and it is crucial for discovering genetic markers and treatment targets in medical research. This dissertation focuses on several important issues in genetic data analysis: graphical network modeling, feature selection, and covariance estimation. First, we develop a gene network modeling method for discrete gene expression data, produced by technologies such as serial analysis of gene expression and RNA sequencing experiments, which generate counts of mRNA transcripts in cell samples. We propose a generalized linear model to fit the discrete gene expression data and assume that the log ratios of the mean expression levels follow a Gaussian distribution. We derive the gene network structures by selecting covariance matrices of the Gaussian distribution with a hyper-inverse Wishart prior. We incorporate prior network models based on Gene Ontology information, which makes use of existing biological information on the genes of interest. Next, we consider a variable selection problem in which the variables have natural grouping structures, with application to the analysis of chromosomal copy number data. The chromosomal copy number data are produced by molecular inversion probe experiments, which measure probe-specific copy number changes. We propose a novel Bayesian variable selection method, the hierarchical structured variable selection (HSVS) method, which accounts for the natural gene and probe-within-gene architecture to identify important genes and probes associated with clinically relevant outcomes. We propose the HSVS model for grouped variable selection, where simultaneous selection of both groups and within-group variables is of interest. The HSVS model utilizes a discrete mixture prior distribution for group selection and group-specific Bayesian lasso hierarchies for variable selection within groups.
We further provide methods for accounting for serial correlations within groups that incorporate Bayesian fused lasso methods for within-group selection. Finally, we propose a Bayesian method of estimating high-dimensional covariance matrices that can be decomposed into a low rank and sparse component. This covariance structure has a wide range of applications including factor analytical model and random effects model. We model the covariance matrices with the decomposition structure by representing the covariance model in the form of a factor analytic model where the number of latent factors is unknown. We introduce binary indicators for estimating the rank of the low rank component combined with a Bayesian graphical lasso method for estimating the sparse component. We further extend our method to a graphical factor analytic model where the graphical model of the residuals is of interest. We achieve sparse estimation of the inverse covariance of the residuals in the graphical factor model by employing a hyper-inverse Wishart prior method for a decomposable graph and a Bayesian graphical lasso method for an unrestricted graph.
|
6 |
GENETIC FEATURE SELECTION USING DIMENSIONALITY REDUCTION APPROACHES: A COMPARATIVE STUDY / NAHLAWI, Layan / 16 December 2010
The recent decade has witnessed great advances in microarray and genotyping technologies which allow genome-wide single nucleotide polymorphism (SNP) data to be captured on a single chip. As a consequence, genome-wide association studies require the development of algorithms capable of manipulating ultra-large-scale SNP datasets. Towards this goal, this thesis proposes two SNP selection methods; the first using Independent Component Analysis (ICA) and the second based on a modified version of Fast Orthogonal Search.
The first proposed technique, based on ICA, is a filtering technique; it reduces the number of SNPs in a dataset, without the need for any class labels. The second proposed technique, orthogonal search based SNP selection, is a multivariate regression approach; it selects the most informative features in SNP data to accurately model the entire dataset.
The proposed methods are evaluated by applying them to publicly available gene SNP datasets, and comparing the accuracies of each method in reconstructing the datasets. In addition, the selection results are compared with those of another SNP selection method based on Principal Component Analysis (PCA), which was also applied to the same datasets.
The results demonstrate the ability of orthogonal search to capture more information than the ICA SNP selection approach while using a smaller number of SNPs. Furthermore, SNP reconstruction accuracies obtained with the proposed ICA methodology demonstrated its ability to summarize a greater or equivalent amount of information compared with the PCA-based technique reported in the literature.
The execution time of the second developed methodology, mFOS, has paved the way for its application to large-scale genome wide datasets. / Thesis (Master, Computing) -- Queen's University, 2010-12-15 18:03:00.208
|
7 |
Statistical inference in population genetics using microsatellites / Csilléry, Katalin / January 2009
Statistical inference from molecular population genetic data is currently a very active area of research for two main reasons. First, in the past two decades an enormous amount of molecular genetic data have been produced and the amount of data is expected to grow even more in the future. Second, drawing inferences about complex population genetics problems, for example understanding the demographic and genetic factors that shaped modern populations, poses a serious statistical challenge. Amongst the many different kinds of genetic data that have appeared in the past two decades, the highly polymorphic microsatellites have played an important role. Microsatellites revolutionized the population genetics of natural populations, and were the initial tool for linkage mapping in humans and other model organisms. Despite their important role, and extensive use, the evolutionary dynamics of microsatellites are still not fully understood, and their statistical methods are often underdeveloped and do not adequately model microsatellite evolution. In this thesis, I address some aspects of this problem by assessing the performance of existing statistical tools, and developing some new ones. My work encompasses a range of statistical methods from simple hypothesis testing to more recent, complex computational statistical tools. This thesis consists of four main topics. First, I review the statistical methods that have been developed for microsatellites in population genetics applications. I review the different models of the microsatellite mutation process, and ask which models are the most supported by data, and how models were incorporated into statistical methods. I also present estimates of mutation parameters for several species based on published data. Second, I evaluate the performance of estimators of genetic relatedness using real data from five vertebrate populations. 
I demonstrate that the overall performance of marker-based pairwise relatedness estimators mainly depends on the population relatedness composition and may only be improved by the marker data quality within the limits of the population relatedness composition. Third, I investigate the different null hypotheses that may be used to test for independence between loci. Using simulations I show that testing for statistical independence (i.e. zero linkage disequilibrium, LD) is difficult to interpret in most cases, and instead a null hypothesis should be tested, which accounts for the “background LD” due to finite population size. I investigate the utility of a novel approximate testing procedure to circumvent this problem, and illustrate its use on a real data set from red deer. Fourth, I explore the utility of Approximate Bayesian Computation, inference based on summary statistics, to estimate demographic parameters from admixed populations. Assuming a simple demographic model, I show that the choice of summary statistics greatly influences the quality of the estimation, and that different parameters are better estimated with different summary statistics. Most importantly, I show how the estimation of most admixture parameters can be considerably improved via the use of linkage disequilibrium statistics from microsatellite data.
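As a small illustration of the linkage disequilibrium (LD) statistics mentioned in this abstract, the sketch below computes the classical coefficient D = p_AB − p_A·p_B and the squared correlation r² for two loci. It is a simplification under stated assumptions: alleles are coded 0/1 (biallelic), whereas microsatellites are typically multi-allelic, and phased haplotypes are assumed to be available. The function name `linkage_disequilibrium` is hypothetical, and this is not the testing procedure developed in the thesis.

```python
def linkage_disequilibrium(haplotypes):
    """Compute (D, r2) for two biallelic loci.

    `haplotypes` is a list of (a, b) pairs, where a and b are the alleles
    (coded 0 or 1) carried at the first and second locus of one haplotype.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n    # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n    # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0   # undefined for a fixed locus
    return d, r2
```

With statistically independent loci, D and r² are zero in expectation only for an infinitely large population. As the abstract argues, finite populations carry some "background LD", so a sample estimate of r² slightly above zero is not by itself evidence against independence.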
|
8 |
The ethical-legal protection of human genetic data in biobank activities, in light of the 1988 Federal Constitution and international guidelines / Bellarmino, Clarissa Lopes / 22 August 2018
Scientific research involving human beings, carried out in the biomedical and genetic fields, generates data from activities developed in biobanks, among them genetic data, which should be protected not only in the ethical sphere but also in the legal sphere. Genetic data are information about hereditary characteristics obtained from samples of human biological material (e.g. cells, hair, tissues, blood, bones, tumors and organs, among other materials derived from the human body). These samples can be stored, processed and accessed in a biobank, a nonprofit structure, organized and systematized in universities and research institutions, that provides the technologies and/or equipment necessary for scientific research. The purpose of this study is to verify whether current Brazilian regulation is sufficient for the ethical-legal protection of human genetic data in biobank activities, considering the rights of the participants with respect to safeguarding their personal data and sensitive data, which are foundations of genetic identity. Genetic identity is understood here as the projection of personal identity. In view of this, it is essential to review the national literature in the light of the constitutional principle of human dignity and of fundamental rights, such as the right to life, health, intimacy, privacy and the free development of personality. In turn, a review and analysis of current Brazilian legislation, together with the identification of international guidelines, recommendations and regulations, support and contribute to the understanding of the relevance and pertinence of personal data protection, particularly of human genetic data, which demands due legal backing from the democratic State based on the rule of law.
Finally, it is concluded that the protection of human genetic data in biobank activities deserves specific legislation that includes coercive measures for cases of violation of the principle of human dignity and of the fundamental rights involved, that guarantees the integrity of research participants and their personal rights, and that imposes on researchers, participants, research institutions, universities and the State the duties and limits of action in relation to human life and health. What is needed, therefore, is specific infraconstitutional legislation that is effective both in its legal protection and in its implementation.
|
9 |
Bioinformatics approaches applied to emerging technologies in genomics / Lemieux Perreault, Louis-Philippe / 02 1900
Genetic studies, such as linkage and association studies, have contributed greatly to a better understanding
of the etiology of several diseases. Nonetheless, despite the tens of thousands of genetic
studies performed to date, a large part of the heritability of diseases and traits remains unexplained.
The last decade experienced unprecedented progress in genomics. For example, the use of
microarrays for high-density comparative genomic hybridization has demonstrated the existence
of large-scale copy number variations and polymorphisms. These are now detectable using DNA
microarray or high-throughput sequencing. In addition, high-throughput sequencing has shown
that the majority of variations in the exome are rare or unique to the individual. This has led to
the design of a new type of DNA microarray that is enriched for rare variants that can be quickly
and inexpensively genotyped in high throughput capacity.
In this context, the general objective of this thesis is the development of methodological approaches
and bioinformatics tools for the detection at the highest quality standards of copy number polymorphisms
and rare single nucleotide variations. It is expected that by doing so, more of the
missing heritability of complex traits can then be accounted for, contributing to the advancement
of knowledge of the etiology of diseases.
We have developed an algorithm for the partition of copy number polymorphisms, making it feasible
to use these structural changes in genetic linkage studies with family data. We have also conducted
an extensive study in collaboration with the Wellcome Trust Centre for Human Genetics of the
University of Oxford to characterize rare copy number definition metrics and their impact on study
results with unrelated individuals. We have conducted a thorough comparison of the performance
of genotyping algorithms when used with a new DNA microarray composed of a majority of very
rare genetic variants. Finally, we have developed a bioinformatics tool for the fast and efficient
processing of genetic data to increase quality, reproducibility of results and to reduce spurious
associations.
|