  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
561

On genome rearrangement models = Sobre modelos de rearranjo de genomas

Feijão, Pedro Cipriano, 1975- 21 August 2018
Advisor: João Meidanis / Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-21T17:01:05Z (GMT). No. of bitstreams: 1 Feijao_PedroCipriano_D.pdf: 1943126 bytes, checksum: 4c547e8c568bbd0f2eb8235dfde05524 (MD5) Previous issue date: 2012 / Abstract: Genome rearrangements are events where large blocks of DNA exchange places during evolution. With the growing availability of whole genome data, the analysis of these events can be an important tool for understanding evolutionary genomics. Several mathematical models of genome rearrangement have been proposed in the last 20 years. In this thesis, we propose two new rearrangement models. The first was introduced as an alternative definition of the breakpoint distance. The breakpoint distance is one of the most straightforward genome comparison measures, but there is still no consensus on how to define it precisely for multichromosomal genomes. Pevzner and Tesler gave a definition in a 2003 paper, and Tannier et al. defined it differently in 2008. In this thesis we provide yet another alternative, called single-cut-or-join (SCJ). We show that, besides the distance, several classic genome rearrangement problems, such as genome median, genome halving and small parsimony, become easy under SCJ, and we provide polynomial time algorithms for them. The second model we introduce is the Adjacency Algebraic Theory, an extension of the Algebraic Formalism proposed by Meidanis and Dias that allows the modeling of linear chromosomes. This was the main limitation of the original formalism, which could deal with circular chromosomes only. We believe that the algebraic formalism is an interesting alternative for solving rearrangement problems, with a different perspective that could complement the more commonly used combinatorial graph-theoretic approach. We present polynomial time algorithms to compute the algebraic distance and to find rearrangement scenarios between two genomes. We show how to compute the algebraic distance from the adjacency graph, for an easier comparison with other rearrangement distances.
Finally, we show how all classic rearrangement operations can be modeled using the algebraic theory. / Doctorate / Computer Science / Doctor of Computer Science
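The abstract above does not spell out how the SCJ distance is computed. As a rough illustration only: in the published SCJ model a genome can be viewed as a set of adjacencies between gene extremities, and the distance counts the adjacencies that must be cut plus those that must be joined. The sketch below assumes linear chromosomes given as lists of signed integers; the function names and the tail/head encoding are illustrative choices, not taken from the thesis.

```python
def adjacencies(chromosomes):
    """Return the set of adjacencies of a genome.

    Assumption: each chromosome is linear and given as a list of signed
    integers. Gene g has a tail (g, "t") and a head (g, "h"); a negative
    sign means the gene is read head-to-tail.
    """
    adj = set()
    for chrom in chromosomes:
        ends = []
        for gene in chrom:
            name = abs(gene)
            tail, head = (name, "t"), (name, "h")
            # Store (left extremity, right extremity) in reading order.
            ends.append((tail, head) if gene > 0 else (head, tail))
        # Consecutive genes share one adjacency: right end of one, left end of the next.
        for (_, right), (left, _) in zip(ends, ends[1:]):
            adj.add(frozenset((right, left)))
    return adj


def scj_distance(genome_a, genome_b):
    """Cuts needed (adjacencies only in A) plus joins needed (only in B)."""
    a, b = adjacencies(genome_a), adjacencies(genome_b)
    return len(a - b) + len(b - a)


# Hypothetical toy genomes: B is A with gene 2 inverted.
genome_a = [[1, 2, 3]]
genome_b = [[1, -2, 3]]
distance = scj_distance(genome_a, genome_b)
```

In this toy case the inversion breaks two adjacencies and creates two new ones, so the sketch counts two cuts and two joins.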
562

Unraveling the Structure and Assessing the Quality of Protein Interaction Networks with Power Graph Analysis

Royer, Loic 12 December 2017
Molecular biology has entered an era of systematic and automated experimentation. High-throughput techniques have moved biology from small-scale experiments focused on specific genes and proteins to genome-wide and proteome-wide screens. One result of this endeavor is the compilation of complex networks of interacting proteins. Molecular biologists hope to understand life's complex molecular machines by studying these networks. This thesis addresses three open problems centered upon their analysis and quality assessment. First, we introduce power graph analysis as a novel approach to the representation and visualization of biological networks. Power graphs are a graph-theoretic approach to the lossless and compact representation of complex networks. The approach groups edges into cliques and bicliques, and nodes into a neighborhood hierarchy. We demonstrate power graph analysis on five examples and show its advantages over traditional network representations. Moreover, we evaluate the algorithm's performance on a benchmark, test its robustness to noise, and measure its empirical time complexity at O(e^1.71), sub-quadratic in the number of edges e. Second, we tackle the difficult and controversial problem of data quality in protein interaction networks. We propose a novel measure for the accuracy and completeness of genome-wide protein interaction networks based on network compressibility. We validate this new measure by i) verifying the detrimental effect of false positives and false negatives, ii) showing that gold standard networks are highly compressible, iii) showing that authors' choice of confidence thresholds is consistent with high network compressibility, iv) presenting evidence that compressibility is correlated with co-expression, co-localization and shared function, and v) showing that complete and accurate networks of complex systems in other domains exhibit levels of compressibility similar to those of current high quality interactomes.
Third, we apply power graph analysis to networks derived from text mining as well as to gene expression microarray data. In particular, we present i) the network-based analysis of genome-wide expression profiles of the neuroectodermal conversion of mesenchymal stem cells, ii) the analysis of regulatory modules in a rare mitochondrial cytopathy, Mitochondrial Encephalomyopathy, Lactic acidosis, and Stroke-like episodes (MELAS), and iii) an investigation of the biochemical causes behind the enhanced biocompatibility of tantalum compared with titanium.
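An empirical complexity such as the O(e^1.71) quoted above is typically estimated by fitting a straight line to running time against input size on a log-log scale; the slope is the exponent. The sketch below shows that fit in plain Python. The timings are synthetic values generated from an exact power law purely to exercise the code; they are not measurements from the thesis, and the function name is an assumption.

```python
import math


def fit_power_law_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic data (not from the thesis): times follow c * e**1.71 exactly.
edge_counts = [100, 1_000, 10_000, 100_000]
synthetic_times = [2e-6 * e ** 1.71 for e in edge_counts]
exponent = fit_power_law_exponent(edge_counts, synthetic_times)
```

With real timings the points scatter around the line, so the fitted exponent is an estimate rather than an exact value.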
563

Applications de l'apprentissage statistique à la biologie computationnelle / Applications of machine learning in computational biology

Pauwels, Edouard 14 November 2013
Biotechnologies have reached a point where the amount of available information makes it possible to think about biological objects as complex systems. In this context, the phenomena emerging from those systems are tightly linked to their organizational properties. This raises computational and statistical challenges which are precisely the focus of study of the machine learning community. This thesis is about applications of machine learning methods to the study of biological phenomena from a complex systems viewpoint. We apply machine learning methods in the context of protein-ligand interaction and side effect analysis, cell population phenotyping, and experimental design for partially observed nonlinear dynamical systems. Large amounts of data are now available about marketed molecules, such as protein target interaction profiles and side effect profiles. This raises the issue of integrating this data and finding the structure and patterns that underlie these observations at a large scale. We apply recent unsupervised learning methods to the analysis of large datasets of marketed drugs.
Examples show the relevance of the extracted information, which is further validated in a prediction context. The variability of the response to a treatment between different individuals poses the challenge of defining the effect of a stimulus at the level of a population of individuals. For example, in the context of High Content Screening, a population of cells is exposed to different stimuli. Cell-to-cell variability within a population makes the comparison of different treatments difficult. A generative model is proposed to overcome this issue, and properties of the model are investigated based on experimental data. At the molecular scale, complex behaviours emerge from cascades of nonlinear interactions between molecular species. These nonlinearities lead to system identifiability issues. These issues can, however, be overcome by specific experimental designs, one of the fields of research in systems biology. A Bayesian iterative experimental design strategy is proposed, and numerical results based on in silico biological network simulations are presented.
564

Intégrer les échelles moléculaires et cellulaires dans l'inférence de réseaux métaboliques : application aux xénobiotiques / Integrate molecular and cellular scales in the inference of metabolic networks : application to xenobiotics

Delannée, Victorien 08 November 2017
Predicting, modelling and analysing the metabolism of xenobiotics, substances foreign to an organism, using computational methods has been a major challenge for the scientific community for many years. This thesis aims to implement multiscale computational methods for predicting and analysing the metabolism of xenobiotics. A first focus of this study was the construction and automatic de novo annotation of metabolic graphs combining high sensitivity and precision. These graphs provide a prediction of the metabolism of xenobiotics in humans, as well as the genotoxicity of the molecules and atoms that make up these xenobiotics. The work then focused on the implementation of a dynamic mathematical model of enzymatic competition effects, through the development of a methodology that makes it possible to exploit limited biological data while limiting inherent biases.
565

Investigation of multivariate prediction methods for the analysis of biomarker data

Hennerdal, Aron January 2006
The paper describes predictive modelling of biomarker data from patients suffering from multiple sclerosis. Improvements to multivariate analyses of the data are investigated with the goal of increasing the capability to assign samples to the correct subgroups from the data alone. The effects of different scalings applied to the data beforehand are investigated, and combinations of multivariate modelling methods and variable selection methods are evaluated. Attempts are made to merge the predictive capabilities of the method combinations through voting procedures. Bagging, a technique for improving the result of PLS modelling, is evaluated. Of the methods of multivariate analysis tried, the best are found to be partial least squares (PLS) and support vector machines (SVM). It is concluded that scaling has little effect on the prediction performance for most methods. The method combinations have interesting properties: the default variable selections of the multivariate methods are not always the best. Bagging improves performance, but at a high cost. No reasons for drastically changing the workflows of the biomarker data analysis are found, but slight improvements are possible. Further research is needed.
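The abstract does not say which voting procedure was used to merge the models. One common choice, shown here only as a sketch, is a simple majority vote over the class labels predicted by each model for each sample. The label values and the third model are hypothetical and serve only to make the example run.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by majority vote.

    predictions_per_model: list of equally long lists, one per model,
    where position i holds that model's label for sample i.
    """
    voted = []
    for labels_for_sample in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest count.
        voted.append(Counter(labels_for_sample).most_common(1)[0][0])
    return voted


# Hypothetical predictions for four samples from three models.
pls_labels = ["MS", "control", "MS", "control"]
svm_labels = ["MS", "MS", "MS", "control"]
third_model_labels = ["control", "control", "MS", "control"]
combined = majority_vote([pls_labels, svm_labels, third_model_labels])
```

Using an odd number of models with two classes avoids ties; with ties, this sketch falls back on whichever label was seen first.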
566

Increasing bioinformatics in third world countries : Studies of S.digitata and P.Polymyxa to further bioinformatics in east Africa / Bioinformatiska förbättringsåtgärder för u-länder : Studier av S.digitata och P.Polymyxa för att förbättra bioinformatiken i östra Afrika

Isak, Sylvin January 2016
Despite an increase in biotechnical studies in third world countries, the bioinformatics side is largely lacking. In this paper we attempt to further the bioinformatics capabilities of east Africa. The project consisted of two teaching segments for east African doctoral students, one as part of an academic workshop at ILRI, Kenya, and one in a small class at SLU, Sweden. The project also included the creation of two simple-to-use bioinformatics pipelines with the explicit aim of being reused by novice bioinformaticians from the same region. The viability of the pipelines was verified by generating transcriptional expression level differences for Paenibacillus polymyxa strain A26 and whole genome annotations for Setaria digitata. Both pipelines may have some merit for the collaborative effort between ILRI and SLU to annotate Eleusine coracana, a drought-resilient crop, the annotation of which may save lives. The teaching material, the source code for the pipelines and overall teaching impressions have been included in this paper.
567

Automatic detection of protein degradation markers in mass spectrometry imaging

Herman, Stephanie January 2016
Today we are collecting a large number of tissue samples to store for future studies of different health conditions, in the hope that the focus in health care will shift from treatment to early detection and prevention through the use of biomarkers. To make sure that tissue is stored in a reliable way, in which the molecular profile of the samples is preserved, we first need to characterise how changes to that profile occur. In this thesis, data from mouse brains were collected using MALDI imaging mass spectrometry (IMS), and an analysis pipeline for robust MALDI IMS data handling and evaluation was implemented. The finished pipeline contains two reduction algorithms, which catch images with interesting intensity features while taking the spatial information into account, along with a robust similarity measurement for measuring the degree of co-localisation. It also includes a clustering algorithm built upon the similarity measurement and an amino acid mass comparer, which iteratively generates combinations of amino acids whose masses are compared with the mass differences between cluster members. Availability: The source code is available at https://github.com/stephanieherman/thesis
568

Proteus : A new predictor for protean segments

Söderquist, Fredrik January 2015
The discovery of intrinsically disordered proteins has led to a paradigm shift in protein science. Many disordered proteins have regions that can transform from a disordered state to an ordered one. Those regions are called protean segments. Many intrinsically disordered proteins are involved in diseases, including Alzheimer's disease, Parkinson's disease and Down's syndrome, which makes them prime targets for medical research. As protean segments are often the functional part of the proteins, it is of great importance to identify those regions. This report presents Proteus, a new predictor for protean segments. The predictor uses Random Forest (a decision tree ensemble classifier) and is trained on features derived from amino acid sequence and conservation data. Proteus compares favourably to state-of-the-art predictors and performs better than the competition on all four metrics: precision, recall, F1 and MCC. The report also looks at the differences between protean and non-protean regions and how they differ between the two datasets that were used to train the predictor.
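For readers unfamiliar with the four metrics named above, the sketch below computes them from the counts of a binary confusion matrix using their standard definitions. The counts in the usage lines are made up for illustration and are not results from the report; the function name is likewise an assumption.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and Matthews correlation coefficient (MCC).

    Assumes every denominator is non-zero; degenerate confusion
    matrices would need special handling.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Made-up counts: 8 true positives, 2 false positives,
# 2 false negatives, 8 true negatives.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside F1 when the positive class (here, protean residues) is rare.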
569

Structural Feature of Prokaryotic Promoters and their Role in Gene Expression

Aditya Kumar, * January 2015
Transcription initiation is an important step in the process of gene regulation in prokaryotes. Promoters are stretches of DNA sequence in the upstream region of transcription start sites (TSSs), where RNA polymerase and other transcription factors bind to initiate transcription. Recent advances in sequencing technologies have produced a huge amount of raw data in the form of whole genome sequences. This sequence data has to be annotated in order to identify coding, non-coding and regulatory regions. Computational tools are useful for quick and fairly reliable annotation of many genome sequences. Promoter prediction is an important step in the genome annotation process, needed not only for the validation of predicted genes but also for the identification of novel genes, especially those encoding non-coding RNA, which are missed by gene prediction programs. DNA sequence-dependent structural properties such as DNA duplex stability, bendability and intrinsic curvature have been found to be associated with promoter regions in all domains of life. The work presented in this thesis focuses on the analysis of these structural features in the promoter regions of published prokaryotic transcriptome data. Furthermore, promoters were predicted using these structural features, and their role in gene expression was studied. The organization of the thesis is as follows. An overview of the transcription machinery of prokaryotes, promoter architecture, available promoter prediction programs and sequence-dependent structural features is presented in chapter 1. Chapter 2 describes the datasets and methods used in the entire study.

Structural features of promoters associated with primary and operon TSSs of H. pylori 26695 genes and their orthologs (chapter 3)
Promoter regions in genomic sequences from all domains of life show similar trends in their structural properties such as stability, bendability and curvature. This chapter discusses the DNA duplex stability and bendability of various classes of promoter regions of the Helicobacter pylori 26695 strain, based on the different classes of transcription start sites identified in a transcriptome study (primary, secondary, internal, operon TSSs, etc.). It is found that the promoters of primary TSSs and operon-associated TSSs show significantly strong structural features. The DNA free energy based promoter prediction tool PromPredict has been used to annotate promoters of different classes, and very high recall values (80%) are obtained for primary TSSs. Orthologous genes from 10 different strains of H. pylori show conservation of structural properties in promoter regions as well as coding regions. PromPredict annotates promoters of orthologous genes with very high recall and precision values. The DNA duplex stability of the promoter region is conserved in the orthologous genes across the 10 strains.

Sequence dependent structural features of promoters in prokaryotic transcriptome (chapter 4)
Next-generation sequencing studies have revealed that a wide range of transcripts, such as primary, internal, antisense and non-coding RNA, are present in the prokaryotic transcriptome and that a large fraction of them are functionally involved in various regulatory activities. Identification of the promoters associated with different transcripts is important for characterization of the transcriptome. This chapter discusses DNA sequence-dependent structural properties such as stability, bendability and curvature in the promoter regions of six different prokaryotic transcriptomes (Helicobacter pylori, Anabaena, Synechocystis, Escherichia coli, Salmonella and Klebsiella). Using these structural features, promoters associated with different categories of transcripts, which constitute an integral part of the transcriptome, were predicted. Promoter annotation using structural features is fairly accurate and reliable compared with the motif-based approach, since different categories of transcripts show poor sequence conservation in the promoter region. Most importantly, it is universal in nature, unlike the sequence-based approach, which is generally organism specific.

Role of sequence dependent structural properties in gene expression in prokaryotes (chapter 5)
DNA duplex stability, bendability and intrinsic curvature play crucial roles in the process of transcription initiation. Hence, in order to understand the relationship between these structural features and gene expression, the relative differences in stability, bendability and curvature in the promoter regions of highly and lowly expressed genes were studied. It is found that these features are relatively accentuated in the promoter regions associated with high gene expression compared with low gene expression. Promoter regions associated with high gene expression are annotated more reliably using DNA structural features than those associated with low gene expression.

Sequence dependent structural properties in the promoter region of essential and non-essential genes of the prokaryotes (chapter 6)
Essential genes are the minimal set of genes required for the survival of an organism. These sets of genes can be identified by experiments such as single gene deletion and transposon-mediated inactivation. Here, the analysis of DNA duplex stability and bendability in the promoter regions of essential and non-essential genes of prokaryotes is reported. It is found that the average free energy and bendability profiles are distinct in the promoter regions of essential and non-essential genes. Whole genome promoter predictions for essential and non-essential genes have also been carried out using the in-house program PromPredict. Chapter 7 presents the summary and conclusions of the entire thesis work, followed by future perspectives in the field.

Optimization of PromPredict algorithm and updating PromBase with newly sequenced genomes (Appendix A)
PromPredict is an in-house program based on the relative stability of the DNA in flanking regions. It was found to perform well in predicting promoters across all organisms. In previous studies, it was observed that for organisms with low genomic GC content (<35%), promoter prediction resulted in low precision values, which indicates a higher false positive rate. The threshold values of the PromPredict algorithm were revised in order to lower the false positive rate. PromBase is a comparative genomics database of microbial genomes. It stores different genomic and structural properties of the microbial genomes. It also displays the predictions obtained from PromPredict in graphical as well as tabular format. Newly sequenced genomes were downloaded from NCBI, processed using in-house programs and added to the MySQL database (the back end of PromBase). Stability profiles for predictions were also added for the RNA-coding genes; previously, only profiles for protein-coding genes were displayed.

Comparative genomics of asymmetric gene orientation in prokaryotes (Appendix B)
Transcription synthesizes RNA in the 5’ to 3’ direction, which gives each gene an orientation. Prokaryotic genomes show asymmetry in gene orientation on the leading and lagging strands. The different phyla of prokaryotes were analyzed in terms of asymmetry in gene orientation. It is found that organisms belonging to the phylum Firmicutes, which are known to have different DNA polymerase systems for replication, show high asymmetry in gene orientation.
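The low-GC condition mentioned in Appendix A (genomic GC content below 35%) is simple to check for any sequence. The sketch below is not part of PromPredict; it only illustrates how GC content is computed and compared against the 35% figure quoted in the abstract. The function names and the toy sequences are assumptions.

```python
LOW_GC_THRESHOLD = 0.35  # the 35% figure quoted in the abstract


def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    return sum(base in "GC" for base in bases) / len(bases)


def is_low_gc(sequence):
    """True when the sequence falls below the low-GC threshold."""
    return gc_content(sequence) < LOW_GC_THRESHOLD


# Toy sequences, far shorter than a real genome.
at_rich = "ATATATGC"
gc_rich = "GGCCAT"
at_rich_fraction = gc_content(at_rich)
```

Ambiguous symbols such as N are ignored here, which is one of several possible conventions.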
570

Fragments structuraux : comparaison, prédictibilité à partir de la séquence et application à l'identification de protéines de virus / Structural fragments : comparison, predictability from the sequence and application to the identification of viral structural proteins

Galiez, Clovis 08 December 2015
This thesis investigates the local characterization of protein families at both the structural and sequence levels. We introduce contact fragments (CF) as parts of protein structure that reconcile spatial locality with sequential neighborhood. We show that the structure of CF is more predictable from the sequence than that of contiguous fragments or of pairs of fragments that are not in contact in the structure. In order to compare CF structurally, we introduce ASD, a novel alignment-free dissimilarity measure that respects the triangle inequality while being tolerant to sequence shifts and indels. We show that ASD outperforms classical scores for fragment comparison on practical tasks such as unsupervised classification and structural mining.
Finally, by integrating the identification of CF from the sequence into a statistical machine learning framework, we developed VIRALpro, a tool that enables the detection of sequences of viral structural proteins.
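The abstract stresses that ASD respects the triangle inequality, a property many fragment comparison scores lack. ASD itself is not defined here, so the sketch below only shows how the property can be checked by brute force for any dissimilarity function on a small set of items. Hamming distance and squared difference are stand-ins chosen for illustration: the first satisfies the inequality, the second does not.

```python
from itertools import permutations


def satisfies_triangle_inequality(items, dissimilarity, tolerance=1e-12):
    """Check d(a, c) <= d(a, b) + d(b, c) for every ordered triple."""
    for a, b, c in permutations(items, 3):
        if dissimilarity(a, c) > dissimilarity(a, b) + dissimilarity(b, c) + tolerance:
            return False
    return True


def hamming(u, v):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(u, v))


def squared_difference(x, y):
    """A dissimilarity that violates the triangle inequality."""
    return (x - y) ** 2


# Stand-in data: short secondary-structure-like strings and plain numbers.
hamming_ok = satisfies_triangle_inequality(["HHEE", "HEEE", "EEEE", "HHHH"], hamming)
squared_ok = satisfies_triangle_inequality([0, 1, 2], squared_difference)
```

A measure that passes this check can be used with metric indexing and clustering methods that rely on the inequality to prune comparisons.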
