851 |
Discovering rare variants from populations to families
Indap, Amit R., January 2013
Thesis advisor: Gabor T. Marth / Partitioning an individual's phenotype into genetic and environmental components has been a major goal of genetics since the early 20th century. Formally, the proportion of phenotypic variance attributable to genetic variation in the population is known as heritability. Genome-wide association studies, which genotype common variants, have explained only a modest percentage of the variability of complex traits. There is currently great interest in the role rare variants play in explaining the missing heritability of complex traits. Advances in next-generation sequencing and genomic enrichment technologies over the past several years have made it feasible to re-sequence large numbers of individuals, enabling discovery of the full spectrum of genetic variation segregating in the human population, including rare variants. The four projects that comprise my dissertation all revolve around the discovery of rare variants from next-generation sequencing datasets. In my first project, I analyzed data from the exon sequencing pilot of the 1000 Genomes Project, discovering variants from exome capture sequencing experiments in a worldwide sample of nearly 700 individuals. My results show that the allele frequency spectrum of the dataset has an excess of rare variants. My next project demonstrated the applicability of whole-genome amplified (WGA) DNA in capture sequencing. WGA amplifies DNA from nanogram starting amounts of template. In two separate capture experiments, I compared the site-level and genotype-level concordance of variant call sets derived from WGA and genomic DNA. WGA-derived calls showed excellent concordance at both levels, suggesting that WGA DNA can be used in lieu of genomic DNA.
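As a rough illustration of the two concordance levels mentioned above, the sketch below compares two call sets. The abstract does not define the metrics, so it assumes that site concordance is the overlap of variant sites (sites called in both sets over sites called in either) and that genotype concordance is the fraction of shared sites with identical unphased genotypes. The dictionary layout and function names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical layout: a call set maps (chromosome, position) to an unphased genotype like "0/1".
CallSet = Dict[Tuple[str, int], str]


def _normalise(genotype: str) -> str:
    """Treat "1/0" and "0/1" as the same unphased genotype."""
    return "/".join(sorted(genotype.split("/")))


def site_concordance(a: CallSet, b: CallSet) -> float:
    """Fraction of sites called in either set that are called in both (Jaccard overlap)."""
    union = a.keys() | b.keys()
    if not union:
        return 1.0
    return len(a.keys() & b.keys()) / len(union)


def genotype_concordance(a: CallSet, b: CallSet) -> float:
    """Fraction of shared sites at which both sets report the same genotype."""
    shared = a.keys() & b.keys()
    if not shared:
        return float("nan")
    return sum(_normalise(a[s]) == _normalise(b[s]) for s in shared) / len(shared)


if __name__ == "__main__":
    wga = {("chr1", 100): "0/1", ("chr1", 200): "1/1", ("chr2", 50): "0/1"}
    gdna = {("chr1", 100): "0/1", ("chr1", 200): "0/1", ("chr3", 10): "0/1"}
    print(site_concordance(wga, gdna))      # 2 shared sites out of 4 -> 0.5
    print(genotype_concordance(wga, gdna))  # 1 of 2 shared genotypes agree -> 0.5
```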
The results of this study have ramifications for medical sequencing experiments, where DNA stocks are finite and re-collecting samples may be too expensive or impossible. My third project kept its focus on capture sequencing, but in a different context. Here, I analyzed sequencing data from a Mendelian exome study of non-sensorineural hearing loss (NSHL). A subset of 6 individuals (5 affected, 1 unaffected) from a family of European descent was whole-exome sequenced in an attempt to uncover the mutation responsible for the hearing loss phenotype in the family. Previous linkage analysis had identified a linkage region on chr12, but no mutations were found in previous candidate genes, suggesting that a novel mutation segregates in the family. Using a discrete filtering approach with a minor allele frequency cutoff, I uncovered a putative causative non-synonymous mutation in a gene that encodes a transmembrane protein. The variant perfectly segregates with the phenotype in the family and is enriched in frequency in an unrelated cohort of individuals. Finally, for my last project I implemented Pgmsnp, a variant calling method for family sequencing datasets that incorporates the Mendelian relationships among family members using a Bayesian network inference algorithm. My method has detection sensitivity similar to that of other pedigree-aware callers and increases detection power for non-founder individuals. / Thesis (PhD), Boston College, 2013. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Biology.
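Pgmsnp's network is not described in detail here, so the following is only a minimal sketch of the idea behind pedigree-aware calling, not Pgmsnp itself. For a single parent-offspring trio at a biallelic site, it combines each member's read-based genotype likelihoods with a Hardy-Weinberg prior on the founders and Mendelian transmission probabilities, then enumerates all 27 genotype combinations to obtain the child's posterior. The allele-frequency prior and the likelihood inputs are assumptions.

```python
from itertools import product
from typing import List, Sequence

GENOTYPES = (0, 1, 2)  # number of alternate alleles at a biallelic site


def hwe_prior(alt_freq: float) -> List[float]:
    """Hardy-Weinberg genotype prior for a founder given the alt-allele frequency."""
    q = 1.0 - alt_freq
    return [q * q, 2.0 * alt_freq * q, alt_freq * alt_freq]


def transmission(child: int, mother: int, father: int) -> float:
    """P(child genotype | parental genotypes) under Mendelian segregation."""
    pm, pf = mother / 2.0, father / 2.0  # chance each parent passes on the alt allele
    probs = [(1 - pm) * (1 - pf), pm * (1 - pf) + (1 - pm) * pf, pm * pf]
    return probs[child]


def child_posterior(lik_mother: Sequence[float], lik_father: Sequence[float],
                    lik_child: Sequence[float], alt_freq: float) -> List[float]:
    """Posterior over the child's genotype, given P(reads | genotype) for all three members."""
    prior = hwe_prior(alt_freq)
    post = [0.0, 0.0, 0.0]
    for m, f, c in product(GENOTYPES, repeat=3):
        post[c] += (prior[m] * lik_mother[m] * prior[f] * lik_father[f]
                    * transmission(c, m, f) * lik_child[c])
    total = sum(post)
    return [x / total for x in post]
```

When both parents are confidently homozygous reference, the child is forced to homozygous reference even if its own reads lean towards heterozygous, which is how family information sharpens calls in non-founders. Exhaustive enumeration grows exponentially with pedigree size, which is why general Bayesian network inference algorithms are used for larger families.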
852 |
Microbial Responses to Environmental Change in Canada’s High Arctic
Colby, Graham, 28 May 2019
The Arctic is undergoing a rapid environmental shift, with increases in temperature and precipitation expected to continue over the next century. Yet little is known about how microbial communities and their underlying metabolic processes will respond to ongoing climatic change. To address this question, we focused on Lake Hazen, NU, Canada. As the largest High Arctic lake by volume, it is a unique site to investigate microbial responses to environmental change. Over the past decade, glacial coverage of the lake has declined. Increasing glacial runoff and sedimentation rates in the lake have resulted in a differential influx of nutrients along spatial gradients. I used these spatial gradients to study how environmental changes might affect microbial community structure and functional capacity in Arctic lakes. I performed a metagenomic analysis of microbial communities from hydrological regimes representing high, low, and negligible influence of glacial runoff, and compared the observed community structure and function with the natural geochemical gradients. Genes and reconstructed genomes found in different abundances across these sites suggest that high-runoff regimes alter geochemical gradients, homogenise microbial community structure, and reduce genetic diversity. This work shows how a genome-centric metagenomics approach can be used to predict future microbial responses to a changing climate.
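The abstract does not name the diversity or similarity measures used, so the following sketch assumes two common ones: the Shannon index as a per-site diversity measure, and Bray-Curtis dissimilarity to quantify how similar (homogenised) two sites' community profiles are. Abundances are assumed to be, for example, read counts mapped to each reconstructed genome; the genome labels are hypothetical.

```python
import math
from typing import Mapping


def shannon_index(abundances: Mapping[str, float]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over genomes with non-zero abundance."""
    total = sum(abundances.values())
    h = 0.0
    for value in abundances.values():
        if value > 0:
            p = value / total
            h -= p * math.log(p)
    return h


def bray_curtis(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 = identical profiles, 1 = no shared genomes."""
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total
```

Lower Shannon values at high-runoff sites would correspond to reduced diversity, and lower Bray-Curtis values between sites to more homogeneous communities.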
853 |
Framtidens biomarkörer: En prioritering av proteinerna i det humana plasmaproteomet [Biomarkers of the future: a prioritization of the proteins in the human plasma proteome]
Antonsson, Elin; Eulau, William; Fitkin, Louise; Johansson, Jennifer; Levin, Fredrik; Lundqvist, Sara; Palm, Elin, January 2019
In this report, we rank candidate protein biomarkers according to several criteria for use in Olink Proteomics’ protein panels. We started from a list compiled by the Human Plasma Proteome Project (HPPP) and used it in several ways to obtain the final results. We compared the list with the protein catalogs of Olink and its competitors, and identified diseases outside Olink’s coverage along with the proteins linked to them. We also created a scoring system to help identify promising biomarkers. From this, we conclude that Olink should focus on proteins that its competitors have in their catalogs and on proteins that occur in many pathways and are linked to many diseases. From each of the methods used, we identified a number of proteins that we recommend Olink investigate further.
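The report does not publish its scoring formula, so the sketch below is a hypothetical weighted score built only from the criteria the abstract names (pathway count, disease links, presence in competitor catalogs), applied to proteins not already in Olink's catalog. The weights, field names and protein identifiers are placeholders.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights; the report's actual scoring scheme is not given in the abstract.
WEIGHTS = {"pathways": 1.0, "diseases": 2.0, "competitor": 5.0}


@dataclass
class Candidate:
    name: str
    n_pathways: int              # pathways the protein participates in
    n_diseases: int              # diseases the protein is linked to
    in_competitor_catalog: bool  # offered by at least one competitor
    in_olink_catalog: bool       # already covered by Olink


def score(c: Candidate) -> float:
    """Weighted sum of the criteria; a True flag counts as 1."""
    return (WEIGHTS["pathways"] * c.n_pathways
            + WEIGHTS["diseases"] * c.n_diseases
            + WEIGHTS["competitor"] * c.in_competitor_catalog)


def rank(candidates: List[Candidate]) -> List[str]:
    """Names of proteins not yet in the Olink catalog, highest score first."""
    pool = [c for c in candidates if not c.in_olink_catalog]
    return [c.name for c in sorted(pool, key=score, reverse=True)]
```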
854 |
Investigating cis- and trans-acting elements involved in regulating fetal hemoglobin gene expression using high throughput genetic data
Shaikho Elhaj Mohammed, Elmutaz, 27 November 2018
Sickle cell anemia is caused by a single mutation in the β-hemoglobin gene, HBB. The disease originated in Africa and affects millions of people worldwide. Sickle hemoglobin tetramers polymerize upon deoxygenation, leading to hemolysis and vaso-occlusion. Patients with high fetal hemoglobin (HbF) can have milder disease. The only FDA-approved drug is hydroxyurea, which increases HbF. HbF modulates the disease by preventing the polymerization of sickle hemoglobin, reducing the pain episodes, anemia, and organ damage associated with the disease. Five common haplotypes are associated with the HbS gene, and they are only loosely associated with disease severity and HbF. Understanding the genetic basis of HbF regulation is key to identifying potential drug targets that induce HbF for therapeutic purposes.
To fully understand the mechanism behind HbF regulation, a fast and accurate computational method for sickle cell haplotype classification is useful for examining the variability of HbF among sickle cell patients. Moreover, investigating the cis- and trans-acting regulators of HbF gene expression to pinpoint the mechanisms through which they act is essential for developing successful treatments. The availability of high-throughput genetic data provides an excellent opportunity to study HbF regulation comprehensively in both sickle cell patients and healthy individuals.
The work reported in this thesis describes a fast and accurate method for sickle cell HBB haplotype classification. I also examine the differential effect of cis- and trans-acting HbF regulators on γ-globin gene expression using the GTEx database, and identify BCL2L1 as a new potential trans-regulator of HbF.
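As a minimal sketch of how a genotype's effect on expression is commonly tested, the code below fits an ordinary least-squares regression of expression on allele dosage (0, 1 or 2) and labels a SNP-gene pair as cis or trans by distance. The ±1 Mb cis window is a common convention assumed here, not a parameter stated in the thesis; the coordinates in the test are hypothetical, and a real analysis would add covariates and convert the t statistic to a p-value.

```python
import math
from typing import Sequence, Tuple


def eqtl_test(dosage: Sequence[float], expression: Sequence[float]) -> Tuple[float, float]:
    """OLS fit of expression ~ a + b * dosage; returns (slope b, t statistic of b)."""
    n = len(dosage)
    mx = sum(dosage) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(dosage, expression))
    se_b = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return b, b / se_b


def classify_pair(snp_chrom: str, snp_pos: int, gene_chrom: str, gene_tss: int,
                  window: int = 1_000_000) -> str:
    """Label a SNP-gene pair 'cis' if the SNP lies within `window` bp of the TSS, else 'trans'."""
    if snp_chrom == gene_chrom and abs(snp_pos - gene_tss) <= window:
        return "cis"
    return "trans"
```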
855 |
Ensemble Learning Algorithms for the Analysis of Bioinformatics Data
Unknown Date
Developments in advanced technologies, such as DNA microarrays, have generated
tremendous amounts of data available to researchers in the field of bioinformatics.
These state-of-the-art technologies present not only unprecedented opportunities to
study biological phenomena of interest, but significant challenges in terms of processing
the data. Furthermore, these datasets inherently exhibit a number of challenging
characteristics, such as class imbalance, high dimensionality, small dataset size, noisy
data, and complex, hard-to-distinguish decision boundaries
between classes.
In recognition of the aforementioned challenges, this dissertation utilizes a variety
of machine-learning and data-mining techniques, such as ensemble classification
algorithms in conjunction with data sampling and feature selection techniques to alleviate
these problems, while improving the classification results of models built on
these datasets. However, when building classification models, researchers and practitioners
face the challenge that no single classifier performs well
in all cases. Numerous classification approaches, such as ensemble learning
methods, have therefore been developed, and they address this problem successfully in most circumstances. Ensemble learning is a promising technique that generates multiple
classification models and then combines their decisions into a single final result.
Ensemble learning often outperforms single base classifiers on
classification tasks.
This dissertation conducts thorough empirical research by implementing a series
of case studies to evaluate how ensemble learning techniques can be utilized to
enhance overall classification performance, as well as improve the generalization ability
of ensemble models. This dissertation investigates ensemble learning techniques
of the boosting, bagging, and random forest algorithms, and proposes a number of
modifications to the existing ensemble techniques to further improve
classification results. This dissertation examines the effectiveness of ensemble learning
techniques in accounting for the challenging characteristics of class imbalance and
difficult-to-learn class decision boundaries. Next, it investigates ensemble methods
that are relatively tolerant of class noise and that not only account for class noise
but also improve classification performance. This dissertation also examines
the joint effects of data sampling and ensemble techniques, asking whether
sampling can further improve the classification performance of the resulting ensemble
models. / Includes bibliography. / Dissertation (Ph.D.), Florida Atlantic University, 2016. / FAU Electronic Theses and Dissertations Collection
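To make the combination of data sampling and ensemble learning concrete, here is a minimal sketch of one such scheme: random undersampling followed by bagging of decision stumps, with predictions combined by majority vote. It is not one of the dissertation's proposed modifications; the stump learner, the number of models (11) and binary 0/1 labels are assumptions chosen for brevity.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, label predicted when value > threshold)


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stump:
    """Exhaustively choose the single-feature threshold rule with the fewest training errors."""
    best, best_err = (0, 0.0, 1), float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for hi_label in (0, 1):
                err = sum((hi_label if row[j] > t else 1 - hi_label) != label
                          for row, label in zip(X, y))
                if err < best_err:
                    best, best_err = (j, t, hi_label), err
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    j, t, hi_label = stump
    return hi_label if row[j] > t else 1 - hi_label


def undersample(X, y, rng: random.Random):
    """Keep every minority example plus an equal-sized random subset of the majority class."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    chosen = minority_idx + rng.sample(majority_idx, len(minority_idx))
    return [X[i] for i in chosen], [y[i] for i in chosen]


def fit_ensemble(X, y, n_models: int = 11, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        Xs, ys = undersample(X, y, rng)
        # Bootstrap the balanced sample, as in bagging.
        idx = [rng.randrange(len(ys)) for _ in range(len(ys))]
        models.append(fit_stump([Xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict(models: List[Stump], row: Sequence[float]) -> int:
    """Majority vote; an odd number of models avoids ties for binary labels."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]
```

Undersampling gives each base learner a balanced view of an imbalanced dataset, while bagging and voting reduce the variance that comes from discarding majority examples.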
856 |
Unravelling higher order chromatin organisation through statistical analysis
Moore, Benjamin Luke, January 2016
Recent technological advances underpinned by high-throughput sequencing have given new insights into the three-dimensional structure of mammalian genomes. Chromatin conformation assays have been the critical development in this area, particularly the Hi-C method, which ascertains genome-wide patterns of intra- and inter-chromosomal contacts. However, many open questions remain concerning the functional relevance of such higher order structure, the extent to which it varies, and how it relates to other features of the genomic and epigenomic landscape. Current knowledge of nuclear architecture describes a hierarchical organisation ranging from small loops between individual loci, to megabase-sized, self-interacting topologically associating domains (TADs), encompassed within large multi-megabase chromosome compartments. In parallel with the discovery of these strata, the ENCODE project has generated vast amounts of data through ChIP-seq, RNA-seq and other assays applied to a wide variety of cell types, forming a comprehensive bioinformatics resource. In this work we combine Hi-C datasets describing physical genomic contacts with a large and diverse array of chromatin features measured at a much finer scale in the same mammalian cell types. These features include levels of bound transcription factors, histone modifications and expression data. These data are then integrated in a statistically rigorous way, through a predictive modelling framework from the machine learning field. These studies were extended, within a collaborative project, to encompass a dataset of matched Hi-C and expression data collected over a murine neural differentiation time course. We compare higher order chromatin organisation across a variety of human cell types and find pervasive conservation of chromatin organisation at multiple scales. We also identify regions that vary structurally between cell types; these are rich in active enhancers and contain loci of known cell-type-specific function.
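Compartments are commonly called from Hi-C data by normalising contacts against their distance-dependent expectation, correlating the rows of that observed/expected matrix, and taking the sign of the leading eigenvector. The sketch below follows that standard recipe, which may differ from the pipeline used in this work. Orienting the eigenvector's sign with an "active" track (for example gene density or an activating histone mark) and the toy matrix in the test are assumptions.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def observed_over_expected(m: Matrix) -> Matrix:
    """Divide each contact count by the mean count at the same genomic distance (diagonal)."""
    n = len(m)
    expected = [sum(m[i][i + d] for i in range(n - d)) / (n - d) for d in range(n)]
    return [[m[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb) if va > 0 and vb > 0 else 0.0


def leading_eigenvector(c: Matrix, iters: int = 100, seed: int = 0) -> List[float]:
    """Power iteration; for a correlation matrix this converges to the top eigenvector."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in c]
    for _ in range(iters):
        w = [sum(cij * vj for cij, vj in zip(row, v)) for row in c]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def call_compartments(oe: Matrix, active_track: Sequence[float]) -> List[str]:
    """Label each bin A or B; the eigenvector sign is flipped so A correlates with activity."""
    corr = [[pearson(r1, r2) for r2 in oe] for r1 in oe]
    ev = leading_eigenvector(corr)
    if pearson(ev, active_track) < 0:
        ev = [-x for x in ev]
    return ["A" if x > 0 else "B" for x in ev]
```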
We show that broad aspects of higher order chromatin organisation, such as nuclear compartment domains, can be accurately predicted in a variety of human cell types, using models based upon underlying chromatin features. We dissect these quantitative models and find them to be generalisable to novel cell types, presumably reflecting fundamental biological rules linking compartments with key activating and repressive signals. These models describe the strong interconnectedness between locus-level patterns of local histone modifications and bound factors, on the order of hundreds or thousands of base pairs, and much broader compartmentalisation of large, multi-megabase chromosomal regions. Finally, boundary regions are investigated in terms of chromatin features and co-localisation with other known nuclear structures, such as the nuclear lamina. We find that boundary complexity varies between cell types, link TAD aggregations to previously described lamina-associated domains, and explore the concept of meta-boundaries that span multiple levels of organisation. Together these analyses lend quantitative evidence to a model of higher order genome organisation that is largely stable between cell types, but can selectively vary locally, based on the activation or repression of key loci.
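The abstract does not specify the predictive model, so the sketch below substitutes a plain logistic regression, trained by batch gradient descent, to predict whether a genomic bin lies in the A compartment (label 1) from locus-level chromatin features. The two feature columns in the test (one activating, one repressive mark) and their values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 1000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - label
            for j in range(p):
                grad_w[j] += err * row[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w: Sequence[float], b: float, row: Sequence[float]) -> float:
    """Predicted probability that the bin belongs to compartment A."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)
```

Testing generalisation to novel cell types, as the abstract describes, would mean training on bins from one cell type and evaluating `predict_proba` on bins from another.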
857 |
Abordagem computacional para detecção e análise de polimorfismos de nucleotídeo único em genomas bacterianos / Computational approach for detection and analysis of single-nucleotide polymorphisms (SNPs) in bacterial genomes
Lima, Nicholas Costa Barroso, 01 December 2011
Previous issue date: 2011-12-01 / Single-nucleotide polymorphisms (SNPs) are common and may be responsible for different phenotypes. Interest in this type of polymorphism intensified when the sequencing of the human genome revealed that SNPs account for most (90%) of the genetic variability among the complete human genomes compared, occurring at a frequency of about one SNP every 1,000 to 2,000 bp. Recently, several studies have focused on detecting this type of polymorphism in bacterial genomes, for example for bacterial strain typing and phylogeny reconstruction. In this work we developed a methodology for detecting and filtering SNPs in bacterial genomes in order to analyze the prevalence of this type of polymorphism. The methodology uses sequence alignment algorithms and filters written in the Perl programming language to detect and filter SNPs and obtain a reliable final set. SNP occurrence fits a Poisson probability distribution, because SNPs are events that occur within an interval, in this case coding sequences. Accordingly, we also calculated the expected frequency of SNPs for each case using a Poisson distribution. Coding sequences whose SNPs exceed the expected frequency may be subject to different selective pressures. The methodology was tested and evaluated on genomes from five genera of the family Enterobacteriaceae (Enterobacter, Escherichia, Salmonella, Shigella and Yersinia) and applied in a case study of the genome of Klebsiella pneumoniae str. Kp13, a bacterium that causes nosocomial infections. The methodology detected and filtered SNPs in different species of the family Enterobacteriaceae in agreement with previously published data. SNPs were observed among the four Klebsiella pneumoniae strains compared. Coding sequences with more SNPs than the expected frequency obtained from the Poisson distribution were then investigated for a possible association with the bacterial lifestyle. / Single-nucleotide polymorphisms (SNPs) are frequent and may be responsible for different phenotypes.
Attention to this type of polymorphism intensified when the human genome sequencing project revealed that SNPs account for most of the genetic variability (90%) among the complete human genomes compared, with a frequency of 1 SNP per 1,000 to 2,000 bp interval. Recently, several studies have focused on detecting this type of polymorphism in bacterial genomes for use in strain typing and phylogeny reconstruction, for example.
In this work, a methodology for detecting and filtering SNPs in bacterial genomes was developed to analyze the prevalence of this type of polymorphism. The methodology uses sequence alignment algorithms and filters developed in the Perl programming language to detect and filter SNPs in order to obtain a reliable final set.
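The thesis implements its filters in Perl; the sketch below is a Python illustration of the same kind of step, not the author's code. It reports substitutions between two already-aligned sequences, skipping gaps and ambiguous bases and discarding mismatches that fall in dense clusters, a common sign of misalignment. The window size and cluster threshold are arbitrary assumptions, and positions are 0-based alignment columns, not reference coordinates.

```python
from typing import List, Tuple

VALID_BASES = set("ACGT")


def call_snps(ref: str, query: str, window: int = 10,
              max_diffs_in_window: int = 3) -> List[Tuple[int, str, str]]:
    """Return (alignment column, ref base, query base) for filtered substitutions."""
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")
    ref, query = ref.upper(), query.upper()
    diffs = [i for i, (r, q) in enumerate(zip(ref, query)) if r != q]
    snps = []
    for i in diffs:
        r, q = ref[i], query[i]
        if r not in VALID_BASES or q not in VALID_BASES:
            continue  # skip gaps ('-') and ambiguous bases such as 'N'
        nearby = sum(1 for k in diffs if abs(k - i) <= window)  # includes position i itself
        if nearby > max_diffs_in_window:
            continue  # dense mismatch clusters usually indicate misalignment
        snps.append((i, r, q))
    return snps
```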
The occurrence of SNPs fits the concept of a Poisson probability distribution because SNPs are events occurring within an interval, in this case coding sequences. In this context, the expected frequency of SNPs was also calculated for each case studied using a Poisson probability distribution. Microorganisms with SNPs at a higher-than-expected frequency may be subject to differential selective pressures. The methodology was tested and evaluated on genomes from five genera of the family Enterobacteriaceae (Enterobacter, Escherichia, Salmonella, Shigella and Yersinia) and applied to the specific case of Klebsiella pneumoniae str. Kp13, a cause of nosocomial infection isolated in Brazil. The methodology proved capable of detecting and filtering SNPs in different species of the family Enterobacteriaceae in agreement with previously published data.
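The Poisson reasoning above can be sketched as follows, assuming (this is not stated in the abstract) that the expected count for each coding sequence is the genome-wide SNP rate per coding base multiplied by the sequence's length, and that a sequence is flagged when the upper-tail probability of its observed count falls below a chosen threshold. Gene names and the threshold are hypothetical.

```python
import math
from typing import Dict, List


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def flag_excess(cds_lengths: Dict[str, int], snp_counts: Dict[str, int],
                alpha: float = 0.01) -> List[str]:
    """Coding sequences whose SNP count is improbably high under a uniform-rate Poisson model."""
    total_len = sum(cds_lengths.values())
    total_snps = sum(snp_counts.get(g, 0) for g in cds_lengths)
    rate = total_snps / total_len  # SNPs per coding base, pooled over all sequences
    flagged = []
    for gene, length in cds_lengths.items():
        lam = rate * length        # expected SNP count for this coding sequence
        k = snp_counts.get(gene, 0)
        if k > lam and poisson_sf(k, lam) < alpha:
            flagged.append(gene)
    return flagged
```

With many coding sequences tested at once, a multiple-testing correction on `alpha` would usually be applied as well.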
For the 4 Klebsiella pneumoniae strains, this type of polymorphism was observed among the strains compared. Accordingly, coding sequences with a number of SNPs above the expected frequency, obtained from the Poisson probability distribution, were investigated for their possible association with the bacterial lifestyle.
858 |
Análise empírica da utilização de técnicas de aprendizagem de máquina para classificação de sequências de proteínas de Metarhizium anisopliae / Empirical analysis of machine learning techniques for classification of protein sequences of Metarhizium anisopliae
Dias, Maria Fernanda Ribeiro, 23 February 2015
Previous issue date: 2015-02-23 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Capes) / Metarhizium anisopliae is an entomopathogenic fungus used as a biological insecticide. It possesses proteins linked to infection processes whose control mechanisms are unknown, many of which may be controlled by the ubiquitination system. In this work we used machine-learning techniques to predict ubiquitination-prone proteins of M. anisopliae isolate E6.
One hundred fifty-one ubiquitinable peptides and one hundred fifty-one non-ubiquitinable peptides from S. cerevisiae and H. sapiens were used as the training set (see http://iclab.life.nctu.edu.tw/ubipred/). Each peptide was 21 amino acids long, centered on a lysine residue. Each peptide was represented as a numerical vector whose components are the average values, over the peptide's amino acids, of each of the 31 physicochemical properties previously used in UbiPred. Hierarchical clustering of the training peptides showed evidence of correlation between several physicochemical properties, indicating redundancy among these features. Redundant features may cause model overfitting and increase computational cost. We used the Weighted-Voting (W-V) classification algorithm, with cross-validation, to identify the minimal set of features best correlated with the probability of a given peptide being ubiquitinable.
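The peptide representation described above can be sketched as follows. UbiPred's 31 physicochemical scales are not reproduced here; the Kyte-Doolittle hydropathy scale stands in as a single example property, and truncating windows at protein ends (rather than padding them) is an assumption.

```python
from typing import Dict, List, Sequence, Tuple

# Kyte-Doolittle hydropathy: one example property, not one of UbiPred's published scales.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def lysine_windows(protein: str, flank: int = 10) -> List[Tuple[int, str]]:
    """(position, window) for every lysine; a full window is 2 * flank + 1 = 21 residues."""
    return [(i, protein[max(0, i - flank): i + flank + 1])
            for i, aa in enumerate(protein) if aa == "K"]


def feature_vector(window: str, scales: Sequence[Dict[str, float]]) -> List[float]:
    """One feature per property: the mean property value over the window's residues."""
    vec = []
    for scale in scales:
        values = [scale[aa] for aa in window if aa in scale]
        vec.append(sum(values) / len(values) if values else 0.0)
    return vec
```

A protein without lysine yields no windows, which matches the exclusion of lysine-free proteins described in the abstract.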
WeightedVotingXValidation performed similarly for vectors of 10 and 31 dimensions. Indeed, each of the 10 features in the minimal set correlates with most of the remaining 21 features, as confirmed by pairwise Pearson correlation tests (coefficients ranging from -0.95 to -0.40 and from 0.40 to 0.98). We then compared the performance of the W-V and Support Vector Machine (SVM) algorithms, the latter with a radial kernel, for vectors with 10 or 31 dimensions. In both cases, SVM outperformed W-V. With 10 features, recall, precision and accuracy were, respectively, 67%, 65% and 66% for SVM versus 65%, 55% and 47% for W-V; with 31 features they were 71%, 71% and 71% for SVM versus 60%, 55% and 52% for W-V.
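For reference, the three reported measures follow from a binary confusion matrix as in this small sketch (label 1 = ubiquitinable). It is a generic definition, not code from the dissertation.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Recall, precision and accuracy for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # share of true positives recovered
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # share of positive calls that are right
        "accuracy": (tp + tn) / len(y_true),              # share of all calls that are right
    }
```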
Computations were run on an ASUS K43U notebook with an AMD Dual Core C60 1.0 GHz processor, 2 x 512 KB of cache memory, and 4 GB of RAM. Processing took 8 h and 22 h for the SVM with 10 and 31 physicochemical features, respectively.
Considering the risk of overfitting due to feature redundancy, we applied the SVM trained with 10 features to search for ubiquitination-prone proteins in the predicted proteome of M. anisopliae isolate E6 (10,775 proteins). The 160,694 peptides of 21 amino acids centered on a lysine residue extracted from these proteins were represented as 10-dimensional vectors, in the same way as the training data, and classified as an independent set. Forty-four proteins lacking lysine were automatically excluded from this analysis. The classifier predicted 9,314 proteins to be ubiquitination-prone.
The small loss of SVM performance after dimensionality reduction is compensated by the significant reduction in processing time, and the feature correlations suggest a lower risk of overfitting for the 10-dimensional model. / Metarhizium anisopliae is an entomopathogenic fungus used as a biological insecticide. This organism has proteins linked to the infection process whose control mechanisms are still unknown. Many of these mechanisms may be controlled by the ubiquitination system. In this work, machine learning methods were used to classify ubiquitination sites in proteins predicted from the genome of the fungus M. anisopliae isolate E6.
One hundred fifty-one (151) ubiquitinated peptides and one hundred fifty-one (151) non-ubiquitinated peptides from S. cerevisiae and H. sapiens were used as the training set (see http://iclab.life.nctu.edu.tw/ubipred/). Each of these peptides consisted of 21 amino acids with a central lysine residue. The peptides were represented as numerical vectors corresponding to the average physicochemical property values of their amino acids. Hierarchical clustering of the peptides (training data) showed evidence of correlation between several physicochemical properties, indicating some redundancy among the features. Redundant features can cause model overfitting and increase computational cost. We used the WeightedVotingXValidation algorithm to find the minimal set of features that best represents the peptides to be classified as ubiquitinable or not.
The WeightedVotingXValidation algorithm behaved similarly for vectors of 10 and 31 dimensions. The correlation between features was confirmed by Pearson correlation tests (coefficients ranging from -0.95 to -0.40 and from 0.40 to 0.98). Next, we compared the performance of the W-V and Support Vector Machine (SVM) classifiers, the latter with a radial kernel function, for vectors with 10 or 31 dimensions. In both cases, SVM outperformed W-V. With 10 features, recall, precision and accuracy were, respectively, 67%, 65% and 66% for SVM versus 65%, 55% and 47% for W-V. With 31 features, these performance indicators were 71%, 71% and 71% for SVM versus 60%, 55% and 52% for W-V.
The data were processed on an ASUS K43U notebook with an AMD Dual Core C60 1.0 GHz processor, 2 x 512 KB of cache memory and 4 GB of RAM. Processing took 8 h and 22 h for the SVM with 10 and 31 physicochemical features, respectively.
Considering the risk of model overfitting and the redundancy of the features, we applied the SVM trained with 10 physicochemical features to classify potentially ubiquitination-prone proteins in the proteome of M. anisopliae isolate E6 (10,775 proteins). The 160,694 peptides of 21 amino acids with a lysine residue at the central position, extracted from the proteins, were represented by 10-dimensional vectors and used as an independent set. Of the proteins analyzed, 9,314 were classified as ubiquitination-prone and 1,417 as non-ubiquitinable. Forty-four proteins were not analyzed because they contain no lysine.
The small loss of performance from reducing the dimensionality of the data space is compensated by the significant reduction in processing time and by the lower risk of overfitting when using 10-dimensional vectors.
859 |
Análise transcriptômica e interatômica de diferentes espécies de Hevea inoculadas com Pseudocercospora ulei / Transcriptomic and interactomic analysis of different Hevea species inoculated with Pseudocercospora ulei
Guedes, Fernanda Alves de Freitas, 18 May 2015
Previous issue date: 2015-05-18 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Plants are frequently attacked by herbivores and pathogens. Plant defense responses against pathogens aim to block their proliferation and colonization, and also to impart long-term resistance. The rubber tree (Hevea spp.), an outstanding latex producer, is strongly affected by the fungus Pseudocercospora ulei, which causes South American Leaf Blight (SALB), a disease that affects young leaves and is the main threat to rubber tree plantations. Some rubber tree genotypes are resistant to SALB, while others are susceptible. The main goals of this work were therefore to increase our knowledge of the Hevea response to fungal attack and to find genes potentially involved in resistance. These goals were pursued by analyzing the leaf transcriptome and interactome of different Hevea genotypes. NGS sequencing and leaf transcriptome assembly of three resistant genotypes (F4542 - H. benthamiana, PA31 - H. pauciflora, MDF180 - H. brasiliensis) and one susceptible genotype (PB314 - H. brasiliensis), under inoculated and non-inoculated conditions, generated 50,239 contigs. Approximately 75% (33,988) of the rubber tree contigs had some functional annotation. The degree of similarity found among the inoculated transcriptomes of the four genotypes (37% of contigs) suggests that rubber tree resistance to SALB is caused mainly by the regulation of gene expression. Analysis of transcript levels in the inoculated and non-inoculated conditions of the resistant genotypes indicates that the pathogen-induced changes in the expression profiles of F4542 and PA31 are smaller than those in MDF180, a result compatible with the observed lesions. Differential expression analysis identified 1,795 differentially expressed contigs. Although only 22% (10,138) of the Hevea contigs had orthologs among Arabidopsis thaliana sequences, the first Hevea interactome was constructed based on interologs.
The rubber tree protein-protein network, formed by 5,382 proteins and 72,702 interactions, shows features commonly observed in other biological networks, such as a power-law degree distribution and small-world properties, while also presenting some particularities. Modularity was also observed in the Hevea interactome, and the functional modules identified were involved in different biological processes, such as metabolism, transcription and translation, signal transduction, and responses to hormones and stress. Multiple-criteria selection based on protein function, presence in resistant genotypes, expression profile and topological features yielded 30 sequences potentially involved in the defense response and resistance of Hevea spp. to the pathogenic fungus P. ulei. These sequences are targets for experimental validation and for creating resistant rubber tree cultivars. / Plants are frequent targets of herbivores and pathogens. The different plant defense strategies against pathogens aim to prevent their proliferation and colonization, as well as to confer long-lasting resistance. The rubber tree (Hevea spp.), notable for its production of natural rubber, is strongly affected by the fungus Pseudocercospora ulei, which causes South American Leaf Blight (Mal-das-folhas), a disease that attacks young leaves and represents the main threat to rubber tree plantations. Some rubber tree genotypes are resistant to South American Leaf Blight and others are susceptible. Thus, the main objectives of this work are to broaden knowledge of the rubber tree's response to attack by the fungus that causes South American Leaf Blight and to search for genes potentially involved in resistance, through analysis of the leaf transcriptome and interactome of different Hevea spp. genotypes.
NGS sequencing and assembly of the leaf transcriptome of three resistant genotypes (F4542 - H. benthamiana, PA31 - H. pauciflora, MDF180 - H. brasiliensis) and one susceptible genotype (PB314 - H. brasiliensis), under conditions inoculated and not inoculated with Pseudocercospora ulei, generated a total of 50,239 contigs. About 75% (33,988) of the rubber tree contigs had some functional annotation. The percentage of similarity found in the composition of the inoculated transcriptomes of the four genotypes (37% of contigs) suggests that rubber tree resistance to South American Leaf Blight is caused mainly by the regulation of gene expression levels. Analysis of transcript levels under inoculated and non-inoculated conditions in the resistant genotypes also indicates that the pathogen-induced changes in the transcriptome expression profiles of F4542 and PA31 are less pronounced than in MDF180, a result compatible with the observed lesions. Differential expression analysis revealed a total of 1,795 differentially expressed contigs in the genotypes studied. Although only 22% (10,138) of the rubber tree contigs have an ortholog among Arabidopsis thaliana sequences, the first rubber tree interactome was built from interologs of this model species. The rubber tree protein-protein network, formed by 5,382 proteins and 72,702 interactions among them, presents characteristics commonly found in other biological networks, such as a power-law degree distribution and a "small-world" topology, while also showing some particularities. Modularity was observed in the Hevea interactome, with the detection of functional modules involved in different biological processes such as metabolism, transcription and translation, signal transduction, and responses to hormones and stress.
Combining criteria such as protein function, presence in the resistant genotypes, expression profile, and position in the rubber tree networks allowed the selection of 30 sequences potentially involved in the defense response and resistance of rubber tree to the pathogen P. ulei; these sequences are targets for experimental validation and for breeding resistant cultivars.
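The abstract above states that the rubber tree interactome was built from Arabidopsis thaliana interologs: if two Arabidopsis proteins interact, their rubber tree orthologs are predicted to interact as well. The following is only a minimal sketch of that idea, not the thesis pipeline. It assumes orthology has already been inferred and is supplied as a dictionary, and every identifier in it (`map_interologs`, `degree_distribution`, `Hb_contig1`, `At_P1`, and so on) is hypothetical. The degree distribution it returns is the quantity one would then inspect for the power-law behavior reported above; the sketch does not fit an exponent.

```python
from collections import Counter, defaultdict
from itertools import product


def map_interologs(orthologs, reference_edges):
    """Predict target-species interactions from reference-species PPIs.

    orthologs: dict mapping a target contig ID to the set of reference
        protein IDs it is orthologous to (assumed precomputed).
    reference_edges: iterable of (protein_a, protein_b) interaction pairs
        in the reference species (e.g. Arabidopsis).
    Returns a set of undirected edges, each a frozenset of two contig IDs.
    """
    # Invert the orthology map: reference protein -> target contigs.
    ref_to_target = defaultdict(set)
    for contig, refs in orthologs.items():
        for ref in refs:
            ref_to_target[ref].add(contig)

    predicted = set()
    for a, b in reference_edges:
        # Every pair of orthologs of an interacting reference pair is an interolog.
        for x, y in product(ref_to_target.get(a, ()), ref_to_target.get(b, ())):
            if x != y:  # skip self-loops (one contig orthologous to both partners)
                predicted.add(frozenset((x, y)))
    return predicted


def degree_distribution(edges):
    """Return {degree: number of nodes with that degree}, sorted by degree."""
    degree = Counter(node for edge in edges for node in edge)
    return dict(sorted(Counter(degree.values()).items()))


if __name__ == "__main__":
    # Hypothetical toy data; not taken from the thesis.
    orthologs = {
        "Hb_contig1": {"At_P1"},
        "Hb_contig2": {"At_P2"},
        "Hb_contig3": {"At_P3"},
        "Hb_contig4": {"At_P4"},
    }
    at_edges = [("At_P1", "At_P2"), ("At_P2", "At_P3"),
                ("At_P4", "At_P1"), ("At_P5", "At_P2")]  # At_P5 has no ortholog
    network = map_interologs(orthologs, at_edges)
    print(len(network))                  # 3
    print(degree_distribution(network))  # {1: 2, 2: 2}
```

Edges whose reference partners lack a rubber tree ortholog are simply dropped, which is one reason an interolog network covers only a fraction of the assembled contigs.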
Análise do transcritoma do mexilhão marrom (Perna perna) sob contaminação por antraceno / The transcriptome of the brown mussel Perna perna when exposed to anthracene. Monteiro, Jhonatas Sirino, 30 October 2017
The brown mussel Perna perna (Linnaeus, 1758) aids in monitoring chemical compounds in marine ecosystems. However, its molecular mechanisms of detoxification and stress response are unknown. Elucidating these mechanisms is crucial to understanding the toxic effects of chemical pollutants and to developing biomarkers for assessing the environmental quality of marine ecosystems. In the present study, P. perna individuals were exposed to anthracene (ANT), and messenger RNAs (mRNA) from the gills were sequenced on the Illumina platform. Chemical analysis of the animals' soft tissue found ANT concentrations 268 to 715 times higher in the exposed group than in the control group, demonstrating that the exposure was carried out successfully. Sequencing of the P. perna transcriptome generated 273,152,390 read pairs, which were assembled into 231,728 contigs with a mean length of 720 bp and an N50 of 1,083 bp; of these, 66,563 contigs (28.7%) could be annotated using databases such as GenBank, Pfam, Gene Ontology, and KEGG. The functional annotation results suggest that the gills play a role in xenobiotic biotransformation, antioxidant response, signaling, innate immune response, and osmoregulation. Phase I, II, and III biotransformation genes, including CYPs and GSTs, were identified. Transcripts similar to CYPs and GSTs were expressed in the exposed group, but none of them was classified as differentially expressed. However, many hypothetical genes were differentially expressed, which suggests that P. perna uses unknown biotransformation mechanisms to cope with ANT contamination. Innate immune system genes were both up- and down-regulated, as observed for Perna viridis exposed to benzo(a)pyrene, suggesting that ANT alters the responsiveness of the P. perna innate immune system.
/ The brown mussel Perna perna (Linnaeus, 1758) helps in monitoring chemical compounds in marine ecosystems. However, its molecular mechanisms of detoxification and stress response remain unclear. Elucidating these mechanisms is crucial to understanding the toxic effects of chemical pollutants and to developing biomarkers for assessing marine ecosystems. In this study, P. perna individuals were exposed to anthracene (ANT), and their gill mRNA was sequenced with Illumina technology. Chemical analysis of the soft tissue found ANT concentrations 268- to 715-fold higher in the exposed group than in controls, demonstrating that the exposure procedure was successful. Transcriptome sequencing of P. perna generated 273,152,390 read pairs, which were assembled into 231,728 contigs with an average length of 720 bp and an N50 of 1,083 bp; of these, 66,563 contigs (28.7%) could be annotated using GenBank genes, Pfam domains, Gene Ontology (GO) terms, and KEGG pathways. The functional annotation results suggest that the gills play a role in xenobiotic biotransformation, antioxidant response, signal transduction, innate immune response, and osmoregulation. Transcripts similar to genes involved in phase I, II, and III biotransformation reactions, including CYPs and GSTs, were identified. Transcripts similar to CYP and GST isoforms were highly expressed in the group exposed to ANT; however, no CYP, GST, or other biotransformation-related gene was classified as differentially expressed. On the other hand, several hypothetical genes were differentially expressed, which suggests that P. perna uses unknown biotransformation mechanisms to cope with ANT contamination. Immune-related genes were both up- and down-regulated, as was also observed for Perna viridis exposed to benzo(a)pyrene, suggesting that ANT alters the immune response of P. perna.
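The assembly statistics above report both a mean contig length (720 bp) and an N50 (1,083 bp). N50 is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases, so it is weighted toward long contigs and typically exceeds the mean. A minimal sketch of that standard definition follows, assuming contig lengths are already available as a list of integers; the assembler and tooling used in the thesis are not specified here, and the function name `n50` is illustrative.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the largest length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("n50() needs at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test for running >= total / 2
            return length


if __name__ == "__main__":
    lengths = [100, 200, 300, 400, 500]  # toy values, not from the thesis
    print(n50(lengths), sum(lengths) / len(lengths))  # 400 300.0
```

In the toy example, the two longest contigs (500 + 400 = 900 bp) already exceed half of the 1,500 bp total, so the N50 (400) is larger than the mean (300), the same pattern seen in the P. perna assembly.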