Global ETD Search

21	LDA-based dimensionality reduction and domain adaptation with application to DNA sequence classification Mungre, Surbhi January 1900 (has links) Master of Science / Department of Computing and Information Sciences / Doina Caragea / Several computational biology and bioinformatics problems involve DNA sequence classification using supervised machine learning algorithms. The performance of these algorithms depends largely on the availability of labeled data and on the approach used to represent DNA sequences as feature vectors. For many organisms, labeled DNA data is scarce, while unlabeled data is readily available. However, for a small number of well-studied model organisms, large amounts of labeled data are available. This calls for domain adaptation approaches, which can transfer knowledge from a source domain, for which labeled data is available, to a target domain, for which large amounts of unlabeled data are available. Intuitively, one approach to domain adaptation can be obtained by extracting and representing the features that the source domain and the target domain sequences share. Latent Dirichlet Allocation (LDA) is an unsupervised dimensionality reduction technique that has been successfully used to generate features for sequence data such as text. In this work, we explore the use of LDA for generating predictive DNA sequence features that can be used in both supervised and domain adaptation frameworks. More precisely, we propose two dimensionality reduction approaches for DNA sequences, LDA Words (LDAW) and LDA Distribution (LDAD). LDA is a generative probabilistic model used to model collections of discrete data such as document collections. For our problem, a sequence is considered to be a "document" and the k-mers obtained from a sequence are its "document words". We use LDA to model our sequence collection. 
Given the LDA model, each document can be represented as a distribution over topics (where a topic can be seen as a distribution over k-mers). In the LDAW method, we use the top k-mers in each topic (i.e., the k-mers with the highest probability) as our features, while in the LDAD method we use the topic distribution to represent a document as a feature vector. We study LDA-based dimensionality reduction approaches for both supervised DNA sequence classification and domain adaptation. We apply the proposed approaches to the splice site prediction problem, which is an important DNA sequence classification problem in the context of genome annotation. In the supervised learning framework, we study the effectiveness of the LDAW and LDAD methods by comparing them with a traditional dimensionality reduction technique based on the information gain criterion. In the domain adaptation framework, we study the effect of increasing the evolutionary distance between the source and target organisms, and the effect of using different weights when combining labeled data from the source domain with labeled data from the target domain. Experimental results show that LDA-based features can be successfully used to perform dimensionality reduction and domain adaptation for DNA sequence classification problems. Domain Adaptation Splice Site Prediction Latent Dirichlet Allocation DNA Sequence Classification Dimensionality Reduction Computer Science (0984)
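The two representations described in this abstract can be illustrated with a minimal Python sketch. It assumes an LDA model has already been fitted elsewhere; the toy topics, the function names and the use of raw k-mer counts as LDAW feature values are hypothetical choices for illustration, not details taken from the thesis.

```python
from collections import Counter

def kmers(seq, k):
    """Slide a window of width k over the sequence to get its 'document words'."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def ldaw_vocabulary(topics, top_n):
    """LDAW: keep the top_n highest-probability k-mers of every topic."""
    selected = set()
    for word_probs in topics:
        ranked = sorted(word_probs, key=word_probs.get, reverse=True)
        selected.update(ranked[:top_n])
    return sorted(selected)

def ldaw_features(seq, k, vocabulary):
    """Count how often each selected k-mer occurs in the sequence (assumed weighting)."""
    counts = Counter(kmers(seq, k))
    return [counts[w] for w in vocabulary]

def ldad_features(doc_topic_dist):
    """LDAD: the per-document topic distribution is itself the feature vector."""
    return list(doc_topic_dist)

# Hypothetical fitted topics: each topic is a distribution over 3-mers.
topics = [
    {"GTA": 0.5, "TAA": 0.3, "AAG": 0.2},
    {"CAG": 0.6, "AGG": 0.3, "GTA": 0.1},
]
vocab = ldaw_vocabulary(topics, top_n=2)
x_ldaw = ldaw_features("CAGGTAAG", 3, vocab)
x_ldad = ldad_features([0.7, 0.3])  # hypothetical topic mixture for one sequence
```

With LDAW the feature space has at most (number of topics x top_n) dimensions, while with LDAD it has exactly one dimension per topic, which is where the dimensionality reduction comes from.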
22	Habitat fragmentation, patterns of diversity and phylogeography of small mammal species in the Albertine rift Kaleme, Prince K. 12 1900 (has links) Thesis (PhD) - Stellenbosch University, 2011. / ENGLISH ABSTRACT: The Albertine Rift is characterized by a heterogeneous landscape which may, at least in part, drive the exceptional biodiversity found across all taxonomic levels. Notwithstanding the biodiversity and beauty of the region, large areas are poorly understood because of political instability with the inaccessibility of most of the region as a contributing factor. The majority of studies in the Albertine Rift have focussed on charismatic mega fauna, with other taxa receiving less attention. One of the taxonomically and numerically more abundant small mammal genera is the genus Praomys, an African endemic with a wide distribution range spanning most of west, central and east Africa. Four species are typically recognized from the Albertine Rift namely P. degraaffi, P. jacksoni, P. misonnei and P. verschureni. In this study I used a combination of DNA sequence data (mitochondrial control region, mitochondrial cytochrome b and 7th intron of the nuclear ß-fibrinogen gene) as well as morphometric data (traditional and geometric) to investigate the systematics of the Praomys taxa occurring in the Albertine Rift. To allow meaningful DNA assessments and in an attempt to identify potential drivers of diversifications, other Praomys species were also included from public sequence data bases for comparisons. The main focus was on P. jacksoni (the numerically most abundant taxon; also, up to 2005, all Praomys in the Albertine Rift were mostly collected as “jacksoni”) and P. degraaffi (an Albertine Rift endemic). A surprising finding was the presence of P. mutoni; this represents a range extension for this species into the Albertine Rift. Distinct evolutionary lineages were found in both P. jacksoni (confirmed by sequence data as well as morphometrics) as well as P. 
degraaffi (based only on sequence data; insufficient samples precluded a full morphometric investigation). These lineages (in both P. jacksoni and P. degraaffi) appear to be separated along a north-south gradient; however, further investigations should confirm this. To further investigate the genetic patterns at local scales across the Albertine Rift, as well as introgression between species as revealed by sequence data, a species-specific microsatellite library was developed for P. jacksoni. Twelve polymorphic markers were identified, of which nine also amplified in P. degraaffi. Introgression was confirmed between the two focal species, with almost 20% of the individuals analysed being jacksoni-degraaffi hybrids. This is perhaps not so surprising given the considerable overlap in their ranges (between ~1500 and 2450 m a.s.l.) as well as the relative ages of the species (the divergence time between these two species was estimated at 3.8 Mya). The presence of distinct lineages within each of these species was confirmed by microsatellite analyses (these lineages diverged at approximately the same time, ca. 3.4 Mya). As suggested by sequence and morphometric data, these lineages had a largely north-south distribution but with considerable overlap in the central Albertine Rift in the vicinity of Lake Kivu. The phylogeographic patterns obtained for both focal species were not consistent with physical barriers such as rivers, lakes or mountains, nor were they exclusively associated with Pleistocene phenomena such as changes in the course of rivers or uplift; rather, the lineages predate the Pleistocene and fall firmly in the Pliocene (>3 Mya). Biogeographically, the north-south location of lineages with a centrally located contact zone could be a result of parapatric speciation due to habitat fragmentation or past climate change, followed by secondary contact. 
Barcoding using genetic information provides a useful tool to identify unknown taxa, cryptic diversity or where different life stages are difficult to identify. From an invasion biology perspective, it allows for the rapid identification of problem taxa against a known data base. By adopting such a barcoding approach (senso lato), the presence of three invasive rodents was confirmed in the Democratic Republic of the Congo (DRC); these are Rattus rattus (black rat), R. norvegicus (Norway rat) and Mus musculus domesticus (house mouse). A comparison with global data available for these species revealed two possible introduction pathways namely via the shipping port at Kinshasa/Matadi (with strong links to Europe) and via the slave trade routes in the east (strong links to the Arab world and the east). Of these three taxa, only R. rattus is currently documented from the DRC although the others have received mention in the gray literature. These findings draw attention to the lack of any official policy regarding biosecurity in the DRC, and argue for the development of strict control measures to prevent further introductions. / AFRIKAANSE OPSOMMING: Die Albertine Rift word gekenmerk deur 'n heterogene landskap wat kan, ten minste gedeeltelik, die uitsonderlike biodiversiteit wat oor al die taksonomiese vlakke gevind word teweeg bring. Nieteenstaande die biodiversiteit en die skoonheid van die streek, is groot gebiede onbekend as gevolg van politieke onstabiliteit met die ontoeganklikheid van meeste van die streek as 'n bydraende faktor. Die meerderheid van studies in die Albertine Rift het gefokus op die charismatiese mega fauna, met ander taxa wat minder aandag ontvang. Een van die taksonomies en numeries meer volop klein soogdier genera is die genus Praomys, 'n Afrika endemiese groep met 'n wye verspreiding wat strek oor die grootste deel van van wes-, sentraal en oos-Afrika. Vier spesies word tipies erken van die Albertine Rift naamlik P. degraaffi, P. jacksoni, P. 
misonnei en P. verschureni. In hierdie studie het ek 'n kombinasie van DNA volgorde data (mitochondriale beheer streek, mitochondriale sitochroom b en 7de intron van die kern ß-fibrinogeen geen) sowel as morfometriese data (tradisioneel en meetkundig) gebruik om die sistematiek van die Praomys taxa te ondersoek. Om betekenisvolle DNA aanslae toe te laat en in 'n poging om potensiële aandrywers van diversiteit te identifiseer, is ander Praomys spesies van openbare volgorde data basisse vir vergelykings ingesluit. Die hooffokus is op P. jacksoni (die numeries volopste takson, ook, tot en met 2005 is alle Praomys in die Albertine Rift meestal as "jacksoni" versamel) en P. degraaffi ('n Albertine Rift endemiese spesie). 'n Verrassende bevinding was die teenwoordigheid van P. mutoni, dit verteenwoordig' n verspreidingsuitbreiding vir hierdie spesie in die Albertine Rift. Bepaalde evolusionêre ontwikkelingslyne was in beide P. jacksoni (bevestig deur die volgorde data sowel as morfometrie) sowel as P. degraaffi (wat slegs gebaseer is op die volgorde data, onvoldoende monsters verhinder 'n volledige morfometriese ondersoek). Hierdie lyne (in beide P. jacksoni sowel as P. degraaffi) word geskei langs 'n noord - suid gradiënt, maar verdere ondersoeke moet dit bevestig. Om die genetiese patrone op plaaslike skaal oor die Albertina Rift verder te ondersoek, sowel as introgressie tussen spesies soos geopenbaar deur die volgorde data, is 'n spesie-spesifieke mikrosatelliet biblioteek ontwikkel vir P. jacksoni. Twaalf polimorfiese merkers is geïdentifiseer waarvan nege ook amplifiseer in P. degraaffi. Introgressie is bevestig tussen die twee brandpunt spesies met byna 20% van die individue wat ontleed is as jacksoni-degraaffi basters. 
Dit is miskien nie so verbasend gegee dat daar aansienlike oorvleueling is in hul gebiede (tussen ~ 1500 m bo seespieel tot 2450 m bo seespieel), sowel as die relatiewe ouderdomme van die spesies (die divergensie tussen hierdie twee spesies is geskat op 3,8 Mya). Die teenwoordigheid van verskillende lyne in elk van hierdie spesies is bevestig deur mikrosatelliet ontleding (hierdie lyne het gedivergeer ongeveer 3,4 Mya). Soos voorgestel deur die DNA volgorde en morfometriese data, het hierdie lyne 'n grootliks noorde – suid verspreiding, maar met 'n aansienlike oorvleueling in die sentrale Albertine Rift in die omgewing van die Kivumeer. Die filogeografiese patrone wat vir beide die brandpunt spesies gevind is nie in ooreenstemming met die fisiese struikelblokke soos die riviere, mere of berge nie, en hou ook nie uitsluitlik verband met die Pleistoseen verskynsels soos die verandering van die loop van die riviere nie; die afstammelinge is eerder veel ouer as die Pleistoseen en val binne die Plioseen (> 3 Mya). Biogeografies, die noorde – suid plasing van die lyne met 'n sentraal geleë kontak sone kan die gevolg wees van parapatriese spesiasie te danke aan habitatfragmentasie as gevolg van verandering in die klimaat, gevolg deur 'n sekondêre kontak. Strepieskodering met behulp van genetiese inligting verskaf 'n nuttige instrument om onbekend taxa, kriptiese diversiteit of waar verskillende lewensfases moeilik is om te identifiseer, te identifiseer. Vanuit 'n indringerbiologie perspektief, maak hierdie benadering dit moontlik om vinnige identifikasies van die probleem taksa teen' n bekende data basis te bekom. Deur gebruik te maak van so 'n strepieskoderingsbenadering (senso lato), is die teenwoordigheid van drie indringende knaagdiere bevestig in die Demokratiese Republiek van die Kongo (DRK), naamlik Rattus rattus (swart rot), R. norvegicus (Noorweë rot) en Mus musculus domesticus (huis muis). 
'n Vergelyking met die globale data wat beskikbaar is vir hierdie spesies het aan die lig gebring dat twee moontlike betree-roetes bestaan, naamlik via die skeepshawe by Kinshasa / Matadi (met sterk skakels na Europa), en via die slawehandel roetes in die ooste (sterk skakels na die Arabiese wêreld en die ooste) . Van hierdie drie taxa, is tans slegs R. rattus van die Demokratiese Republiek van die Kongo gedokumenteer, hoewel die ander melding ontvang in die grys literatuur. Hierdie bevindinge vestig die aandag op die gebrek aan enige amptelike beleid ten opsigte van biosekuriteit in die Demokratiese Republiek van die Kongo, en argumenteer vir die ontwikkeling van streng beheermaatreëls om verdere indringerspesies te voorkom. DNA sequence Phylogenetic analyses Traditional morphometrics Geometric morphometrics Mammals Small mammals Albertine rift Botany & Zoology
23	Algorithmes et structures de données efficaces pour l’indexation de séquences d’ADN / Efficient algorithms and data structures for indexing DNA sequence data Salikhov, Kamil 17 November 2017 (has links) Les volumes des données générées par les technologies de séquençage haut débit augmentent exponentiellement ces derniers temps. Le stockage, le traitement et le transfert deviennent des défis de plus en plus sérieux. Pour les affronter, les scientifiques doivent élaborer des approches et des algorithmes de plus en plus efficaces. Dans cette thèse, nous présentons des structures de données efficaces et des algorithmes pour des problèmes de recherche approchée de chaînes de caractères, d'assemblage du génome, de compression de séquences d’ADN et de classification métagénomique de lectures d’ADN. Le problème de recherche approchée a été bien étudié, avec un grand nombre de travaux publiés. Dans le domaine de bioinformatique, le problème d’alignement de séquences peut être considéré comme un problème de recherche approchée de chaînes de caractères. Dans notre travail, nous étudions une stratégie de recherche basée sur une structure d'indexation dite bidirectionnelle. D’abord, nous définissons un formalisme des schémas de recherche pour travailler avec les stratégies de recherche de ce type, ensuite nous fixons une mesure probabiliste de l’efficacité de schémas de recherche et démontrons quelques propriétés combinatoires de schémas de recherche efficaces. Finalement, nous présentons des calculs expérimentaux qui valident la supériorité de nos stratégies. L’assemblage du génome est un des problèmes clefs en bioinformatique. Dans cette thèse, nous présentons une structure de données, le filtre de Bloom en Cascade, qui améliore le filtre de Bloom standard et peut être utilisée pour la résolution de certains problèmes, y compris pour l’assemblage du génome. Nous démontrons ensuite des résultats analytiques et expérimentaux sur les propriétés du filtre de Bloom en Cascade. 
Nous présentons également comment le filtre de Bloom en Cascade peut être appliqué au problème de compression de séquences d’ADN. Un autre problème que nous étudions dans cette thèse est la classification métagénomique de lectures d’ADN. Nous présentons une approche basée sur la transformée de Burrows-Wheeler pour la recherche efficace et rapide de k-mers (mots de longueur k). Cette étude est centrée sur les structures des données qui améliorent la vitesse et la consommation de mémoire par rapport à l'index classique de Burrows-Wheeler, dans le cadre de notre application. / The amount of data generated by next-generation sequencing technologies has increased exponentially in recent years. Storing, processing and transferring this data are becoming increasingly challenging tasks. To cope with them, data scientists should develop ever more efficient approaches and techniques. In this thesis we present efficient data structures and algorithmic methods for the problems of approximate string matching, genome assembly, read compression and taxonomy-based metagenomic classification. Approximate string matching is an extensively studied problem with a vast number of published papers, both theoretical and practical. In bioinformatics, the read mapping problem can be regarded as approximate string matching. Here we study string matching strategies based on bidirectional indices. We define a framework, called search schemes, to work with search strategies of this type, then provide a probabilistic measure for the efficiency of search schemes, prove several combinatorial properties of efficient search schemes and provide experimental computations supporting the superiority of our strategies. Genome assembly is one of the basic problems of bioinformatics. Here we present the Cascading Bloom filter, a data structure that improves on the standard Bloom filter and can be applied to several problems, such as genome assembly. 
We provide theoretical and experimental results on the properties of the Cascading Bloom filter. We also show how the Cascading Bloom filter can be used to solve another important problem, read compression. Another problem studied in this thesis is metagenomic classification. We present a BWT-based approach that improves the BWT-index for quick and memory-efficient k-mer search. We mainly focus on data structures that improve the speed and memory usage of the classical BWT-index for our application. Structures de données Algorithmes L’indexation de séquences d’ADN Data structures Algorithms DNA sequence data
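The abstract does not describe how the Cascading Bloom filter is built, so the following Python sketch shows only the standard Bloom filter that it is said to improve on, applied to the k-mers of a read. The class name, the SHA-256-based hashing and the parameter values are illustrative assumptions, not the thesis's implementation.

```python
import hashlib

class BloomFilter:
    """Standard Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function (assumed scheme).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Store every 4-mer of a toy read, then query membership.
read = "ACGTACGGTCA"
k = 4
read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
bf = BloomFilter(num_bits=1024, num_hashes=3)
for kmer in read_kmers:
    bf.add(kmer)
all_present = all(kmer in bf for kmer in read_kmers)
```

A query for a k-mer that was never added usually returns False but may return True (a false positive); that trade-off between memory and false-positive rate is the property any refinement of the structure has to manage.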
24	Caracterização fenotípica e molecular de isolados do gênero Nocardia e proposição de algoritimo de identificação / Phenotypic and molecular characterization of Nocardia isolates and proposition of identification algorithm Silva, Edna Cleide Muricy da 15 May 2015 (has links) O gênero Nocardia é composto por bactérias gram-positivas e filamentosas, que normalmente só causam doença em indivíduos imunocomprometidos. No entanto, certo número de infecções por nocardia têm sido relatados em pacientes imunocompetentes. Nos últimos anos, observou-se um aumento da frequência de infecções por Nocardia spp. e, com o aumento do número de espécies descritas, a identificação correta tem sido de difícil obtenção mas de grande importância para a aplicação do tratamento correto e elucidação epidemiológica. O objetivo deste estudo foi caracterizar, por métodos fenotípicos e moleculares, 72 isolados de Nocardia spp. de interesse médico, avaliando as metodologias para elaborar um algoritmo de identificação. Os isolados foram provenientes da Micoteca do Instituto de Medicina Tropical de São Paulo e da rotina do Núcleo de Tuberculose e Micobacterioses do Instituto Adolfo Lutz. Os isolados foram identificados por testes fenotípicos, identificação molecular por análise de restrição (PRA-hsp65), sequenciamento dos genes hsp65 e 16S rRNA e MALDI-TOF MS (Matrix-associated laser desorption time of flight mass spectrometry). O perfil de suscetibilidade foi analisado pelo método de Concentração Inibitória Mínima (MIC) e disco difusão (DD), com os fármacos: amicacina, ciprofloxacina, minociclina, tobramicina, amoxacilina+ácido clavulânico, imipenem e sulfametoxazol+trimetoprim. Os resultados revelaram que a identificação fenotípica foi insuficiente para definir as espécies. Apenas 24 (33,4%) isolados tiveram identificação fenotípica concordante com o sequenciamento do gene 16S rRNA. Na analise feita pela técnica de PRA-hsp65 foram observados 20 (27,8%) N. 
brasiliensis, seis (8,3%) isolados de outras espécies de nocardias e 38 (52,8%) foram considerados novos padrões (NP). Foi detectado um isolado misto e em cinco isolados não foi obtido produto de amplificação. O sequenciamento do gene hsp65 proporcionou a identificação de 51 isolados como Nocardia, 14 foram identificados como pertencentes a outros gêneros, dos quais, um apresentou mistura de nocardia e micobactéria, sendo identificado como Mycobacterium abscessus no gene hsp65 (análise in silico) e N. otitidiscaviarum no gene 16S rRNA. Sete isolados não foram sequenciados devido à ausência de amplificação do fragmento. Os isolados analisados através do sequenciamento do gene 16S rRNA, foram identificados como Nocardia (57-79,2%), Gordonia (7-9,7%), Rhodoccocus (3-4,2%), Tsukamurella (2-2,8%), Mycobacterium (2-2,8%) e Streptomyces (1-1,3%). Para a análise de MALDI-TOF MS foi observado que, dos 72 isolados estudados, 49 foram identificados como Nocardia, 11 como pertencentes a outros gêneros e 12 amostras não puderam ser identificadas devido aos valores de leitura não serem adequados para análise. Pela primeira vez o corante resazurina foi utilizado para leitura de MIC de Nocardia sp. Entre os fármacos testados através do MIC, os que apresentaram maior sensibilidade foram amicacina (100%) e tobramicina (84%). As maiores resistências foram encontradas com os fármacos sulfametoxazol+trimetoprim (76%) e imipenem (54%). Devido à ausência de critérios interpretativos de disco difusão para o gênero Nocardia, foi elaborado um critério para o presente estudo. Os resultados obtidos no teste de DD mostraram 100% de sensibilidade para os fármacos amicacina, minociclina e sulfametoxazol+trimetroprim. Os isolados apresentaram a maior porcentagem de resistência ao fármaco ciprofloxacina (64%). Comparando os resultados de DD com os obtidos no MIC, observamos que os fármacos ciprofloxacina, imipenem e sulfametoxazol+trimetoprim apresentaram porcentagem de discordância acima de 20%. 
O fármaco sulfametoxazol+trimetoprim teve a maior discordância (>75%), com elevada porcentagem de isolados resistentes na MIC, mas baixa porcentagem de resistência em DD. O único fármaco com 100% de concordância entre os resultados foi amicacina. Foi elaborado um algoritmo de identificação que utiliza técnicas fenotípicas para triagem para diferenciar nocardias de outros gêneros. A identificação por PRA-hsp65 será útil na rotina de laboratório de micobactérias como identificação presuntiva. A identificação definitiva das espécies deve ser obtida pelo sequenciamento do gene 16S rRNA. / The genus Nocardia comprises filamentous gram-positive bacteria that usually cause disease only in immunocompromised individuals. However, a number of Nocardia infections have been reported in immunocompetent patients. In recent years, an increased frequency of infections caused by Nocardia spp. has been reported and, with the growing number of described species, correct species identification has become more difficult. Correct identification is of major importance for the choice of antimicrobial treatment as well as for epidemiological investigation. The objective of this study was to characterize, by phenotypic and molecular methods, 72 isolates of Nocardia spp. of medical interest, evaluating the methodologies in order to design an identification algorithm. The isolates were obtained from the fungal collection of the Tropical Medicine Institute of São Paulo and the routine service of the Tuberculosis and Mycobacteriosis Center of the Adolfo Lutz Institute. The isolates were identified by phenotypic testing, molecular identification by restriction analysis (PRA-hsp65), sequencing of the hsp65 and 16S rRNA genes, and MALDI-TOF MS (matrix-assisted laser desorption/ionization time-of-flight mass spectrometry). 
The susceptibility profile was analyzed by the Minimum Inhibitory Concentration (MIC) method and by disk diffusion (DD), with the following drugs: amikacin, ciprofloxacin, minocycline, tobramycin, amoxicillin+clavulanic acid, imipenem and sulfamethoxazole+trimethoprim. The results showed that phenotypic identification was insufficient to define the species. Only 24 (33.4%) of the isolates had a phenotypic identification that was concordant with 16S rRNA gene sequencing. PRA-hsp65 analysis identified 20 (27.8%) isolates as N. brasiliensis and six (8.3%) as other Nocardia species, while 38 isolates (52.8%) showed new patterns (NP). This technique also detected one mixed isolate, and no amplification product was obtained for five isolates. Sequencing of the hsp65 gene identified 51 isolates as Nocardia and 14 as belonging to other genera, one of which was a mixture of Nocardia and mycobacteria, identified as Mycobacterium abscessus by in silico analysis of the hsp65 gene and as N. otitidiscaviarum by 16S rRNA gene sequencing. Seven isolates were not sequenced due to the absence of the amplified fragment. The isolates analyzed by 16S rRNA gene sequencing were identified as Nocardia (57; 79.2%), Gordonia (7; 9.7%), Rhodoccocus (3; 4.2%), Tsukamurella (2; 2.8%), Mycobacterium (2; 2.8%) and Streptomyces (1; 1.3%). By MALDI-TOF MS analysis of the 72 isolates, 49 were identified as Nocardia, 11 were identified as belonging to other genera and 12 isolates could not be identified because the samples gave reading values that were inadequate for analysis. For the first time, to our knowledge, the resazurin dye was used to determine the MICs of Nocardia sp. Among the drugs tested by MIC, susceptibility was highest for amikacin (100%) and tobramycin (84%). Resistance was highest for trimethoprim-sulfamethoxazole (76%) and imipenem (54%). 
Due to the absence of established criteria for interpreting the disk diffusion assay with Nocardia, we designed a specific criterion for this study. The results obtained in the DD test showed 100% sensitivity to amikacin, minocycline and trimethoprim-sulfamethoxazole. The isolates showed the highest percentage of resistance to ciprofloxacin (64%). Comparing the DD results with those obtained by MIC, we observed that ciprofloxacin, imipenem and sulfamethoxazole-trimethoprim showed a percentage of disagreement above 20%. Sulfamethoxazole-trimethoprim had the highest discrepancy (>75%), with a high percentage of resistant isolates by MIC but a low percentage of resistance in DD. The only drug with 100% agreement between the two methods was amikacin. We designed an identification algorithm that uses phenotypic techniques to screen and differentiate nocardias from other genera. Identification by PRA-hsp65 will be useful in routine mycobacteria laboratories as a presumptive identification tool. The final identification of the species should be obtained by sequencing the 16S rRNA gene. Algorithms Algoritmos Análise de sequência de DNA DNA sequence analysis Fenótipos Nocardiaceae Nocardiaceae Phenotypics
25	Prospecção de genes codificadores de enzimas lipolíticas em biblioteca metagenômica de consórcio microbiano degradador de óleo diesel. / Screening for lipolytic enzyme codification genes in a metagenomic library of consortia specialized in diesel oil degradation. Pereira, Mariana Rangel 03 March 2011 (has links) As enzimas lipolíticas vêm atraindo atenção no mercado global devido ao enorme potencial biotecnológico, como: na formulação de detergentes; na indústria de couro; produção de cosméticos, fármacos, aromas, biodiesel, etc. O objetivo deste trabalho foi prospectar genes codificadores de enzimas lipolíticas em biblioteca metagenômica de um consórcio microbiano degradador de óleo diesel. A seleção foi feita pela atividade lipolítica através do cultivo dos clones em placa de petri e a avaliação foi pela observação de halo ao redor da colônia, sendo positiva para 30 clones dentre os quais dois se destacaram. Estes dois clones foram selecionados e subclonados. Os DNAs das sub-bibliotecas foram sequenciados, gerando um contig completo para cada clone. Através do ORF Finder foi identificado cinco ORFs de esterase/lipase, dentre as quais uma alcançou 58% de identidade com uma bactéria não cultivável. As árvores filogenéticas indicam que duas ORFs são similares à família IV das enzimas lipolíticas, enquanto que as outras três ORFs à família V. / Lipolytic enzymes have been attracting global market attention because they show enormous biotechnological potential. The present work was done as an attempt to find genes which codify lipolytic enzymes in a metagenomic library composed of diesel oil degradation microbe consortia. Clones were selected according to lipolytic activity and were then evaluated after cultivation in Petri dishes by observation of halo formation around the colonies. 30 clones produced halo formations and were identified as positives, two of which showed prominent results. These two were then selected and sub cloned. 
DNA from the sub-libraries was sequenced, generating a complete contig for each clone. Using ORF Finder, five esterase/lipase ORFs were identified, one of which reached 58% identity with a sequence from an uncultured bacterium. The phylogenetic trees indicate that two ORFs are similar to family IV of the lipolytic enzymes, while the other three ORFs are similar to family V. Biochemical genetics DNA sequence Enzimas lipolíticas Genética bioquímica genomas Genomes Lipase Lipase Lipolytic enzymes Sequência do DNA
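The ORF identification step mentioned in this abstract can be illustrated with a toy Python scan. This is not the ORF Finder tool itself: the function name, the minimum-length parameter and the restriction to the forward strand with the standard stop codons are simplifying assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for forward-strand ORFs; end is exclusive.

    An ORF here runs from the first ATG in a reading frame to the next
    in-frame stop codon, and must hold at least min_codons codons
    before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close it and keep scanning this frame
    return orfs

contig = "CCATGAAATTTTAGGG"  # hypothetical toy contig
orfs = find_orfs(contig)
orf_sequences = [contig[s:e] for s, e, _ in orfs]
```

A fuller scan would also cover the reverse complement; candidate ORFs are then typically compared against known esterase/lipase sequences to obtain identity values such as the one reported above.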
26	Seleção de leveduras para bioconversão de D-xilose em xilitol / Yeast selection for bioconversion of D-xylose to xilitol Lourenço, Marcus Venicius de Mello 15 January 2010 (has links) Espécies microbianas, em especial as leveduras, são de grande importância para a produção de xilitol. A produção de xilitol envolve uma complicada regulação metabólica, incluindo o transporte de D-xilose, produção de enzimas fundamentais e cofator de regeneração. Assim, a triagem de microorganismos que consomem naturalmente D-xilose se torna uma maneira viável e eficaz para se obter organismos com possível aplicação industrial para a produção de xilitol. Neste trabalho foram isoladas vinte e oito leveduras provenientes do ambiente industrial da produção de etanol (torta de filtro) com habilidade de consumir D-xilose. O seqüenciamento e a identificação pela análise da região D1/D2 do gene do rDNA 26S demonstraram que todas pertencem ao gênero Candida, sendo 24 linhagens (85.71%) C. tropicalis e 4 linhagens (14.29%) C. rugosa. Das 28 linhagens isoladas, cinco linhagens de leveduras foram escolhidas aleatóriamente para o ensaio de bioconversão de D-xilose em xilitol devido ao fato das mesmas apresentarem velocidade de crescimento em D-xilose semelhantes. As linhagens selecionadas para o ensaio foram: Candida tropicalis MVP 03, Candida tropicalis MVP 16, Candida rugosa MVP 17, Candida rugosa MVP 21, Candida tropicalis MVP 40, pois representam bem a amostragem. Três leveduras pertencentes à coleção do Departamento de Ciências Biológicas da ESALQ / USP (kluyveromyces marxianus IZ 1339, Candida tropicalis IZ 1824 e Candida guilliermondii FTI 20037) foram utlizadas nos ensaios para obtenção de xilitol a partir da bioconversão da D-xilose como controle positivo. Para a formação de xilitol em meio sintético utilizando D-xilose como única fonte de carbono. Foram realizados ensaios da cinética de crescimento durante 96 horas de fermentação. 
Na primeira triagem, para a avaliação da melhor condição nutricional para o ensaio, as leveduras foram cultivadas em três meios quimicamente definidos: YNB 6.7 g L-1, UPX (uréia 2.3 g L-1 e peptona 6.6 g L-1) e MCX (KH2PO4 0,62 g L-1; K2HPO4 2,0 g L-1; (NH4)2SO4 1,0 g L-1; MgSO4 1,1 g L-1; extrato de levedura 0.5 g L-1) acrescidos de 20 g L-1 de D-xilose, a 30°C e 120 rpm. O meio UPX apresentou o melhor rendimento, com uma produtividade volumétrica (Qp) entre 0,004 a 0,09, fator de conversão de xilose em xilitol (Yp/s) entre 0,23 a 0,28 g g-1, fator de conversão de D-xilose em biomassa (Yx/s) entre 0.20 a 0.24 g g-1, com uma eficiência de conversão (h) entre 21% a 26%. As leveduras C. tropicalis MVP 03; C. tropicalis MVP 16; C. rugosa MVP 17; C. rugosa MVP 21; C. tropicalis MVP 40 foram avaliadas em uma triagem, em meio UPX, com padronização do inóculo inicial. Para os cinco isolados, a produção de xilitol variou de 5,76 a 32,97 g L-1, a partir de 50 g L-1 de D-xilose com produtividade (Qp) de 0,06 a 0,35 g L-1 h-1, fator de conversão de xilose em xilitol (Yp/s) de 0,14 a 0,65 g g-1, fator de conversão de D-xilose em biomassa (Yx/s) de 0,08 a 0,29 g g-1 e a eficiência de conversão (h) entre 6% e 61% que foi calculado segundo Barbosa et al 1988. Destacou-se a levedura C. tropicalis 16, produzindo 32,97 g L-1 de xilitol com um Qp de 0,35 g L-1 h-1, Yp/s de 0,65 g g-1, Y x/s de 0,11 g g-1 e eficiência de conversão (h) de 61 %. / Microbial species, particularly yeast, are of great importance for the production of xylitol. The xylitol production involves complicated metabolic regulation, including the transport of D-xylose, production of key enzymes and cofactor regeneration. Thus, screening of microorganisms that consume D-xylose naturally becomes a viable and effective way to obtain organisms with industrial application for the production of xylitol. 
In this work we isolated twenty-eight yeasts from the environment of the industrial production of ethanol (filter cake) with capacity to consume D-xylose. The sequencing and identification by analysis of the D1/D2 region of 26S rDNA gene showed that all belong to the genus Candida, and 24 strains (85.71%) C. tropicalis and 4 strains (14.29%) C. rugosa. Of the 28 isolates, five strains of yeast were selected randomly to test the bioconversion of D-xylose to xylitol due to the fact that they present rate of growth in D-xylose similar. The lines selected for testing were: Candida tropicalis MVP 03, Candida tropicalis MVP 16, Candida rugosa MVP 17, Candida rugosa MVP 21, Candida tropicalis MVP 40, and they represent a sampling. Three yeasts from the collection of the Department of Biological Sciences, ESALQ / USP (Kluyveromyces marxianus IZ 1339, Candida tropicalis IZ 1824 and Candida guilliermondii FTI 20037), used were the tests to obtain xylitol from the bioconversion of D-xylose as positive control, for the formation of xylitol in a synthetic medium using D-xylose as sole carbon source. Assays were performed in the kinetics of growth during 96 hours of fermentation. In the first evaluation, the evaluation of the best nutritional condition in the test, yeast cells were grown in three chemically defined media: YNB 6.7 g L-1, UPX (urea 2.3 g L-1 peptone and 6.6 g L-1) MCX ( KH2PO4 0.62 g L-1, K2HPO4 2.0 g L-1, (NH4)2SO4 1.0 g L-1 MgSO4 1.1 g L-1, yeast extract 0.5 g L-1) plus 20 g L-1 D-xylose at 30°C and 120 rpm. Mean UPX showed the best performance with a volumetric productivity (Qp) from 0.004 to 0.09, the conversion factor of xylose to xylitol (Yp/s) between 0,23 to 0,28 g g-1 conversion factor D-xylose in biomass (Yx/s) between 0.20 to 0.24 g g-1 Yeasts 0,20 to 0,24 g g-1, with a conversion efficiency (h) between 21% to 26%. C. tropicalis MVP 03, C. tropicalis MVP 16, C. rugosa MVP 17, C. rugosa MVP 21, C. 
tropicalis MVP 40 were evaluated in a screening, in media UPX, with standardization of initial inoculation. For 12 five isolates, the production of xylitol varied from 5.76 to 32.97 g L-1, from 50 g L-1 Dxylose with productivity (Qp) of 0.06 to 0,35 g L-1 h-1, the conversion factor of xylose to xylitol (Yp/s) 0.14 to 0.65 g g-1, the conversion factor of D-xylose in biomass (Yx/s) from 0.08 to 0.29 g g-1 and conversion efficiency (h) between 6% and 61% which was calculated according to Barbosa et al 1988. They outlined the yeast C. tropicalis MVP16, yielding 32.97 g L-1 of xylitol with a Qp of 0.35 g L-1 h-1, Yp/s to 0.65 g g-1, Y x/s of 0.11 g g -1 and conversion efficiency (h) of 61%. Aerobic fermentation Candida Candida DNA Sequence. Fermentação aeróbica Leveduras - Isolamento e purificação Sequência do DNA Yeast-Isolation and purification
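The fermentation parameters quoted in this abstract follow commonly used definitions, which can be sketched as below. This is a minimal illustration, not the author's calculation. The function names are hypothetical, the 96 h duration is borrowed from the growth kinetics assay length stated in the abstract, and complete consumption of the 50 g L-1 of D-xylose is an assumption, since the abstract does not report how much substrate was consumed. The efficiency calculation attributed to Barbosa et al. 1988 is not reproduced because the abstract does not state its reference yield.

```python
def volumetric_productivity(xylitol_g_per_l: float, hours: float) -> float:
    """Qp in g L-1 h-1: xylitol formed per litre per hour of fermentation."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return xylitol_g_per_l / hours


def product_yield(xylitol_g_per_l: float, xylose_consumed_g_per_l: float) -> float:
    """Yp/s in g g-1: xylitol formed per gram of D-xylose consumed."""
    if xylose_consumed_g_per_l <= 0:
        raise ValueError("xylose consumed must be positive")
    return xylitol_g_per_l / xylose_consumed_g_per_l


def biomass_yield(biomass_g_per_l: float, xylose_consumed_g_per_l: float) -> float:
    """Yx/s in g g-1: biomass formed per gram of D-xylose consumed."""
    if xylose_consumed_g_per_l <= 0:
        raise ValueError("xylose consumed must be positive")
    return biomass_g_per_l / xylose_consumed_g_per_l


if __name__ == "__main__":
    # Reported xylitol titre for C. tropicalis MVP 16.
    xylitol = 32.97
    # Assumed inputs (not stated for this data point in the abstract).
    assumed_hours = 96.0
    assumed_xylose_consumed = 50.0
    print(f"Qp   ~ {volumetric_productivity(xylitol, assumed_hours):.2f} g L-1 h-1")
    print(f"Yp/s ~ {product_yield(xylitol, assumed_xylose_consumed):.2f} g g-1")
```

Under these assumptions the values land near, but not exactly on, the reported 0.35 g L-1 h-1 and 0.65 g g-1, which suggests the author's time basis and measured substrate consumption differed slightly from the inputs assumed here.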
27	Prospecção de sequências genômicas codificadoras de enzimas lipolíticas degradadoras de hidrocarbonetos de petróleo. / Screening for genomic sequences which codify lipolytic enzymes specialized in petroleum hydrocarbons degradation. Maester, Thais Carvalho 30 May 2011 Lipolytic enzymes have enormous biotechnological potential. The aim of this work was to prospect for genes encoding lipolytic enzymes in a metagenomic library of 4224 clones. Lipolytic activity was assessed by growing the clones on culture medium supplemented with tributyrin and observing halo formation around the colonies. Thirty clones were positive, and two of these were selected and had their DNA sub-cloned. DNA from the sub-libraries was sequenced, generating a complete contig for clone PL28.F10, which was compared with sequences in the NCBI database. An ORF encoding an esterase/lipase of 303 amino acids, with 61% identity to a sequence from an unculturable microorganism, was found. Phylogenetic trees indicate that this ORF, ORF15, is closest to family IV of the esterases/lipases. The active sites characteristic of the family were identified, confirming the result of the phylogenetic trees. Among already patented sequences, ORF15 is a sister group to the esterase/lipase sequences from BASF and to an unidentified protein from CAMBIA. DNA sequence Hydrocarbons Hydrolytic enzymes Lipase Oil Patent
29	Sex chromosome microsatellite markers from an Australian marsupial: development, application and evolution MacDonald, Anna Jayne January 2008 Microsatellites are simple repetitive DNA sequences that are used as genetic markers throughout the biological sciences. The high levels of variation observed at microsatellite loci contribute to their utility in studies at the population and individual levels. This variation is a consequence of mutations that change the length of microsatellite repeat tracts. Current understanding suggests that most mutations are caused by polymerase slippage during DNA replication and lead to changes of a single repeat unit in length, but some changes involving multiple repeats can also occur. Despite this simplistic overview, there is evidence for considerable heterogeneity in mutation processes between species, loci and alleles. Such complex patterns suggest that other mechanisms, including those associated with DNA recombination, are also involved in the generation of microsatellite mutations. Understanding which mutational mechanisms are responsible for variation at microsatellite markers is essential to enable accurate data interpretation in genotyping projects, as many commonly used statistics assume specific mutation models. I developed microsatellite markers specific to the X and Y chromosomes and an autosome in the tammar wallaby, Macropus eugenii, and investigated their evolutionary properties using two approaches: indirectly, as inferred from population data, and directly, from observation of mutation events. First, I found that allelic richness increased with repeat length and that two popular mutation models, the stepwise mutation model and the infinite allele model, were poor at predicting the number of alleles per locus, particularly when gene diversity was high. 
These results suggest that neither model can account for all mutations at tammar wallaby microsatellites and hint at the involvement of more complex mechanisms than replication slippage. I also determined levels of variation at each locus in two tammar wallaby populations. I found that allelic richness was highest for chromosome 2, intermediate for the X chromosome and lowest for the Y chromosome in both populations. Thus, allelic richness varied between chromosomes in the manner predicted by their relative exposure to recombination, although these results may also be explained by the relative effective population sizes of the chromosomes studied. Second, I used small-pool PCR from sperm DNA to observe de novo mutation events at three of the most polymorphic autosomal markers. To determine the reliability of my observations I developed and applied strict criteria for scoring alleles and mutations at microsatellite loci. I observed mutations at all three markers, with rate variation between loci. Single step mutations could not be distinguished because of the limitations of the approach, but 24 multi-step mutations, involving changes of up to 35 repeat units, were recorded. Many of these mutations involved changes that could not be explained by the gain or loss of whole repeat units. These results imply that a large number of mutations at tammar wallaby microsatellites are caused by mechanisms other than replication slippage and are consistent with a role for recombination in the mutation process. Taken as a whole, my results provide evidence for complex mutation processes at tammar wallaby microsatellites. I conclude that careful characterisation of microsatellite mutation properties should be conducted on a case-by-case basis to determine the most appropriate mutation models and analysis tools for each locus. 
In addition, my work has provided a set of chromosome-specific markers for use in macropod genetic studies, which includes the first marsupial Y chromosome microsatellites. Sex chromosome microsatellites open a new range of possibilities for population studies, as they provide opportunities to investigate gene flow in a male context, to complement data from autosomal and maternally-inherited mitochondrial markers. Microsatellites repetitive DNA sequence X and Y chromosomes tammar wallaby Macropus eugenii
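To illustrate what it means for a mutation model to "predict the number of alleles per locus", the sketch below uses the infinite allele model (IAM). Under the IAM, equilibrium gene diversity is H = θ/(1+θ), and the expected number of distinct alleles in a sample of n gene copies follows Ewens's sampling formula. This is a minimal illustration only: it is not the analysis performed in the thesis, the function names are hypothetical, and the example inputs are invented.

```python
def theta_from_gene_diversity_iam(gene_diversity: float) -> float:
    """Invert H = theta / (1 + theta), the IAM equilibrium gene diversity."""
    if not 0.0 < gene_diversity < 1.0:
        raise ValueError("gene diversity must lie strictly between 0 and 1")
    return gene_diversity / (1.0 - gene_diversity)


def expected_alleles_iam(theta: float, n_copies: int) -> float:
    """Ewens's formula: E[k] = sum over i = 0..n-1 of theta / (theta + i)."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    return sum(theta / (theta + i) for i in range(n_copies))


if __name__ == "__main__":
    # Invented example: gene diversity 0.8 observed in 40 sampled gene copies.
    theta = theta_from_gene_diversity_iam(0.8)
    print(f"theta = {theta:.2f}")
    print(f"expected alleles under IAM = {expected_alleles_iam(theta, 40):.2f}")
```

Comparing an observed allele count at a locus with this expectation is one simple way to judge how well the model fits. The stepwise mutation model needs a different relationship between θ and gene diversity and is not covered by this sketch.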
30	Characterisation of a DNA ligase from an Antarctic metagenomic library Booyse, Dean January 2011 A metagenomic gene library prepared from soil found beneath a mummified seal carcass in the Miers Valley, Antarctica, suggests an environment rich in uncharacterised biodiversity, including enzymes with possible application to industrial processes. A sequence-based gene mining investigation was performed on a clone that archives a metagenomic sequence from this environment. The sequence was annotated using de novo bioinformatics and molecular biology techniques. A predicted NAD+-dependent DNA ligase, ligDB1, was selected for further characterisation. ligDB1 encodes a gene product that contains all the sequence features of a functional ligase. The protein was overexpressed in a heterologous E. coli host and purified to homogeneity. LigDB1 did not exhibit nick-sealing activity, but was able to perform AMP-dependent DNA relaxation in the presence of high concentrations of enzyme. DNA-modifying enzymes from cold environments perform optimally at low temperatures and may be of use as molecular tools in biotechnology. Complete characterisation of this enzyme is subject to further investigation.
