  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
151

The oocyte-activation factor, phospholipase C zeta (PLCζ) : clinical prognosis, diagnosis, and treatment of oocyte activation deficiency

Amdani, Siti Nornadhirah January 2018
Oocyte activation deficiency (OAD) is a form of infertility observed in patients who have experienced recurrent total fertilisation failure (TFF) following intracytoplasmic sperm injection treatment. The condition was long considered idiopathic, but strong clinical evidence now suggests that dysfunctional forms of phospholipase C zeta (PLCζ) may be the predominant causative factors for OAD. Genetic factors contribute in patients suspected of having OAD: four PLCζ exonic mutations have been discovered and characterised as causes of infertility. In this study, a novel nonsense mutation, PLCζK322Stop, was identified in the PLCζ XY-linker region of Patient LR. This variant truncates approximately half of PLCζ and was therefore non-functional when its activity was tested. Patient LR also carried a previously reported mutation, PLCζH233L, which may suggest that the patient is sub-fertile rather than infertile as initially expected. Research has so far focused purely on the coding regions of PLCζ, and our knowledge of PLCζ regulatory elements remains very limited. Next generation sequencing (NGS) was therefore employed to detect variants in the non-coding regions of PLCζ (the promoter and introns) that may account for the observed phenotypic diversity of PLCζ expression in fertile and infertile patients. Because mapping failed, an alternative approach was adopted to identify variants within human PLCζ, using the single nucleotide polymorphism (SNP) database. Over 2500 SNPs were localised in the intronic regions of PLCζ, and it could be speculated that these variants may help explain the wide variation in PLCζ expression reported. Additionally, two patients with TFF (Patients 79 and 107) were investigated to identify an association between PLCζ and their infertility.
For Patient 79, PLCζ immunofluorescence analysis was performed multiple times, and a significant improvement in PLCζ expression was observed one year after his first investigation. This may have been the result of an external factor that influenced protein expression. For Patient 107, a novel substitution mutation, PLCζV193E, was identified and was predicted to affect PLCζ stability and folding. There is global interest in creating a safer, alternative OAD therapy, namely a human recombinant PLCζ protein (hrPLCζ). The first method, using a bacterial cell line, resulted in successful purification and identification, but the product proved to be inactive following mouse oocyte microinjection. The second method involved production of a mammalian-expressed hrPLCζ, which was successfully purified and identified but, due to time restrictions, could not be tested for functionality. Collectively, the findings in this thesis reinforce the association between PLCζ and OAD and provide improved options for the diagnosis and treatment of OAD.
152

Etude des propriétés génétiques et fonctionnelles des variants du virus de l'hépatite C lors d'évènements de transmission / Study genetic and functionnal properties of Hepatitis C virus variants during transmission events

Guinoiseau, Thibault 29 January 2018
Chez un individu infecté, le VHC circule sous la forme d’une population de variants viraux appelés quasi-espèce. Lors d’un évènement de transmission, certains variants viraux sont préférentiellement transmis et un effondrement de la diversité virale chez l’individu nouvellement infecté est souvent observé. Les propriétés électives de ces variants ainsi que leur rôle dans l’évolution clinique sont méconnus. L’objectif de cette étude est d’identifier si des déterminants moléculaires situés au niveau des glycoprotéines d’enveloppe du VHC sont associés à une plus grande capacité de transmission. Les propriétés fonctionnelles des variants transmis et non transmis seront étudiées, en particulier la sensibilité à la neutralisation autologue. Les échantillons étudiés proviennent de couples mère-enfant infectés chroniquement par le VHC issus de d’un essai clinique réalisé en Thaïlande. La composition des populations virales au sein de 3 paires a été étudiée à l’aide d’une technique d’amplification après dilution limite (SGA) suivie d’un séquençage profond (Illumina). Le variant majoritaire chez la mère était retrouvé majoritaire chez l’enfant pour les paires 1 et 3. Pour 2 paires (2 et 3), une moindre diversité génétique a été observée chez l’enfant par rapport à la mère témoignant d’un goulot d’étranglement génétique lors de la transmission. Après clonage des gènes E1E2, des tests d’infectivité sur cellules hépatocytaires ainsi que des tests de neutralisation par le sérum maternel sont réalisés avec le modèle de rétrovirus pseudotypés (VHCpp). Pour la 1ère paire, le variant majoritaire chez la mère (variant transmis à l’enfant) est infectieux et résistant au sérum autologue. Pour la deuxième paire, le variant minoritaire (transmis) est légèrement résistant à la neutralisation autologue. Un variant majoritaire non transmis apparait sensible à la neutralisation autologue. 
Des études complémentaires en système de virus réplicatifs issus de la culture cellulaire (VHCcc) sont en cours. Au final, les résultats de cette étude contribuent à comprendre les étapes précoces de l’infection par le VHC, afin de mieux appréhender de futures approches immunoprophylactiques ou vaccinales. / In infected individuals, HCV circulates as a complex mixture of genetically different but closely related viral variants, termed a quasispecies. During a transmission event, some viral variants are preferentially transmitted. The genetic and functional properties of these variants are still unknown. The aim of our work was to identify molecular determinants in E1E2 associated with a greater capacity for transmission. We also intend to study the functional properties of transmitted and non-transmitted variants, for example their sensitivity to autologous neutralization. The serum samples studied were obtained from three women and their children infected with HCV, who were participating in a clinical trial for the prevention of perinatal transmission of HIV in Thailand. Quasispecies were studied with single genome amplification (SGA) followed by deep sequencing (Illumina). A decrease in intra-host diversity (genetic bottleneck) was observed in the viral population of the child near birth (week 6) compared with that observed in the mother (just before delivery). For 2 pairs, the major variant observed in the mother was the same as the major one identified in the child. Retroviral pseudotypes (HCVpp) bearing each of the transmitted and non-transmitted envelope glycoproteins were produced. For each one, the level of infectivity on HuH7 cells was measured, as well as the neutralizing activity of the autologous sera. For the first pair, the major variant (transmitted) appears resistant to autologous neutralization. For the second pair, the transmitted minor variant appears slightly resistant to autologous neutralization.
A non-transmitted major variant is sensitive to autologous neutralization. Complementary studies with HCV derived from cell culture (HCVcc) are in progress. We hope that the results of this study will help to better understand the early steps of HCV infection, which is of great interest for the development of immunoprophylaxis and vaccine strategies.
153

Epidemiologia molecular do vírus da Hepatite C: análise comparativa de diferentes regiões subgenômicas aplicadas a estudos de associação genética / Hepatitis C virus molecular epidemiology: a comparative analysis between the HVR1 and NS5A subgenomic regions

Rossi, Livia Maria Gonçalves Rossi [UNESP] 18 January 2016
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / O vírus da Hepatite C (HCV) afeta cerca de 3% da população mundial. A cada ano, 3-4 milhões de novos casos são diagnosticados. A identificação de redes transmissão é complexa devido ao longo período de incubação, à falta de sintomas na fase aguda da doença e à heterogeneidade do HCV, que dificulta o estabelecimento de vínculos entre casos relacionados. Uma ampla caracterização das populações intra-hospedeiros pode ser realizada de forma eficiente através do sequenciamento de nova geração (NGS). Com base neste contexto, o sequenciamento de múltiplas regiões subgenômicas é uma solução às limitações impostas pela rápida evolução molecular do HCV. Variantes virais das regiões HVR1 e NS5A de 16 pacientes cronicamente infectados com o HCV, genótipos 1a e 1b, foram sequenciadas com a técnica de NGS. Os pacientes 1-7 compartilhavam fatores de risco, pertencendo ao mesmo grupo de usuários de drogas injetáveis, porém o parentesco genético desses casos não pode ser estabelecido com base apenas no sequenciamento da HVR1 (distância nucleotídica mínima entre 16-23).
A amplificação de um fragmento maior (~450 pb), correspondente a um segmento da região NS5A, aprimorou a relação epidemiológica entre os pacientes 1-5, onde as distâncias genéticas mínimas foram consideravelmente menores (9-13). Os pacientes 6 e 7 não compartilharam sequências com os outros cinco pacientes dessa rede, apresentando populações virais mais homogêneas. Adicionalmente, Median Joining Networks foram construídas para melhor analisar a variabilidade genética intra-hospedeiro. Em geral, observou-se que as sequências derivadas da NS5A formaram comunidades mais homogêneas e menos divergentes geneticamente. Assim, a tecnologia NGS e o sequenciamento das regiões subgenômicas HVR1 e NS5A podem ajudar a restaurar elos perdidos quando somente a região HVR1 é analisada, aprimorando portanto, a resolução de estudos de associação genética entre populações de HCV. / The hepatitis C virus (HCV) affects approximately 3% of the world's population. Each year 3-4 million new cases are diagnosed. The identification of transmission networks is complicated by the characteristically long incubation period, the lack of symptoms during the acute phase of the disease and the heterogeneity of HCV, which make it challenging to link related cases to a common source of infection. Extensive characterization of intra-host populations can be reliably achieved using next generation sequencing (NGS) approaches. Sequencing of multiple and longer subgenomic regions has been proposed as an alternative to overcome the limitations imposed by the rapid molecular evolution of the HCV HVR1. Thus, the NS5A and HVR1 regions of 16 chronically infected individuals, genotypes 1a and 1b, were sequenced using an NGS platform. Patients 1-7 shared risk factors and belonged to the same network of injection drug users. However, genetic relatedness could not be established based on the HVR1 sequences (minimal nucleotide distance ranging from 16 to 23).
Amplification and sequencing of a larger PCR fragment (~450 bp) targeting the NS5A region reestablished lost epidemiological links between patients 1-5. The minimum genetic distances in those patients were considerably smaller than their HVR1 counterparts (9-13). Patients 6 and 7 displayed rather homogeneous viral populations and clearly did not share any sequences with the other five patients in this network. Additionally, Median Joining Networks analysis was carried out to further analyze the intra-host genetic variability of all seven patients. Overall, NS5A sequences were significantly less diverse than their HVR1 equivalents. Thus, NGS technology and the use of both HVR1 and NS5A sequences might help restore links that are otherwise lost when the HVR1 region alone is analyzed, improving the resolution of HCV genetic relatedness studies. / CAPES: 33004153079P9
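The "minimal nucleotide distance" reported in the abstract above can be illustrated with a small sketch. It assumes that the distance between two patients is the smallest Hamming distance between any pair of aligned, equal-length sequences drawn from their intra-host populations; the abstract does not say how the thesis computed it. The function names and the toy sequences are hypothetical.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(pop_a: list[str], pop_b: list[str]) -> int:
    """Smallest Hamming distance over all cross-population sequence pairs."""
    return min(hamming(a, b) for a, b in product(pop_a, pop_b))


# Toy intra-host populations (illustrative data, not from the thesis).
patient_a = ["ACGTACGT", "ACGTACGA"]
patient_b = ["ACGTTCGA", "TCGTTCGA"]

print(min_distance(patient_a, patient_b))
```

Under this assumption, a smaller minimum distance between two patients is read as stronger evidence of a shared transmission link, which is how the abstract interprets the lower NS5A values.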
154

Epidemiologia molecular do vírus da Hepatite C : análise comparativa de diferentes regiões subgenômicas aplicadas a estudos de associação genética /

Rossi, Livia Maria Gonçalves Rossi January 2016
Orientador: Paula Rahal / Resumo: O vírus da Hepatite C (HCV) afeta cerca de 3% da população mundial. A cada ano, 3-4 milhões de novos casos são diagnosticados. A identificação de redes transmissão é complexa devido ao longo período de incubação, à falta de sintomas na fase aguda da doença e à heterogeneidade do HCV, que dificulta o estabelecimento de vínculos entre casos relacionados. Uma ampla caracterização das populações intra-hospedeiros pode ser realizada de forma eficiente através do sequenciamento de nova geração (NGS). Com base neste contexto, o sequenciamento de múltiplas regiões subgenômicas é uma solução às limitações impostas pela rápida evolução molecular do HCV. Variantes virais das regiões HVR1 e NS5A de 16 pacientes cronicamente infectados com o HCV, genótipos 1a e 1b, foram sequenciadas com a técnica de NGS. Os pacientes 1-7 compartilhavam fatores de risco, pertencendo ao mesmo grupo de usuários de drogas injetáveis, porém o parentesco genético desses casos não pode ser estabelecido com base apenas no sequenciamento da HVR1 (distância nucleotídica mínima entre 16-23). A amplificação de um fragmento maior (~450 pb), correspondente a um segmento da região NS5A, aprimorou a relação epidemiológica entre os pacientes 1-5, onde as distancias genéticas mínimas foram consideravelmente menores (9-13). Os pacientes 6 e 7 não compartilharam sequências com os outros cinco pacientes dessa rede, apresentando populações virais mais homogêneas. Adicionalmente, Median Joining Networks foram construídas para ... (Resumo completo, clicar acesso eletrônico abaixo) / Abstract: The hepatitis C virus (HCV) affects approximately 3% of the world's population. Each year 3-4 million new cases are diagnosed. 
The identification of transmission networks is complicated by the characteristically long incubation period, the lack of symptoms during the acute phase of the disease and the heterogeneity of HCV, which make it challenging to link related cases to a common source of infection. Extensive characterization of intra-host populations can be reliably achieved using next generation sequencing (NGS) approaches. Sequencing of multiple and longer subgenomic regions has been proposed as an alternative to overcome the limitations imposed by the rapid molecular evolution of the HCV HVR1. Thus, the NS5A and HVR1 regions of 16 chronically infected individuals, genotypes 1a and 1b, were sequenced using an NGS platform. Patients 1-7 shared risk factors and belonged to the same network of injection drug users. However, genetic relatedness could not be established based on the HVR1 sequences (minimal nucleotide distance ranging from 16 to 23). Amplification and sequencing of a larger PCR fragment (~450 bp) targeting the NS5A region reestablished lost epidemiological links between patients 1-5. The minimum genetic distances in those patients were considerably smaller than their HVR1 counterparts (9-13). Patients 6 and 7 displayed rather homogeneous viral populations and clearly did not share any sequences with the other five patients in this network. Additionally, Median Joining... (Complete abstract click electronic access below) / Doutor
155

Analysis of autoimmune lesions in grey matter

Hermann, Moritz 20 February 2018
No description available.
156

Filosfera de citros sob manejo convencional e ecológico: estrutura da comunidade bacteriana e monitoramento de cobre / Phyllosphere of citros under conventional and ecological management: structure of bacterial community and copper monitoring approach

Carolinne Rosa de Carvalho 18 January 2018
O manejo agrícola aplicado a um agrossistema pode determinar a qualidade e produtividade da área, além das interações biológicas que podem ser estabelecidas entre o cultivar e ecossistema local. A agricultura convencional é bastante reconhecida como um manejo eficiente e lucrativo. Por outro lado, a agroecologia tem ganhado visibilidade na agroindústria em reflexo do aumento na demanda por alternativas mais sustentáveis de produção. As diferenças entre ambos manejos podem refletir sobre a dinâmica microbiana, alterando a composição e estruturação das comunidades ali presentes. Os micro-organismos que habitam a superfície foliar da planta compõem o micro-ambiente denominado filosfera, descrito como um dos hábitats colonizáveis mais extensos. Devido a sua alta exposição a variáveis ambientais, diversos fatores podem interferir na comunidade bacteriana e definir a filosfera. Desta forma, o principal objetivo nesse estudo foi avaliar como o manejo agrícola interfere na composição bacteriana na filosfera, analisando ainda em escala temporal sua estrutura e abundância. A área experimental amostrada foi cedida pelo Centro de Pesquisa "Mokiti Okada", em Mogi Guaçu, São Paulo. As amostras foram coletadas de maneira representativa em diferentes linhas de tratamento, uma sob manejo convencional e outra sob manejo ecológico. Análises microbiológicas dependentes e independentes de cultivo permitiram identificar a comunidade bacteriana residente da filosfera de citros, a qual era compartilhada por ambos os manejos. Entretanto, análises de sequenciamento NGS (New Generation Sequencing) mostraram uma diferença significativa entre as comunidades bacterianas dos dois manejos, com o ecológico apresentando uma maior diversidade.
Apesar do manejo ter se mostrado um importante fator na composição bacteriana, quando avaliado em função temporal, viu-se que as épocas de coleta interferem mais intensamente na estrutura das bactérias (p=0,0001), mostrando uma sobreposição dos diversos fatores ambientais que atuam sobre a filosfera. Os resultados ainda indicaram uma redução na abundância de bactérias, a qual pode estar relacionada com a aplicação extra de produtos cúpricos em ambas as áreas, em função do acometimento da "pinta preta" no pomar, o que instigou monitorar o cobre no tecido foliar. Quimicamente, micro-análises de XRF (X-Ray Fluorescence) mostraram que há uma maior concentração de cobre nas folhas provenientes da área convencional, o que é resultado das maiores quantidades do produto que são aplicadas nesse tratamento. Além disso, foi possível o isolamento de bactérias do gênero Enterococcus na filosfera, as quais apresentam mecanismos de tolerância ao cobre, demonstrando que os produtos cúpricos podem ter selecionado esses organismos. Logo, esse estudo apresentou uma importante perspectiva do efeito do manejo agrícola sobre a filosfera, contribuindo para a compreensão da dinâmica microbiana na agricultura. / The agricultural management applied to an agrosystem is an important determinant of the quality and productivity of the crop, as well as of the biological interactions that can be established between the plants and the local ecosystem. Conventional agriculture is well known as an efficient and lucrative form of crop management. On the other hand, agroecology has been gaining visibility in the agroindustry due to increasing demand for more sustainable production alternatives. The differences between the two approaches can affect microbial dynamics, changing the composition and structure of these communities. Microorganisms inhabiting the leaf surface make up a microenvironment called the phyllosphere, which is described as one of the most extensive colonizable habitats.
Due to its constant exposure to environmental variables, several factors can influence the bacterial community and shape the phyllosphere. Thus, the main purpose of this study was to evaluate how agricultural management affects the phyllosphere bacteria, also considering a temporal effect on the structure and abundance of these organisms. The experimental area was provided by the "Mokiti Okada" Research Center, located in Mogi Guaçu, São Paulo. The samples were collected representatively from two treatment lines, one under conventional management and the other under ecological management. Culture-dependent and culture-independent microbiological analyses made it possible to identify the resident bacterial community of the citrus phyllosphere, which was largely shared between both treatments. However, NGS (New Generation Sequencing) analysis demonstrated a significant difference between the bacterial communities under conventional and ecological management, with the latter showing higher diversity, which may be related to the different approaches applied. Although the agricultural method proved to be an important factor in bacterial composition, the temporal evaluation showed that the time of sampling interfered more strongly with the bacterial community structure (p=0.0001), representing a possible overlap of the environmental factors acting on the phyllosphere. The data also indicate a decrease in the abundance of bacteria that might have resulted from the extra use of cupric products in response to the occurrence of "black spot" in the orchard, which led to monitoring of copper in the leaf tissue. Chemically, XRF (X-Ray Fluorescence) micro-analysis demonstrated a higher concentration of copper on the leaves from the conventional area, which results from the larger amounts of these products applied under this management.
Moreover, a search for copper-tolerant microorganisms was conducted, and it was possible to isolate Enterococcus bacteria, which have copper tolerance mechanisms. This result implies that the use of cupric products may have selected for these microorganisms in the citrus phyllosphere. Therefore, this study presents an important perspective on how agricultural management can influence the phyllosphere, which can contribute to understanding microbial dynamics and their role in agriculture.
157

Diversité génétique des populations de cerfs élaphe (cervus elaphus) en Île-de-France en liaison avec l'anthropisation / Genetic diversity of the red deer (cervus elaphus) populations in Île-de-France in association with anthropization

Suez, Marie 24 September 2015
Au cours des 60 dernières années le développement des infrastructures de transports (Autoroutes, Lignes Grandes Vitesse, Nationales doubles voies) a fragmenté l'habitat des cerfs élaphe (Cervus elaphus). D'après les observations naturalistes, cette anthropisation a causé la fragmentation de deux populations géographiques existantes en sept dans la partie Sud et d'une en trois dans la partie Nord. Afin d'évaluer l'impact de ces infrastructures sur la structuration génétique de ces populations de cerfs, nous avons échantillonné chacune de ces populations grâce à la coopération de trois fédérations de chasse. Le cours laps de temps écoulé depuis la construction de ces infrastructures nous a conduits à choisir comme marqueurs moléculaires les microsatellites, efficaces dans l'inférence d'évènements récents. Les nouvelles techniques de séquençages (NGS) permettent d'obtenir d'importants jeux de données rapidement, nous avons choisi d'utiliser ces méthodes de séquençage pour obtenir nos données. Aucun logiciel ne permettant de traiter les données de séquençage haut débit des microsatellites pour des espèces dont le génome n'est pas complètement séquencé, nous avons alors réalisé un programme, MicNeSs qui permet de génotyper rapidement et objectivement (sans intervention humaine) un grand nombre d'individus et de locus. Nous avons utilisé MicNeSs pour génotyper 345 individus pour 17 locus microsatellites. A partir de ce jeu de données, nous avons montré l'existence d'une structuration génétique des populations de cerfs élaphe en Île-de-France en liaison avec les infrastructures routières et ferroviaires. Nous avons mis en évidence un effet fort des jumelages autoroutes/LGV et une efficacité différentielle des passages grande faune de 2ème et 3ème génération sur les populations de cerfs élaphe en Île-de-France. 
/ During the last 60 years, the development of urban areas, main roads, highways and railways in Île-de-France has fragmented the habitat of the red deer (Cervus elaphus). According to naturalistic observations, it caused the fragmentation of the two existing putative populations in the South into seven putative populations, and of one population in the North into three. In order to estimate the impact of this infrastructure on the genetic structure of these populations, we sampled each putative population with the help of three hunting federations. Because of the short time elapsed since the first highway was built, we chose microsatellite loci as molecular markers, as they are efficient for inferring recent events. Since next generation sequencing (NGS) makes it possible to obtain large data sets quickly, we chose this technique to obtain our data. As no software could process NGS microsatellite data for species without a completely sequenced genome, we wrote a program, MicNeSs, which genotypes large numbers of individuals and loci quickly and objectively (without human intervention). We used MicNeSs to genotype 345 individuals at 17 microsatellite loci. With this data set we showed the presence of a genetic structure in the red deer populations of Île-de-France associated with the road and rail infrastructure. We highlighted a strong impact of paired highways and high-speed railway lines, and a differential efficiency of second- and third-generation wildlife crossings on the red deer populations in Île-de-France.
158

Employing Limited Next Generation Sequence Data for the Development of Genetic Loci of Phylogenetic and Population Genetic Utility

Evenstone, Lauren 02 July 2015
Massively parallel high-throughput sequencers are transforming scientific research by reducing the cost and time necessary to sequence entire genomes. The goal of this project is to produce preliminary genome assemblies of calliphorid flies using Life Technologies’ Ion Torrent sequencing and Illumina’s MiSeq sequencing. I located, assembled, and annotated a novel mitochondrial genome for one such fly, the little-studied Chrysomya pacifica, which is central to one hypothesis about blow fly evolution. Combined with sequencing data from Chrysomya megacephala, its forensically relevant sister species, much insight can be gained from alignments, sequence and protein analysis, and many other tools within the CLC Genomics Workbench software program. I present these analyses of the two recently diverged species here.
159

Improved Error Correction of NGS Data

Alic, Andrei Stefan 15 July 2016
[EN] The work done for this doctoral thesis focuses on error correction of Next Generation Sequencing (NGS) data in the context of High Performance Computing (HPC). Due to the reduction in sequencing cost, the increasing output of the sequencers and the advancements in the biological and medical sciences, the amount of NGS data has increased tremendously. Humans alone are not able to keep pace with this explosion of information, so computers must help them handle the deluge of data generated by the sequencing machines. Since NGS is no longer just a research topic (it is used in clinical routine to detect cancer mutations, for instance), requirements in performance and accuracy are more stringent. For sequencing to be useful outside research, the analysis software must work accurately and fast. This is where HPC comes into play. NGS processing tools should leverage the full potential of multi-core and even distributed computing, as those platforms are widely available. Moreover, as the performance of the individual core has hit a barrier, current computing trends focus on adding more cores and explicitly splitting the computation to take advantage of them. This thesis starts with a deep analysis of all these problems in a general and comprehensive way (to reach a very wide audience), in the form of an exhaustive and objective review of the NGS error correction field. We dedicate a chapter to this topic to introduce the reader gradually and gently to the world of sequencing. It presents real problems and applications of NGS that demonstrate the impact this technology has on science. The review leads to the following conclusions: the need to understand the specificities of NGS data samples (given the wide variety of technologies and features) and the need for flexible, efficient and accurate tools for error correction as a preliminary step of any NGS postprocessing.
In response to the explosion of NGS data, we introduce MuffinInfo. It is a piece of software capable of extracting information from the raw data produced by the sequencer to help the user understand the data. MuffinInfo uses HTML5, so it runs in almost any software and hardware environment. It supports custom statistics so that it can be adapted to specific requirements. MuffinInfo can reload the results of a run, which are stored in JSON format for easier integration with third-party applications. Finally, our application uses threads to perform the calculations, to load the data from disk and to handle the UI. Continuing our research, and in response to the single-core performance limitation, we leverage the power of multi-core computers to develop a new error correction tool. Error correction of NGS data is normally the first step of any analysis targeting NGS. As we conclude from the review performed within the frame of this thesis, many projects in different real-life applications have opted for this step before further analysis. In this sense, we propose MuffinEC, a multi-technology (Illumina, Roche 454, Ion Torrent and, experimentally, PacBio) corrector that handles any type of error (mismatches, deletions, insertions and unknown values). It surpasses other similar software by providing higher accuracy (demonstrated by three types of tests) and using fewer computational resources. It follows a multi-step approach that starts by grouping all the reads using a k-mer-based metric. Next, it employs the powerful Smith-Waterman algorithm to refine the groups and generate Multiple Sequence Alignments (MSAs). These MSAs are corrected by taking each column and looking for the correct base, determined by a user-adjustable percentage. This manuscript is structured in chapters based on material that has been previously published in prestigious journals indexed by the Journal Citation Reports (in outstanding positions) and at relevant congresses.
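The final correction step described above (taking each MSA column and choosing the base by a user-adjustable percentage) can be sketched as follows. This is an illustration under stated assumptions, not MuffinEC's actual code: the function name `correct_msa`, the default threshold of 0.6, the treatment of `N` as an unknown value that never votes, and the choice to measure the fraction against all rows are hypothetical. The k-mer grouping and Smith-Waterman refinement steps are not shown; the input is assumed to be an already aligned group of reads.

```python
from collections import Counter


def correct_msa(rows: list[str], threshold: float = 0.6) -> list[str]:
    """Column-wise consensus correction of an aligned group of reads.

    For each column, the most frequent symbol (ignoring unknown 'N')
    replaces every entry in that column if its share of all rows
    reaches `threshold`; otherwise the column is left untouched.
    """
    if not rows:
        return []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows of the alignment must have equal length")

    corrected = [list(r) for r in rows]
    for col in range(width):
        # Unknown values do not vote, but they can be overwritten.
        counts = Counter(r[col] for r in rows if r[col] != "N")
        if not counts:
            continue
        base, votes = counts.most_common(1)[0]
        if votes / len(rows) >= threshold:
            for r in corrected:
                r[col] = base
    return ["".join(r) for r in corrected]


# Toy alignment (illustrative): one mismatch and one unknown value.
reads = ["ACGT", "ACGT", "ACTT", "ANGT"]
print(correct_msa(reads))                 # default threshold
print(correct_msa(reads, threshold=0.8))  # stricter threshold
```

In this sketch, raising the threshold makes the corrector more conservative: a column is only rewritten when a larger share of the reads agree on a base.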
/ [ES] El trabajo realizado en el marco de esta tesis doctoral se centra en la corrección de errores en datos provenientes de técnicas NGS utilizando técnicas de computación intensiva. Debido a la reducción de costes y el incremento en las prestaciones de los secuenciadores, la cantidad de datos disponibles en NGS se ha incrementado notablemente. La utilización de computadores en el análisis de estas muestras se hace imprescindible para poder dar respuesta a la avalancha de información generada por estas técnicas. El uso de NGS transciende la investigación con numerosos ejemplos de uso clínico y agronómico, por lo que aparecen nuevas necesidades en cuanto al tiempo de proceso y la fiabilidad de los resultados. Para maximizar su aplicabilidad clínica, las técnicas de proceso de datos de NGS deben acelerarse y producir datos más precisos. En este contexto es en el que las técnicas de comptuación intensiva juegan un papel relevante. En la actualidad, es común disponer de computadores con varios núcleos de proceso e incluso utilizar múltiples computadores mediante técnicas de computación paralela distribuida. Las tendencias actuales hacia arquitecturas con un mayor número de núcleos ponen de manifiesto que es ésta una aproximación relevante. Esta tesis comienza con un análisis de los problemas fundamentales del proceso de datos en NGS de forma general y adaptado para su comprensión por una amplia audiencia, a través de una exhaustiva revisión del estado del arte en la corrección de datos de NGS. Esta revisión introduce gradualmente al lector en las técnicas de secuenciación masiva, presentando problemas y aplicaciones reales de las técnicas de NGS, destacando el impacto de esta tecnología en ciencia. 
Two main ideas follow from this study: the need to analyse the characteristics of NGS data adequately, given the enormous intrinsic variety of the different NGS techniques; and the need for a versatile, efficient and accurate error correction tool. In the context of data analysis, the thesis presents MuffinInfo. MuffinInfo is a software application implemented in HTML5. It obtains relevant information from raw NGS data to aid understanding of their characteristics and the application of error correction techniques, and it can be extended with functions that implement user-defined statistics. MuffinInfo stores the results of processing in JSON files. Because it uses HTML5, MuffinInfo can run in almost any hardware and software environment. The tool is implemented using multiple threads of execution for managing the interface. The second conclusion of the state-of-the-art analysis leads to the opportunity to apply high-performance computing techniques extensively to error correction, in order to develop a tool that supports multiple technologies (Illumina, Roche 454, Ion Torrent and, experimentally, PacBio). The proposed tool, MuffinEC, supports different types of errors (substitutions, indels and unknown values). MuffinEC outperforms the results obtained by existing tools in this field. It offers a better correction rate in much less time and using fewer resources, which also makes it easier to apply to larger samples on conventional computers. MuffinEC uses a multi-stage approach. First, it groups all the sequences using the k-mer metric. Second, it refines the groups by means of Smith-Waterman alignment, generating contigs.
These contigs result from column-wise correction according to the individual frequency of each base. The thesis is structured in chapters based on material previously published in journals indexed in promin… / [CAT] The work carried out within this doctoral thesis focuses on correcting errors in data from NGS techniques using intensive computing techniques. Owing to falling costs and the increasing performance of sequencers, the amount of available NGS data has grown considerably. Computers are indispensable for analysing these samples and for coping with the flood of information these techniques generate. The use of NGS goes beyond research, with numerous examples of clinical and agronomic use, which creates new requirements for processing time and the reliability of results. To maximise their clinical applicability, NGS data processing techniques must become faster and produce more accurate data. It is in this context that intensive computing techniques play a relevant role. Nowadays it is common to have computers with several processing cores, and even to use multiple computers through distributed parallel computing techniques. Current trends towards architectures with a larger number of cores show that this is a relevant approach. This thesis begins with an analysis of the fundamental problems of NGS data processing, presented in general terms and adapted for a broad audience, through an exhaustive review of the state of the art in NGS data correction. This review gradually introduces the reader to massive sequencing techniques, presenting real problems and applications of NGS techniques and highlighting the impact of this technology on science.
Two main ideas follow from this study: the need to analyse the characteristics of NGS data adequately, given the enormous intrinsic variety of the different NGS techniques; and the need for a versatile, efficient and accurate error correction tool. In the context of data analysis, the thesis presents MuffinInfo. MuffinInfo is a software application implemented in HTML5. It obtains relevant information from raw NGS data to aid understanding of their characteristics and the application of error correction techniques, and it can be extended with functions that implement user-defined statistics. MuffinInfo stores the results of processing in JSON files. Because it uses HTML5, MuffinInfo can run in almost any hardware and software environment. The tool is implemented using multiple threads of execution for managing the interface. The second conclusion of the state-of-the-art analysis leads to the opportunity to apply high-performance computing techniques extensively to error correction, in order to develop a tool that supports multiple technologies (Illumina, Roche 454, Ion Torrent and, experimentally, PacBio). The proposed tool, MuffinEC, supports different types of errors (substitutions, indels and unknown values). MuffinEC outperforms the results obtained by existing tools in this field. It offers a better correction rate in much less time and using fewer resources, which also makes it easier to apply to larger samples on conventional computers. MuffinEC uses a multi-stage approach. First, it groups all the sequences using the k-mer metric. Second, it refines the groups by means of Smith-Waterman alignment, generating contigs.
These contigs result from column-wise correction according to the individual frequency of each base. The thesis is structured in chapters based on material previously published in journals indexed in prominent positions of the index of the Journal of Citation Repor / Alic, AS. (2016). Improved Error Correction of NGS Data [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/67630 / TESIS
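
As an illustration only, the column-wise correction step that the MuffinEC abstract describes (each column of an alignment is corrected to the base that reaches a user-adjustable percentage) could be sketched in Python as follows. This is not the thesis's implementation. The function name `correct_msa`, the default threshold of 0.7, the symbols `-` for a gap and `N` for an unknown value, and the assumption that every row spans the full alignment are all hypothetical choices made for this sketch.

```python
from collections import Counter

GAP = "-"      # assumed gap symbol in the alignment
UNKNOWN = "N"  # assumed symbol for an unknown value


def correct_msa(rows, threshold=0.7):
    """Correct aligned reads column by column (illustrative sketch).

    rows: equal-length strings over A/C/G/T, GAP and UNKNOWN.
    threshold: fraction of the informative rows that must agree before
        a column is rewritten (hypothetical default).
    Returns the corrected reads with gap characters removed.
    """
    if not rows:
        return []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows of the alignment must have the same length")

    corrected = [list(r) for r in rows]
    for col in range(width):
        # Unknown values carry no information, so they do not vote.
        votes = Counter(r[col] for r in rows if r[col] != UNKNOWN)
        if not votes:
            continue
        symbol, count = votes.most_common(1)[0]
        # Rewrite the column only when the majority symbol is frequent enough.
        if count / sum(votes.values()) >= threshold:
            for row in corrected:
                row[col] = symbol

    # Dropping gap characters turns each aligned row back into a read.
    return ["".join(row).replace(GAP, "") for row in corrected]
```

In this sketch a gap takes part in the vote like any base, so a column where most rows have a gap is removed from every read (an insertion error), and a gap in a column dominated by one base is filled in (a deletion error). Columns where no symbol reaches the threshold are left untouched.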
160

Porovnání eukaryotních genomů / Eucaryotic Genomes Comparison

Puterová, Janka January 2015
The main motivation for this master's thesis was the need for good bioinformatics tools for genome comparison and the improvement of one existing tool, RepeatExplorer. The work gives an overview of transposable elements in DNA, of existing tools for identifying and analysing repeats in sequenced genomes, and of currently used genome sequencing methods. It describes the shortcomings of RepeatExplorer, focusing on comparative analysis of genomes. Two solutions that address these shortcomings were designed and implemented. The first is designed for comparing pairs of genomes. It compares the similarity of the distributions of contig coverage using the Kolmogorov-Smirnov test, which makes it possible to identify the parts in which the genomes differ. The second solution, which is used to compare multiple genomes, maps reads from the compared genomes to the contigs of a reference genome and provides contig coverage graphs, from which the variability of the repeats can be determined. Both solutions were verified on real NGS data from the organism Silene latifolia.
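
The pairwise comparison described above rests on the two-sample Kolmogorov-Smirnov test. A minimal Python sketch of that test applied to two lists of coverage values might look like the following. It is not code from the thesis or from RepeatExplorer: the function names `ks_statistic` and `coverage_differs` are hypothetical, and the decision rule uses the standard large-sample approximation of the critical value, which is only reasonable when both samples are fairly large.

```python
from bisect import bisect_right
from math import log, sqrt


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the two empirical cumulative distribution functions."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def coverage_differs(sample_a, sample_b, alpha=0.05):
    """Return True when the two coverage distributions differ at level
    alpha, using the large-sample critical value (an approximation)."""
    n, m = len(sample_a), len(sample_b)
    c_alpha = sqrt(-log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    critical = c_alpha * sqrt((n + m) / (n * m))
    return ks_statistic(sample_a, sample_b) > critical
```

For example, `coverage_differs(cov_genome_1, cov_genome_2)` would take the coverage values of a set of contigs in each of two genomes, where both variable names stand for hypothetical input lists.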
