Global ETD Search

161	Analýza nádorové predispozice a funkční analýza variant nejasného významu / Analysis of cancer predisposition and functional analysis of variants of unknown significance Stolařová, Lenka January 2021 (has links) On average, 5-10% of all cancers occur in patients with hereditary tumors, who may have mutations in tens to hundreds of tumor predisposition genes. The phenotypes in mutation carriers overlap, and parallel analysis with sequencing panels is the method of choice in diagnostics. In our laboratory, we designed a universal panel and a targeted panel for a specific cancer, which allowed us to identify genetic alterations in patients with ovarian cancer, breast cancer, melanoma, and other cancers in the Czech Republic. The results of next generation sequencing (NGS) analyses show that the most frequent genetic alterations are hereditary mutations in BRCA1 in ovarian cancer patients in the Czech Republic (in 24% of unselected patients) and in CDKN2A in malignant melanoma patients (in 2% of high-risk patients). The presence of hereditary alterations is a clinically significant phenomenon affecting the prognosis and treatment of the disease. However, the interpretation of NGS findings is complicated by the presence of variants of unknown significance (VUS). We participate in the interpretation of VUS in the main predisposing genes BRCA1 and BRCA2 within the international consortium ENIGMA (Evidence-based Network for the Interpretation of Germline Mutant Alleles). Our and international results of the most...
162	Re-analýza pacientů se suspektním FAP onemocněním (familiární adenomatózní polypóza) / Re-analysis of suspected patients with FAP disease (Familial adenomatous polyposis) Slavíková, Petra January 2021 (has links) Familial adenomatous polyposis (FAP) is a condition caused by germline mutations in the tumor suppressor gene APC, inherited in an autosomal dominant manner. Patients with FAP develop hundreds to thousands of adenomatous colorectal polyps with an extremely high risk of malignant transformation into adenocarcinoma of the colon and/or rectum. The aim of this thesis is to re-analyze a cohort of highly suspected FAP probands from the years 1993-2004 whose diagnosis could not be confirmed by the molecular diagnostic methods commonly used at that time. Next generation sequencing on MiSeq and NextSeq platforms (Illumina®) was performed on 78 samples of probands' DNA, isolated from peripheral blood, using the gene panel CZECANCA version 1.2 (Czech Cancer Panel for Clinical Application). The panel enables sequencing of exons and exon-intron junctions of 226 genes linked to hereditary cancer predispositions, newly also including the diagnostically important promoter 1B region of APC. A pathogenic variant in the APC gene was detected in 18 % of re-analyzed probands, and 11 % of probands carry pathogenic variants in other genes associated with colorectal polyps. An additional 13 % of probands are carriers of variants of unknown clinical significance. NGS gene panel CZECANCA enabled diagnosis confirmation or re-evaluation of 22 FAP...
163	Development of a simplified and cost effective norovirus capsid typing method using next generation sequencing Eriksson, Ronnie January 2023 (has links) Human noroviruses are a major cause of acute gastroenteritis worldwide and can be transmitted through consumption of contaminated raw food. Shellfish like oysters can be contaminated by human sewage during production and accumulate multiple norovirus strains in low concentrations. Here we developed a simplified and cost-effective targeted metagenomic approach by sequencing PCR amplicons of the capsid (VP1) viral gene with next generation sequencing (NGS). A new design of reverse primers using the CODEHOP strategy and direct addition of Illumina adapters with one-step RT-PCR and sequencing on a nano chip reduced the hands-on time and cost of the analysis. A mix of faecal samples and oyster samples associated with outbreaks was used to evaluate the ability and limitations in the identification of strains from norovirus genogroup I (GI) and genogroup II (GII). With samples containing only one genotype, the method was able to identify all strains. Using artificially mixed samples, the method was able to identify almost all strains except a few GII at low concentrations. Oyster samples showed more limitations for the method: it was only able to identify the strain in some of the samples but did find multiple GI strains in more than one sample. Despite some limitations, the simplified method for VP1-targeted metagenomics is a sensitive approach allowing the study of norovirus diversity in contaminated oysters and the identification of norovirus strains implicated in outbreaks, at a lower cost and with less hands-on time than published methods. PCR Foodborne virus Food safety Illumina NGS Infectious Medicine Infektionsmedicin Microbiology in the medical area
164	The Differential Regulation of Transfer RNA in Higher Eukaryotes and Their Emerging Role in Malignancy Pinkard, Otis William, III 26 May 2023 (has links) No description available. Biology Bioinformatics Molecular Biology Organismal Biology Nutrition
165	Pipeline for Next Generation Sequencing data of phage displayed libraries to support affinity ligand discovery Schleimann-Jensen, Ella January 2022 (has links) Affinity ligands are important molecules used in affinity chromatography for purification of significant substances from complex mixtures. Finding affinity ligands specific to important target molecules can be a challenging process. Cytiva uses the powerful phage display technique to find new promising affinity ligands. The phage display technique is a method run in several enrichment cycles. When developing new affinity ligands, a protein scaffold library with a diversity of up to 10^10 to 10^11 different protein scaffold variants is run through the enrichment cycles. The result from the phage display rounds is screened for target molecule binding followed by sequencing, usually with one of the conventional screening methods ELISA or Biacore followed by Sanger sequencing. However, the throughput of these analyses is unfortunately very low, often with only a few hundred screened clones. Therefore, Next Generation Sequencing (NGS), which generates millions of sequences from each phage display round, has become an increasingly popular screening method for phage display libraries. This creates a need for a robust data analysis pipeline to be able to interpret the large amounts of data. In this project, a pipeline for analysis of NGS data of phage displayed libraries has been developed at Cytiva. Cytiva uses NGS as one of their screening methods of phage displayed protein libraries because of the high throughput compared to the conventional screening methods. The purpose is to find new affinity ligands for purification of essential substances used in drugs. The pipeline has been created using the object-oriented programming language R and consists of several analyses covering the most important steps to be able to find promising results from the NGS data. 
With the developed pipeline the user can analyze the data on both DNA and protein sequence level and per position residue breakdown, as well as filter the data based on specific amino acids and positions. This gives a robust and thorough analysis which can lead to promising results that can be used in the development of novel affinity ligands for future purification products. NGS next generation sequencing phage display affinity ligands data analysis bioinformatics Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi) Computer Sciences Datavetenskap (datalogi)
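The per-position residue breakdown and residue-based filtering mentioned in entry 165 can be illustrated with a small sketch. This is not the Cytiva pipeline (which the abstract says was written in R); it is a hypothetical Python illustration that assumes equal-length protein sequences, and the names `position_breakdown`, `filter_by_residue` and the example `variants` are invented for the example.

```python
from collections import Counter


def position_breakdown(sequences):
    """Return one dict per position mapping residue -> fraction of sequences.

    Assumes all sequences have the same length (an assumption of this sketch).
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    breakdown = []
    # zip(*sequences) iterates over alignment columns, one position at a time.
    for column in zip(*sequences):
        counts = Counter(column)
        breakdown.append({res: n / len(sequences) for res, n in counts.items()})
    return breakdown


def filter_by_residue(sequences, position, residue):
    """Keep only sequences carrying `residue` at the 0-based `position`."""
    return [s for s in sequences if s[position] == residue]


# Hypothetical four-residue scaffold variants, for illustration only.
variants = ["ACDK", "ACEK", "GCDK", "ACDR"]
table = position_breakdown(variants)
selected = filter_by_residue(variants, 2, "D")
```

Here `table[0]` gives the residue fractions at the first position, and `selected` holds the variants with aspartate (D) at the third position.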
166	Improved Error Correction of NGS Data Alic, Andrei Stefan 15 July 2016 (has links) Tesis por compendio / [EN] The work done for this doctoral thesis focuses on error correction of Next Generation Sequencing (NGS) data in the context of High Performance Computing (HPC). Due to the reduction in sequencing cost, the increasing output of the sequencers and the advancements in the biological and medical sciences, the amount of NGS data has increased tremendously. Humans alone are not able to keep pace with this explosion of information, therefore computers must assist them to ease the handling of the deluge of information generated by the sequencing machines. Since NGS is no longer just a research topic (it is used in clinical routine to detect cancer mutations, for instance), requirements in performance and accuracy are more stringent. For sequencing to be useful outside research, the analysis software must work accurately and fast. This is where HPC comes into play. NGS processing tools should leverage the full potential of multi-core and even distributed computing, as those platforms are extensively available. Moreover, as the performance of the individual core has hit a barrier, current computing tendencies focus on adding more cores and explicitly splitting the computation to take advantage of them. This thesis starts with a deep analysis of all these problems in a general and comprehensive way (to reach out to a very wide audience), in the form of an exhaustive and objective review of the NGS error correction field. We dedicate a chapter to this topic to introduce the reader gradually and gently into the world of sequencing. It presents real problems and applications of NGS that demonstrate the impact this technology has on science. 
The review results in the following conclusions: the need to understand the specificities of NGS data samples (given the high variety of technologies and features) and the need for flexible, efficient and accurate tools for error correction as a preliminary step of any NGS postprocessing. As a result of the explosion of NGS data, we introduce MuffinInfo. It is a piece of software capable of extracting information from the raw data produced by the sequencer to help the user understand the data. MuffinInfo uses HTML5, therefore it runs in almost any software and hardware environment. It supports custom statistics to mould itself to specific requirements. MuffinInfo can reload the results of a run, which are stored in JSON format for easier integration with third party applications. Finally, our application uses threads to perform the calculations, to load the data from the disk and to handle the UI. In continuation of our research and as a result of the single core performance limitation, we leverage the power of multi-core computers to develop a new error correction tool. The error correction of the NGS data is normally the first step of any analysis targeting NGS. As we conclude from the review performed within the frame of this thesis, many projects in different real-life applications have opted for this step before further analysis. In this sense, we propose MuffinEC, a multi-technology (Illumina, Roche 454, Ion Torrent and, experimentally, PacBio) corrector that handles any type of error (mismatches, deletions, insertions and unknown values). It surpasses other similar software by providing higher accuracy (demonstrated by three types of tests) and using fewer computational resources. It follows a multi-step approach that starts by grouping all the reads using a k-mer-based metric. Next, it employs the powerful Smith-Waterman algorithm to refine the groups and generate Multiple Sequence Alignments (MSAs). 
These MSAs are corrected by taking each column and looking for the correct base, determined by a user-adjustable percentage. This manuscript is structured in chapters based on material that has been previously published in prestigious journals indexed by the Journal of Citation Reports (on outstanding positions) and relevant congresses. / [ES] El trabajo realizado en el marco de esta tesis doctoral se centra en la corrección de errores en datos provenientes de técnicas NGS utilizando técnicas de computación intensiva. Debido a la reducción de costes y el incremento en las prestaciones de los secuenciadores, la cantidad de datos disponibles en NGS se ha incrementado notablemente. La utilización de computadores en el análisis de estas muestras se hace imprescindible para poder dar respuesta a la avalancha de información generada por estas técnicas. El uso de NGS transciende la investigación con numerosos ejemplos de uso clínico y agronómico, por lo que aparecen nuevas necesidades en cuanto al tiempo de proceso y la fiabilidad de los resultados. Para maximizar su aplicabilidad clínica, las técnicas de proceso de datos de NGS deben acelerarse y producir datos más precisos. En este contexto es en el que las técnicas de comptuación intensiva juegan un papel relevante. En la actualidad, es común disponer de computadores con varios núcleos de proceso e incluso utilizar múltiples computadores mediante técnicas de computación paralela distribuida. Las tendencias actuales hacia arquitecturas con un mayor número de núcleos ponen de manifiesto que es ésta una aproximación relevante. Esta tesis comienza con un análisis de los problemas fundamentales del proceso de datos en NGS de forma general y adaptado para su comprensión por una amplia audiencia, a través de una exhaustiva revisión del estado del arte en la corrección de datos de NGS. 
Esta revisión introduce gradualmente al lector en las técnicas de secuenciación masiva, presentando problemas y aplicaciones reales de las técnicas de NGS, destacando el impacto de esta tecnología en ciencia. De este estudio se concluyen dos ideas principales: La necesidad de analizar de forma adecuada las características de los datos de NGS, atendiendo a la enorme variedad intrínseca que tienen las diferentes técnicas de NGS; y la necesidad de disponer de una herramienta versátil, eficiente y precisa para la corrección de errores. En el contexto del análisis de datos, la tesis presenta MuffinInfo. La herramienta MuffinInfo es una aplicación software implementada mediante HTML5. MuffinInfo obtiene información relevante de datos crudos de NGS para favorecer el entendimiento de sus características y la aplicación de técnicas de corrección de errores, soportando además la extensión mediante funciones que implementen estadísticos definidos por el usuario. MuffinInfo almacena los resultados del proceso en ficheros JSON. Al usar HTML5, MuffinInfo puede funcionar en casi cualquier entorno hardware y software. La herramienta está implementada aprovechando múltiples hilos de ejecución por la gestión del interfaz. La segunda conclusión del análisis del estado del arte nos lleva a la oportunidad de aplicar de forma extensiva técnicas de computación de altas prestaciones en la corrección de errores para desarrollar una herramienta que soporte múltiples tecnologías (Illumina, Roche 454, Ion Torrent y experimentalmente PacBio). La herramienta propuesta (MuffinEC), soporta diferentes tipos de errores (sustituciones, indels y valores desconocidos). MuffinEC supera los resultados obtenidos por las herramientas existentes en este ámbito. Ofrece una mejor tasa de corrección, en un tiempo muy inferior y utilizando menos recursos, lo que facilita además su aplicación en muestras de mayor tamaño en computadores convencionales. 
MuffinEC utiliza una aproximación basada en etapas multiples. Primero agrupa todas las secuencias utilizando la métrica de los k-mers. En segundo lugar realiza un refinamiento de los grupos mediante el alineamiento con Smith-Waterman, generando contigs. Estos contigs resultan de la corrección por columnas de atendiendo a la frecuencia individual de cada base. La tesis se estructura por capítulos cuya base ha sido previamente publicada en revistas indexadas en posiciones dest / [CA] El treball realitzat en el marc d'aquesta tesi doctoral se centra en la correcció d'errors en dades provinents de tècniques de NGS utilitzant tècniques de computació intensiva. A causa de la reducció de costos i l'increment en les prestacions dels seqüenciadors, la quantitat de dades disponibles a NGS s'ha incrementat notablement. La utilització de computadors en l'anàlisi d'aquestes mostres es fa imprescindible per poder donar resposta a l'allau d'informació generada per aquestes tècniques. L'ús de NGS transcendeix la investigació amb nombrosos exemples d'ús clínic i agronòmic, per la qual cosa apareixen noves necessitats quant al temps de procés i la fiabilitat dels resultats. Per a maximitzar la seua aplicabilitat clínica, les tècniques de procés de dades de NGS han d'accelerar-se i produir dades més precises. En este context és en el que les tècniques de comptuación intensiva juguen un paper rellevant. En l'actualitat, és comú disposar de computadors amb diversos nuclis de procés i inclús utilitzar múltiples computadors per mitjà de tècniques de computació paral·lela distribuïda. Les tendències actuals cap a arquitectures amb un nombre més gran de nuclis posen de manifest que és esta una aproximació rellevant. Aquesta tesi comença amb una anàlisi dels problemes fonamentals del procés de dades en NGS de forma general i adaptat per a la seua comprensió per una àmplia audiència, a través d'una exhaustiva revisió de l'estat de l'art en la correcció de dades de NGS. 
Esta revisió introduïx gradualment al lector en les tècniques de seqüenciació massiva, presentant problemes i aplicacions reals de les tècniques de NGS, destacant l'impacte d'esta tecnologia en ciència. D'este estudi es conclouen dos idees principals: La necessitat d'analitzar de forma adequada les característiques de les dades de NGS, atenent a l'enorme varietat intrínseca que tenen les diferents tècniques de NGS; i la necessitat de disposar d'una ferramenta versàtil, eficient i precisa per a la correcció d'errors. En el context de l'anàlisi de dades, la tesi presenta MuffinInfo. La ferramenta MuffinInfo és una aplicació programari implementada per mitjà de HTML5. MuffinInfo obté informació rellevant de dades crues de NGS per a afavorir l'enteniment de les seues característiques i l'aplicació de tècniques de correcció d'errors, suportant a més l'extensió per mitjà de funcions que implementen estadístics definits per l'usuari. MuffinInfo emmagatzema els resultats del procés en fitxers JSON. A l'usar HTML5, MuffinInfo pot funcionar en gairebé qualsevol entorn maquinari i programari. La ferramenta està implementada aprofitant múltiples fils d'execució per la gestió de l'interfície. La segona conclusió de l'anàlisi de l'estat de l'art ens porta a l'oportunitat d'aplicar de forma extensiva tècniques de computació d'altes prestacions en la correcció d'errors per a desenrotllar una ferramenta que suport múltiples tecnologies (Illumina, Roche 454, Ió Torrent i experimentalment PacBio). La ferramenta proposada (MuffinEC), suporta diferents tipus d'errors (substitucions, indels i valors desconeguts). MuffinEC supera els resultats obtinguts per les ferramentes existents en este àmbit. Oferix una millor taxa de correcció, en un temps molt inferior i utilitzant menys recursos, la qual cosa facilita a més la seua aplicació en mostres més gran en computadors convencionals. MuffinEC utilitza una aproximació basada en etapes multiples. 
Primer agrupa totes les seqüències utilitzant la mètrica dels k-mers. En segon lloc realitza un refinament dels grups per mitjà de l'alineament amb Smith-Waterman, generant contigs. Estos contigs resulten de la correcció per columnes d'atenent a la freqüència individual de cada base. La tesi s'estructura per capítols la base de la qual ha sigut prèviament publicada en revistes indexades en posicions destacades de l'índex del Journal of Citation Repor / Alic, AS. (2016). Improved Error Correction of NGS Data [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/67630 / Compendio Error Correction NGS TGS Next Generation Sequencing Statistics HTML5 C++ OpenMP Parallel Review FastQ FastA
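The last MuffinEC step described in entry 166 (correcting each column of a multiple sequence alignment towards the base that reaches a user-adjustable percentage) can be sketched as follows. This is a minimal illustration and not MuffinEC's actual code: the function name `correct_alignment`, the `-` gap symbol, the default threshold of 0.6 and the example reads are all assumptions made for the example, and the k-mer grouping and Smith-Waterman stages are taken as already done.

```python
from collections import Counter


def correct_alignment(aligned_reads, threshold=0.6, gap="-"):
    """Correct each alignment column towards its most frequent symbol.

    A column is rewritten only when the most common symbol accounts for at
    least `threshold` of the reads; otherwise every read keeps its own symbol.
    Gap symbols are removed from the returned reads.
    """
    if not aligned_reads:
        return []
    width = len(aligned_reads[0])
    if any(len(r) != width for r in aligned_reads):
        raise ValueError("aligned reads must have equal length")
    corrected = [list(r) for r in aligned_reads]
    for col in range(width):
        counts = Counter(r[col] for r in aligned_reads)
        symbol, count = counts.most_common(1)[0]
        if count / len(aligned_reads) >= threshold:
            for row in corrected:
                row[col] = symbol
    # Drop gap symbols so each read is returned as a plain sequence.
    return ["".join(ch for ch in row if ch != gap) for row in corrected]


# Hypothetical pre-aligned reads: one mismatch (column 2) and one deletion (column 4).
reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAC", "ACGT-C"]
fixed = correct_alignment(reads, threshold=0.6)
strict = correct_alignment(reads, threshold=0.9)
```

With the lower threshold both errors are corrected because the majority symbol covers 80% of each affected column; with the stricter threshold those columns are left untouched, which shows why the percentage is worth exposing as a parameter.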
167	Microfluidic Technology for Low-Input Epigenomic Analysis Zhu, Yan 25 May 2018 (has links) Epigenetic modifications, such as DNA methylation and histone modifications, play important roles in gene expression and regulation, and are highly involved in cellular processes such as stem cell pluripotency/differentiation and tumorigenesis. Chromatin immunoprecipitation (ChIP) is the technique of choice for examining in vivo DNA-protein interactions and has been a great tool for studying epigenetic mechanisms. However, conventional ChIP assays require millions of cells for tests and are not practical for examination of samples from lab animals and patients. Automated microfluidic chips offer the advantage of handling small sample sizes and facilitating rapid reactions. They also eliminate cumbersome manual handling. In this report, I will talk about three different projects that utilized microfluidic immunoprecipitation followed by next generation sequencing technologies to enable low-input and high-throughput epigenomic profiling. First, I examined RNA polymerase II transcriptional regulation with microfluidic chromatin immunoprecipitation followed by next generation sequencing (ChIP-seq) assays. Second, I probed the temporal dynamics in the DNA methylome during cancer development using a transgenic mouse model with microfluidic methylated DNA immunoprecipitation followed by next generation sequencing (MeDIP-seq) assays. Third, I explored negative enrichment of circulating tumor cells (CTCs) followed by microfluidic ChIP-seq technology for studying the temporal dynamics of a histone modification (H3K4me3) in patient-derived tumor xenografts in an immunodeficient mouse model during the course of cancer metastasis. In the first study, I adapted microfluidic ChIP-seq devices to achieve ultrahigh sensitivity to study Pol2 transcriptional regulation from scarce cell samples. I dramatically increased the assay sensitivity to an unprecedented level (~50 K cells for pol2 ChIP-seq). 
Importantly, this is three orders of magnitude more sensitive than the prevailing pol2 ChIP-seq assays. I showed that MNase digestion provided better ChIP-seq signal than sonication, and that two-step fixation with MNase digestion provided the best ChIP-seq quality, followed by one-step fixation with MNase digestion and, lastly, no fixation with MNase digestion. In the second study, I probed dynamic epigenomic changes during tumorigenesis in mice, which often requires profiling epigenomes from a tiny quantity of tissue. Conventional epigenomic tests do not support such analysis due to the large amount of material required by these assays. In this study, I developed an ultrasensitive microfluidics-based methylated DNA immunoprecipitation followed by next-generation sequencing (MeDIP-seq) technology for profiling methylomes using as little as 0.5 ng DNA (or ~100 cells) with a 1.5 h on-chip process for immunoprecipitation. This technology enabled me to examine genome-wide DNA methylation in a C3(1)/SV40 T-antigen transgenic mouse model during different stages of mammary cancer development. Using these data, I identified differentially methylated regions and their associated genes in different periods of cancer development. Interestingly, the results showed that methylomic features are dynamic and change with tumor developmental stage. In the last study, I developed a negative enrichment of CTCs followed by ultrasensitive microfluidic ChIP-seq technology for profiling histone modification (H3K4Me3) of CTCs to resolve the technical challenges associated with CTC isolation and the difficulties of profiling whole-genome histone modifications in tiny cell samples. / Ph. D. / The human genome was sequenced and completed over a decade ago. The information provided by the genomic map inspired numerous studies on genetic variations and their roles in diseases. 
However, genomic information alone is not always sufficient to explain important biological processes. Gene activation and expression are not only associated with alterations in the DNA sequence, but are also affected by other changes to DNA and histones. Epigenetics refers to the molecular mechanisms that affect gene expression and phenotypes without involving changes in the DNA sequence. For example, the DNA can be methylated, the histone proteins around which DNA is wrapped can be methylated or acetylated, and transcription factors can bind to different parts of the DNA. All of these can affect gene expression without altering the DNA sequence. Epigenetic changes occur throughout all stages of cell development or in response to environmental cues. They change transcription patterns in a tissue/cell-specific fashion. For example, transcriptional silencing of tumor-suppressor genes by DNA methylation plays an important role in cancer development. Therefore, understanding epigenetic regulation will help to improve various aspects of biomedicine. For instance, personalized medicine can be tailored to the epigenetic profile of a given patient to specifically control gene expression during treatment. However, the technology for profiling epigenetic modifications, i.e. Chromatin Immunoprecipitation (ChIP), suffers from serious limitations. The key limitation is the sensitivity of the assay. The conventional assay requires a large number of cells (>10⁶ cells per ChIP). This is feasible when using cell lines. However, this requirement becomes a major challenge when primary cells are used, because only very limited amounts of sample can be obtained from lab animals or patients. Population heterogeneity information may also be lost when a large cell number is used. 
In this project, we developed an automated ultrasensitive microfluidic chromatin/DNA immunoprecipitation followed by next-generation sequencing (ChIP/MeDIP-Seq) technology for profiling epigenetic modifications (e.g., histone modifications, transcriptional regulations, and DNA methylation). We extensively optimized design parameters for each and every step of ChIP/MeDIP (e.g. sonication/crosslinking time, antibody concentration, washing conditions) in order to reach the highest sensitivity of 0.1 ng DNA (or ~50-100 cells) as starting material for IP, which is roughly 4-5 orders of magnitude higher than the prevailing protocol and 2-3 orders of magnitude higher than the state of the art (~50 ng). With such sensitivity, we were able to study temporal dynamics in the DNA methylomes during the various stages of mammary cancer development in a transgenic mouse model. We were able to investigate transcriptional regulation of RNA polymerase II from scarce cell samples. We were also able to study histone modification (H3K4Me3) of circulating tumor cells during cancer metastasis. Chromatin immunoprecipitation (ChIP) Next generation sequencing (NGS) Epigenetics Transcriptional regulations DNA methylation Histone modifications Microfluidics Circulating tumor cell (CTC)
168	Genomic and transcriptomic sequencing in chronic lymphocytic leukemia Cortese, Diego January 2016 (has links) Identification of recurrent mutations through next-generation sequencing (NGS) has given us a deeper understanding of the molecular mechanisms involved in chronic lymphocytic leukemia (CLL) development and progression and provided novel means for risk assessment in this clinically heterogeneous disease. In paper I, we screened a population-based cohort of CLL patients (n=364) for TP53, NOTCH1, SF3B1, BIRC3 and MYD88 mutations using Sanger sequencing, and confirmed the negative prognostic impact of TP53, SF3B1 or NOTCH1 aberrations, though at lower frequencies compared to previous studies. In paper II, we assessed the feasibility of targeted NGS using a gene panel including 9 CLL-related genes in a large patient cohort (n=188). We could validate 93% (144/155) of mutations with Sanger sequencing; the remaining were at the detection limit of the latter technique, and technical replication showed a high concordance (77/82 mutations, 94%). In paper III, we performed a longitudinal study of CLL patients (n=41) relapsing after fludarabine, cyclophosphamide and rituximab (FCR) therapy using whole-exome sequencing. In addition to known poor-prognostic mutations (NOTCH1, TP53, ATM, SF3B1, BIRC3, and NFKBIE), we detected mutations in a ribosomal gene, RPS15, in almost 20% of cases (8/41). In extended patient series, RPS15-mutant cases had a poor survival similar to patients with NOTCH1, SF3B1, or 11q aberrations. In vitro studies revealed that RPS15mut cases displayed reduced p53 stabilization compared to cases wildtype for RPS15. In paper IV, we performed RNA-sequencing in CLL patients (n=50) assigned to 3 clinically and biologically distinct subsets carrying stereotyped B-cell receptors (i.e. subsets #1, #2 and #4) and revealed unique gene expression profiles for each subset. 
Analysis of SF3B1-mutated versus wildtype subset #2 patients revealed a large number of splice variants (n=187) in genes involved in chromatin remodeling and ribosome biogenesis. Taken together, this thesis confirms the prognostic impact of recurrent mutations and provides data supporting implementation of targeted NGS in clinical routine practice. Moreover, we provide evidence for the involvement of novel players, such as RPS15, in disease progression and present transcriptome data highlighting the potential of global approaches for the identification of molecular mechanisms contributing to CLL development within prognostically relevant subgroups. chronic lymphocytic leukemia CLL genomics transcriptomics DNA RNA mutations NGS whole-exome sequencing prognostic markers TP53 SF3B1 RPS15 relapse stereotyped subsets.
169	Genome-wide analysis of selection in mammals, insects and fungi Ridout, Kate E. January 2012 (has links) Characterising and understanding factors that affect the rate of molecular evolution in proteins has played a major part in the development of evolutionary theory. The early analyses of amino acid substitutions stimulated the development of the neutral theory of molecular evolution, which later evolved into the nearly neutral theory. More recent work has led to a better understanding of the role selection plays at the molecular level, but there is still limited understanding of how higher levels of protein organisation affect the way natural selection acts. The investigation of this question is the central aim of this thesis, which is addressed via the analysis of selective pressures in secondary protein structures in insects, mammals and fungi. The analyses for the first two groups were conducted using publicly available datasets. To conduct the analyses in fungi, genome sequence data from the fungal genus Microbotryum (sequenced in our laboratory) was assembled and annotated, resulting in the development of a number of bioinformatics tools which are described here. The fungal, insect and mammalian datasets were interrogated with regard to a number of structural features, such as protein secondary structure, position of a site with regard to adaptively evolving sites, hydropathy and solvent-accessibility. These features were correlated with the signals of positive and purifying selection detected using phylogenetic maximum likelihood and Bayesian approaches. I conclude that all of the factors examined can have an effect on the rate of molecular evolution. In particular, disordered and hydrophilic regions of the protein are found to experience fewer physicochemical constraints and contain a higher proportion of adaptively evolving sites. 
It is also revealed that positively selected residues are ‘clustered’ together spatially, and these trends persist in the three taxa. Finally, I show that this variation in adaptive evolution is a result of both selective events and physicochemical constraint. 572.838
170	Deciphering the ontogeny of unmutated and mutated subsets of Chronic Lymphocytic Leukemia Mohamed, Ahmed January 2019 (has links) Chronic Lymphocytic Leukemia (CLL) is a type of cancer that affects the B cells of the immune system, causing problems in the process of producing antibodies. It can be sorted into mutated and unmutated CLL based on the percentage of somatic mutations in the Immunoglobulin Heavy chain Variable region (IgHV). The B cells of healthy individuals can be sorted into three groups: CD27dull memory B cells (MBCs), CD27bright MBCs and naïve B cells. The hypothesis for the project was that the unmutated CLL subset originates from CD27dull MBCs and the mutated CLL subset originates from CD27bright MBCs. RNA-sequencing data from healthy individuals were acquired from a collaboration partner in Rome, and data from CLL patients were collected from public datasets available online. Several bioinformatic tools were used to analyze the data. First, the quality of the data files was checked, then adapter sequences from the sequencing process and low-quality bases were removed (trimming). Good quality of the files was confirmed after the trimming. Secondly, these files were mapped against the human reference genome (GRCh38/hg38) for alignment, and the resulting data were used to check for genes that showed differential expression between the different groups. Results were analyzed and visualized using Venn diagrams, Principal Component Analysis (PCA), heatmap plots and random forest. A list of 85 genes was generated based on the different comparisons and was used in one PCA plot that showed clear separation between the different groups. The SWAP70 gene was analyzed for single nucleotide polymorphisms (SNPs). The study identified five genes that could be used as biomarkers for CLL and the diagnosis of its subtypes, some of which were discussed in previous studies. 
Also, the mutated CLL subset showed a similar behavior to the healthy individuals, which could support the original hypothesis and explain the better disease prognosis for this subtype. CLL NGS mutated CLL unmutated CLL CD27 bright CD27 dull memory B cell RNA-sequencing Bioinformatics and Systems Biology Bioinformatik och systembiologi Immunology Immunologi
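The trimming step mentioned in entry 170 (removing low-quality bases before mapping) can be illustrated with a short sketch. The abstract does not name the tool that was used, so this is only a hypothetical illustration of 3'-end quality trimming; it assumes FASTQ quality strings in Phred+33 encoding and a cutoff of 20, and the function names `phred_scores` and `trim_3prime` are invented for the example.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20, offset=33):
    """Remove trailing bases whose Phred score is below `min_quality`.

    Trimming stops at the first base (scanning from the 3' end) that meets
    the cutoff; bases upstream of it are kept regardless of their quality.
    """
    scores = phred_scores(quality_string, offset)
    end = len(sequence)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


# Hypothetical read: six high-quality bases ('I' = Q40) and two poor ones ('#' = Q2).
seq, qual = trim_3prime("ACGTACGT", "IIIIII##")
```

Real trimming tools typically combine this with adapter removal and sliding-window criteria; the sketch covers only the simplest end-trimming rule.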
