1

Coverage Analysis in Clinical Next-Generation Sequencing

Odelgard, Anna January 2019
With the new way of sequencing by NGS, new tools had to be developed to work with new data formats, to handle the larger data sizes compared to previous techniques, and to check the accuracy of the data. Coverage analysis is one important quality control for NGS data: the coverage indicates how many times each base pair has been sequenced and thus how trustworthy each base call is. For clinical purposes, every base of interest must be quality controlled, as one wrong base call could affect the patient negatively. Software tools that perform coverage analysis with enough accuracy and detail for clinical applications are sparse. Several tools, like Samtools, are able to calculate coverage values but do not process this information further into a QC report of each base pair of interest. My master thesis has therefore been to create a new coverage analysis report tool, named CAR tool, that extracts the coverage values from Samtools and uses this data to produce a report consisting of tables, lists and figures. CAR tool was created to replace the currently used tool, ExCID, at the Clinical Genomics facility at SciLifeLab in Uppsala and was developed to meet the needs of the bioinformaticians and clinicians. CAR tool is written in Python and launched from a terminal window. The main function of the tool is to display coverage breadth values for each region of interest and to extract all subregions below a chosen coverage depth threshold. The low-coverage regions are then reported together with region name, start and stop positions, length and mean coverage value. To make the tool useful to as many users as possible, several settings can be selected with flags when calling the tool, for example to generate pie charts of each region's coverage values, to filter the reads and bases by quality, or to write your own entry that will be used for the coverage calculation by Samtools. The tool has proved to find these low-coverage regions very well. Most low regions found are also found by ExCID, the currently used tool; some differences did occur, however, and every such region was verified in IGV, where the coverage values coincided with those found by CAR tool. CAR tool is written to find all low-coverage regions even if they are only one base pair long, while ExCID instead seems to report larger low regions, not taking very short ones into account. To read more about the functions and how to use CAR tool, see the user instructions in the appendix and on GitHub in the repository anod6351.
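The core extraction step can be illustrated with a minimal sketch (not CAR tool's actual code): it streams `samtools depth` output for one region of interest and merges consecutive positions whose depth falls below the threshold. The function name, the region tuple and the default threshold are assumptions for illustration.

```python
import subprocess

def low_coverage_regions(bam, region, threshold=20):
    """Find contiguous sub-regions whose depth falls below `threshold`.

    Streams `samtools depth -a` output (chrom, 1-based position, depth)
    for one region of interest and merges consecutive low positions.
    Flags -q/-Q could be added to filter bases and reads by quality.
    """
    chrom, start, stop = region
    out = subprocess.run(
        ["samtools", "depth", "-a", "-r", f"{chrom}:{start}-{stop}", bam],
        capture_output=True, text=True, check=True).stdout
    regions, run = [], None          # run = [first_pos, last_pos, depth_sum]
    for line in out.splitlines():
        _, pos, depth = line.split("\t")
        pos, depth = int(pos), int(depth)
        if depth < threshold and run and pos == run[1] + 1:
            run[1], run[2] = pos, run[2] + depth   # extend the current run
        elif depth < threshold:
            if run:
                regions.append(run)
            run = [pos, pos, depth]                # start a new run
        elif run:                                  # depth back above threshold
            regions.append(run)
            run = None
    if run:
        regions.append(run)
    # report start, stop, length and mean coverage, as described above
    return [(chrom, s, e, e - s + 1, d / (e - s + 1)) for s, e, d in regions]
```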
2

Implementation of an automatic quality control of derived data files for NONMEM

Sandström, Eric January 2019
A pharmacometric analysis must be based on correct data to be valid. Source clinical data is rarely ready to be modelled as is, but rather needs to be reprogrammed to fit the format required by the pharmacometric modelling software. The reprogramming steps include selecting the subsets of data relevant for modelling, deriving new information from the source and adjusting units and encoding. Sometimes the source data may also be flawed, containing vague definitions and missing or confusing values. In either case, the source data needs to be reprogrammed to remedy this, followed by extensive quality control to capture any errors or inconsistencies produced along the way. The quality control is a lengthy task that is often performed manually, either by the scientists conducting the pharmacometric study or by independent reviewers. This project presents an automatic data quality control with the purpose of aiding the data curation process, so as to minimize the number of potential errors that would otherwise have to be detected by the manual quality control. The automatic quality control is implemented as an R package and is specifically tailored to the needs of Pharmetheus.
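As an illustration of the kinds of checks such a package can automate, here is a minimal sketch in Python (the package described is in R, and the NONMEM-style column names ID, TIME, DV, AMT are hypothetical examples, not a fixed standard):

```python
import csv

def qc_derived_file(path, numeric_cols=("TIME", "DV", "AMT")):
    """Run simple automated checks on a NONMEM-style derived data file.

    Returns a list of human-readable findings for manual review.
    """
    findings = []
    with open(path, newline="") as fh:
        rows = list(csv.DictReader(fh))
    if not rows:
        return ["file is empty"]
    for i, row in enumerate(rows, start=2):      # line 1 is the header
        for col in numeric_cols:
            val = row.get(col, "")
            if val in ("", ".", "NA"):
                findings.append(f"line {i}: missing value in {col}")
            else:
                try:
                    float(val)
                except ValueError:
                    findings.append(f"line {i}: non-numeric {col}={val!r}")
    # TIME should never decrease within one subject
    last_time = {}
    for i, row in enumerate(rows, start=2):
        try:
            sid, t = row["ID"], float(row["TIME"])
        except (KeyError, ValueError):
            continue
        if sid in last_time and t < last_time[sid]:
            findings.append(f"line {i}: TIME decreases within subject {sid}")
        last_time[sid] = t
    return findings
```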
3

Predicting adverse drug reactions in cancer treatment using a neural network based approach

Hillerton, Thomas January 2018
No description available.
4

Development of a phylogenomic framework for the krill

Gevorgyan, Arusjak January 2018
Over the last few decades, many krill stocks have declined in size and number, likely as a consequence of global climate change (Siegel 2016). A major risk factor is the increased level of carbon dioxide (CO2) in the ocean. A collapse of the krill population has the potential to disrupt the ocean ecosystem, as krill are the main connection between primary producers such as phytoplankton and larger animals (Murphy et al. 2012). The aim of this project is to produce the first phylogenomic framework for krill, with the help of powerful comparative bioinformatics and phylogenomic methods, in order to find and analyse the genes that help krill adapt to their environment. A problem with such studies is that we still do not have access to a reference genome sequence of any krill species. To strengthen and increase trust in our studies, two different pipelines were run, each with a different Orthology Assessment Toolkit (OAT), Orthograph and UPhO, in order to establish orthology relationships between transcripts/genes. Since UPhO produces well-supported trees in which the majority of the gene trees match the species tree, it is recommended as the proper OAT for generating a robust molecular phylogeny of krill. The second aim of this project was to estimate the level of positive selection in E. superba, in order to lay a foundation for understanding the level of selection acting on protein-coding sequences in krill. As expected, the level of selection was quite high in E. superba, which indicates that krill adapt to the changing environment by positive selection rather than genetic drift.
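The thesis' own pipelines used Orthograph and UPhO; as a hedged illustration of one step described above, counting how many gene trees match the species tree, here is a sketch using the DendroPy library. It assumes Newick files that all share the same taxa; the file paths are hypothetical.

```python
import glob

import dendropy
from dendropy.calculate import treecompare

# All trees must share one TaxonNamespace to be comparable.
tns = dendropy.TaxonNamespace()
species_tree = dendropy.Tree.get(
    path="species_tree.nwk", schema="newick", taxon_namespace=tns)

gene_tree_paths = sorted(glob.glob("gene_trees/*.nwk"))
concordant = 0
for path in gene_tree_paths:
    gene_tree = dendropy.Tree.get(
        path=path, schema="newick", taxon_namespace=tns)
    # Robinson-Foulds (symmetric) distance: 0 means identical topology.
    if treecompare.symmetric_difference(species_tree, gene_tree) == 0:
        concordant += 1

print(f"{concordant}/{len(gene_tree_paths)} gene trees match the species tree")
```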
5

Evaluation of de novo assembly using PacBio long reads

Che, Huiwen January 2016
New sequencing technologies show promise for the construction of complete and accurate genome sequences through de novo assembly, a process that joins reads by overlap into longer contiguous sequences without the need for a reference genome. High-quality de novo assembly leads to a better understanding of genetic variation. The purpose of this thesis is to evaluate human genome sequences obtained from the PacBio sequencing platform, a new technology suitable for de novo assembly of large genomes. The evaluation focuses on comparing sequence identity between our own de novo assemblies and the available human reference and, through that, benchmarking the accuracy of our data. Sequences that are absent from the reference genome are also investigated for potential unannotated genes. We further assess complex structural variation using different approaches. Our assemblies show high consensus with the human reference genome, with ~98.6% of the bases in the assemblies mapped to the human reference. We also detect more than ten thousand structural variants, including some large rearrangements, with respect to the reference.
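A sketch of how such a mapped-base figure can be computed, assuming alignments of the assembly to the reference in minimap2's PAF format (file names are hypothetical; this is not the thesis' actual pipeline):

```python
from collections import defaultdict

def mapped_fraction(paf_path):
    """Estimate the fraction of assembly bases aligned to a reference.

    Reads PAF output (tab-separated; columns 1-4 are query name, length,
    0-based start, end) and merges overlapping aligned intervals per
    contig so that bases are not counted twice.
    """
    qlen, intervals = {}, defaultdict(list)
    with open(paf_path) as fh:
        for line in fh:
            f = line.split("\t")
            qlen[f[0]] = int(f[1])
            intervals[f[0]].append((int(f[2]), int(f[3])))
    mapped = 0
    for ivs in intervals.values():
        ivs.sort()
        cur_s, cur_e = ivs[0]
        for s, e in ivs[1:]:
            if s <= cur_e:                 # overlapping: extend the interval
                cur_e = max(cur_e, e)
            else:                          # disjoint: close and start anew
                mapped += cur_e - cur_s
                cur_s, cur_e = s, e
        mapped += cur_e - cur_s
    return mapped / sum(qlen.values())

# e.g. after: minimap2 reference.fa assembly.fa > aln.paf
print(f"{mapped_fraction('aln.paf'):.1%} of assembly bases map to the reference")
```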
6

A bioinformatician's view on the evolution of smell perception

Anders, Patrizia January 2006
Background: The origin of vertebrate sensory systems still contains many mysteries and thus challenges for bioinformatics. Especially the evolution of the sense of smell holds important puzzles, namely the question whether or not the vomeronasal system is older than the main olfactory system. Here I compare receptor sequences of the two distinct systems in a phylogenetic study, to determine their relationships among several different vertebrate species. Results: Receptors of the two olfactory systems share little sequence similarity and prove to be a challenge in multiple sequence alignment. However, recent dramatic improvements in alignment tools allow for better results and higher confidence. Different strategies and tools were employed and compared to derive a high-quality alignment that holds information about the evolutionary relationships between the different receptor types. The resulting maximum-likelihood tree supports the theory that the vomeronasal system is an ancestor of the main olfactory system rather than an evolutionary novelty of tetrapods. Conclusions: The connections between the two systems of smell perception might be much more fundamental than the common architecture of their receptors. A better understanding of these parallels is desirable, not only with respect to our view of evolution, but also in the context of further exploring the functionality and complexity of odor perception. Along the way, this work offers a practical protocol through the jungle of programs concerned with sequence data and phylogenetic reconstruction.
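As a rough illustration of such a protocol, with modern stand-ins for the programs compared (MAFFT for the alignment, IQ-TREE for the maximum-likelihood tree; file names hypothetical):

```python
import subprocess

# Align the receptor sequences (low similarity, so --auto lets MAFFT
# choose an appropriate strategy).
with open("receptors_aligned.fasta", "w") as aligned:
    subprocess.run(["mafft", "--auto", "receptors.fasta"],
                   stdout=aligned, check=True)

# Build a maximum-likelihood tree from the alignment (LG model, gamma rates).
subprocess.run(["iqtree", "-s", "receptors_aligned.fasta", "-m", "LG+G"],
               check=True)
```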
7

Using an ontology to enhance metabolic or signaling pathway comparisons by biological and chemical knowledge

Pohl, Matin January 2006
Motivation: As genome-scale efforts are ongoing to investigate the metabolic networks of miscellaneous organisms, the amount of pathway data is growing. Simultaneously, an increasing amount of gene expression data from microarrays becomes available for reverse engineering, delivering e.g. hypothetical regulatory pathway data. To avoid being outgrown by the data and to keep track of genuinely new information, analysis tools are needed. One vital task is the comparison of pathways for the detection of similar functionalities and overlaps or, in the case of reverse engineering, the detection of known data corroborating a hypothetical pathway. A comparison method using ontological knowledge about molecules and reactions features a more biological point of view, which graph-theoretical approaches have missed so far. Such an ontology-based comparison attempt is described in this report. Results: An algorithm is introduced that compares pathways component by component. The method was run on two selected databases, and the results proved it unsatisfying as a stand-alone method. Further development possibilities are suggested and steps toward an integrated method using several approaches are recommended. Availability: The source code, the database snapshots used and pictures can be requested from the author.
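A toy sketch of the idea: components are compared through their ontology ancestors rather than by name, and a pathway-level score is taken as a best-match average. The miniature ontology below is invented for illustration and is not the report's actual algorithm or data.

```python
# Toy ontology: child -> parents (a tiny, invented fragment)
ONTOLOGY = {
    "glucose": ["hexose"], "fructose": ["hexose"],
    "hexose": ["monosaccharide"], "monosaccharide": ["carbohydrate"],
    "carbohydrate": [], "ATP": ["nucleotide"], "nucleotide": [],
}

def ancestors(term):
    """All ontology terms reachable upward from `term`, including itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(ONTOLOGY.get(t, []))
    return seen

def term_similarity(a, b):
    """Jaccard overlap of ancestor sets: shared biology, not string equality."""
    sa, sb = ancestors(a), ancestors(b)
    return len(sa & sb) / len(sa | sb)

def pathway_similarity(p1, p2):
    """Best-match average: each component is scored against its closest
    counterpart in the other pathway, then the scores are averaged."""
    scores = [max(term_similarity(a, b) for b in p2) for a in p1]
    scores += [max(term_similarity(b, a) for a in p1) for b in p2]
    return sum(scores) / len(scores)

print(pathway_similarity(["glucose", "ATP"], ["fructose", "ATP"]))  # 0.8
```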
8

Shape Analysis and Measurement for the HeLa cell classification of cultured cells in high throughput screening

Huque, Enamul January 2006
Feature extraction by digital image analysis and cell classification is an important task for cell culture automation. In High Throughput Screening (HTS), where thousands of data points are generated and processed at once, features are extracted and cells classified to decide whether the cell culture is proceeding smoothly or not. The culture is restarted if a problem is detected. In this thesis project HeLa cells, which are human epithelial cancer cells, were selected for the experiment. The purpose is to classify two types of HeLa cells in culture: cells in cleavage, which are round floating cells (stressed or dead cells are also round and floating), and normally growing cells, which are attached to the substrate. As the number of cells in cleavage will always be smaller than the number of cells growing normally and attached to the substrate, the cell count of attached cells should be higher than that of round cells. Five different HeLa cell images are used. For each image, every single cell is obtained by image segmentation and isolation, and different mathematical features are computed for each cell. The feature set for this experiment is chosen so that the features are robust, discriminative and have good generalisation quality for classification. Almost all the features presented in this thesis are rotation, translation and scale invariant, so they are expected to perform well in discriminating objects or cells under any classification algorithm. Some new features are added, which are believed to improve the classification result, and the feature set is considerably broader than the restricted sets used in previous work. The features are implemented behind a common interface so that the library can be extended and integrated into other applications. They are fed into a machine learning algorithm called Linear Discriminant Analysis (LDA) for classification. Cells are then classified as 'cells attached to the substrate' (Cell Class A) or 'cells in cleavage' (Cell Class B). Shape features are added to or left out of the LDA model to increase performance. On average, higher than ninety-five percent accuracy is obtained in the classification, validated against visual classification.
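A minimal sketch of this kind of pipeline using today's common libraries (scikit-image for region features, scikit-learn for LDA); the feature choice mirrors the invariance requirements described, but this is not the thesis' actual implementation:

```python
import numpy as np
from skimage.measure import label, regionprops
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def cell_features(binary_mask):
    """Rotation-, translation- and scale-invariant shape features
    for every segmented cell in a binary image."""
    feats = []
    for region in regionprops(label(binary_mask)):
        circularity = 4 * np.pi * region.area / max(region.perimeter, 1) ** 2
        feats.append([
            region.eccentricity,   # 0 for a circle, -> 1 when elongated
            region.solidity,       # area / convex-hull area
            region.extent,         # area / bounding-box area
            circularity,           # 1 for a perfect circle
        ])
    return np.asarray(feats)

# X: features from training images; y: 0 = attached (Class A),
# 1 = round / in cleavage (Class B), labelled by visual inspection.
clf = LinearDiscriminantAnalysis()
# clf.fit(X_train, y_train)
# predictions = clf.predict(cell_features(new_mask))
```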
9

Representation of Biochemical Pathway Models : Issues relating conversion of model representation from SBML to a commercial tool

Naswa, Sudhir January 2005
Background: Computational simulation of complex biological networks lies at the heart of systems biology, since it can confirm the conclusions drawn by experimental studies of biological networks and guide researchers to produce fresh hypotheses for further experimental validation. Since this iterative process helps in the development of more realistic system models, a variety of computational tools have been developed. In the absence of a common format for the representation of models, these tools were developed around different formats. As a result, the tools were unable to exchange models amongst themselves, leading to the development of SBML, a standard exchange format for computational models of biochemical networks. Here the formats of SBML and one of the commercial tools of systems biology are compared to study the issues that may arise during conversion between their respective formats. A tool, StoP, has been developed to convert the SBML format to the format of the selected tool. Results: The basic SBML representation, which takes the form of listings of the various elements of a biochemical reaction system, differs from the representation of the selected tool, which is location oriented. In spite of this difference, the various components of biochemical pathways, including multiple compartments, global parameters, reactants, products, modifiers, reactions, kinetic formulas and reaction parameters, could be converted from the SBML representation to that of the selected tool. The MathML representation of the kinetic formula in an SBML model can be converted to the string format of the selected tool. Some features of SBML are not present in the selected tool. Similarly, the ability of the selected tool to declare parameters for locations, which are global to those locations and their children, is not present in SBML. Conclusions: Differences in representations of pathway models may include differences in terminology, basic architecture and software capabilities, as well as the adoption of different standards for similar things. But the overall similarity of the domain of pathway models enables us to interconvert these representations. The selected tool should develop support for unit definitions, events and rules. Development of a facility for parameter declaration at the compartment level in SBML, and of a facility for function declaration in the selected tool, is recommended.
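A sketch of the reading side of such a conversion using the python-libsbml bindings: it walks the SBML element listings (species, reactions) and converts each MathML kinetic formula to an infix string, as StoP needed to do. The model file name is hypothetical; this is not StoP itself.

```python
import libsbml  # pip install python-libsbml

doc = libsbml.readSBML("model.xml")
model = doc.getModel()
if model is None:
    raise SystemExit("could not parse the SBML model")

# SBML stores species as flat listings, each keyed to a compartment (location).
for i in range(model.getNumSpecies()):
    sp = model.getSpecies(i)
    print(f"species {sp.getId()} in compartment {sp.getCompartment()}")

for i in range(model.getNumReactions()):
    rx = model.getReaction(i)
    reactants = [rx.getReactant(j).getSpecies() for j in range(rx.getNumReactants())]
    products = [rx.getProduct(j).getSpecies() for j in range(rx.getNumProducts())]
    law = rx.getKineticLaw()
    # MathML kinetic formula -> plain infix string
    formula = libsbml.formulaToString(law.getMath()) if law else "n/a"
    print(f"{rx.getId()}: {' + '.join(reactants)} -> {' + '.join(products)} [{formula}]")
```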
10

GPCR-Directed Libraries for High Throughput Screening

Poudel, Sagar January 2006
Guanine nucleotide binding protein (G-protein) coupled receptors (GPCRs) form the largest receptor family and are enormously important for the pharmaceutical industry, as they are the target of 50-60% of all existing medicines. The discovery of many new GPCRs by the Human Genome Project opens up new opportunities for developing novel therapeutics. High throughput screening (HTS) of chemical libraries is a well established method for finding new lead compounds in drug discovery. Despite some success, this approach has suffered from the near absence of focused and specifically targeted libraries. To improve hit rates and to fully exploit the potential of current corporate screening collections, in this thesis work the critical drug-binding positions within the GPCRs were identified and analysed, based on their overall sequence, their transmembrane regions and their drug-binding fingerprints. A classification based on drug-binding fingerprints was then carried out as the basis for pharmacophore modelling and virtual screening, which facilitates the development of more specific and focused targeted libraries for HTS.
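A toy sketch of comparing receptors by drug-binding fingerprints, here reduced to the residues found at a fixed set of binding-site positions; the receptor names, fingerprints and identity measure are invented for illustration:

```python
from itertools import combinations

# Hypothetical fingerprints: the residue at each of six aligned
# binding-site positions in the transmembrane helices.
fingerprints = {
    "receptor_A": "DWYFSN",
    "receptor_B": "DWYFTN",
    "receptor_C": "EGHLSK",
}

def fingerprint_identity(a, b):
    """Fraction of binding-site positions with the same residue."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Receptors with similar binding sites are candidates for one focused library.
for (n1, f1), (n2, f2) in combinations(fingerprints.items(), 2):
    print(f"{n1} vs {n2}: {fingerprint_identity(f1, f2):.2f}")
```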
