  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Siamese Neural Networks for Regression: Similarity-Based Pairing and Uncertainty Quantification

Zhang, Yumeng January 2022
Here we present a similarity-based pairing method for generating compound pairs to train a Siamese Neural Network. In comparison with the conventional exhaustive pairing of N²/2 pairs (N being the size of the training set), this method results in N-1 pairs, significantly reducing the training time. It consistently exhibits better prediction performance on the three physicochemical property datasets, using a multilayer perceptron with the ECFP4 fingerprint. We further incorporate into the Siamese Neural Network the pre-trained Chemformer, which extracts task-specific chemical features from the input SMILES strings. With n-shot learning, we propose a means to measure the prediction uncertainty. Our results demonstrate that higher accuracy is indeed associated with lower prediction uncertainty. In addition, we discuss implications of the similarity principle in machine learning.
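The difference in pair counts described in this abstract can be illustrated with a small sketch. The abstract does not say how the N-1 pairs are chosen, so the rule below (pair each compound with its most similar predecessor, using Tanimoto similarity on toy fingerprints stored as sets of "on" bits) is an assumption made for illustration and not necessarily the author's method. The function names are hypothetical.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def exhaustive_pairs(n: int) -> list:
    """All unordered pairs: N(N-1)/2 of them, i.e. roughly N^2/2 for large N."""
    return list(combinations(range(n), 2))


def similarity_pairs(fps: list) -> list:
    """ASSUMED pairing rule: link compound i to its most similar earlier compound.

    Every compound except the first contributes exactly one pair, giving N-1 pairs.
    """
    pairs = []
    for i in range(1, len(fps)):
        j = max(range(i), key=lambda k: tanimoto(fps[i], fps[k]))
        pairs.append((j, i))
    return pairs


# Toy fingerprints (hypothetical data, not from the thesis).
fps = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({7, 8}), frozenset({7, 8, 9})]
print(len(exhaustive_pairs(len(fps))), "exhaustive pairs")
print(len(similarity_pairs(fps)), "similarity-based pairs")
```

For a training set of 10,000 compounds, exhaustive pairing would produce about 50 million pairs, whereas a scheme of this kind produces 9,999.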
32

Mathematical modelling simulation data and artificial intelligence for the study of tumour-macrophage interaction

Chaliha, Jaysmita Khanindra January 2023
The study explores the integration of mathematical modelling and machine learning to understand tumour-macrophage interactions in the tumour microenvironment. It details mathematical models based on biochemistry and physics for predicting tumour dynamics, highlighting the role of macrophages. Machine learning techniques, both unsupervised and supervised, such as K-means clustering, logistic regression, and support vector machines, are applied to analyse simulation data. The use of K-means clustering reveals distinct tumour behaviour patterns by grouping tumour cells according to their microenvironmental interactions. This segmentation is crucial for understanding tumour heterogeneity and its implications for treatment. Additionally, the application of logistic regression provides insights into the probability of macrophage polarization states in the tumour microenvironment. This statistical model highlights the significant factors influencing macrophage behaviour and their consequent impact on tumour progression. These analytical approaches enhance the understanding of the complex dynamics within the tumour microenvironment, contributing to more effective tumour study strategies. The study presents a comprehensive analysis of tumour growth, macrophage polarization, and their impact on cancer treatment and prognosis. Ethical considerations and future directions focus on enhancing model accuracy and integrating experimental data for improved cancer diagnosis and treatment strategies. The thesis concludes with the potential of this hybrid approach in advancing cancer biology and therapeutic approaches. / There is other digital material (e.g. film, image or audio files) or models/artifacts belonging to the thesis that need to be archived.
33

Little big data - extending plastid genome databases using marine planktonic metagenomes

Huber, Thomas M. January 2022
No description available.
34

BacIL - En Bioinformatisk Pipeline för Analys av Bakterieisolat / BacIL - A Bioinformatic Pipeline for Analysis of Bacterial Isolates

Östlund, Emma January 2019
Listeria monocytogenes and Campylobacter spp. are bacteria that can sometimes cause severe illness in humans. Both can be found as contaminants in food that has been produced, stored or prepared improperly, which is why it is important to ensure that food is handled correctly. The National Food Agency (Livsmedelsverket) is the Swedish authority responsible for food safety. One of its important tasks is, in collaboration with other authorities, to track and prevent food-related disease outbreaks. For this purpose, bacterial samples are regularly collected at border control, at food production facilities and in retail, as well as from suspected food items and drinking water during outbreaks, and epidemiological analyses are employed to determine the type of bacteria present and whether they can be linked to a common source. One part of these epidemiological analyses involves bioinformatic analyses of the bacterial DNA. This includes determination of sequence type and serotype, as well as calculations of similarities between samples. Such analyses require data processing in several different steps, which are usually performed by a bioinformatician using different computer programs. Currently the National Food Agency outsources most of these analyses to other authorities and companies, and the purpose of this project was to develop a pipeline that would allow these analyses to be performed in-house. The result was a pipeline named BacIL - Bacterial Identification and Linkage, which has been developed to automatically perform sequence typing, serotyping and SNP analysis of Listeria monocytogenes, as well as sequence typing and SNP analysis of Campylobacter jejuni, C. coli and C. lari. The result of the SNP analysis is used to create clusters, which can be used to identify related samples. The pipeline decreases the number of programs that have to be manually started from more than ten to two.
35

Framtidens biomarkörer : En prioritering av proteinerna i det humana plasmaproteomet / Biomarkers of the Future: A Prioritization of the Proteins in the Human Plasma Proteome

Antonsson, Elin, Eulau, William, Fitkin, Louise, Johansson, Jennifer, Levin, Fredrik, Lundqvist, Sara, Palm, Elin January 2019
In this report, we rank possible protein biomarkers based on different criteria for use in Olink Proteomics’ protein panels. We started with a list compiled through the Human Plasma Proteome Project (HPPP) and used it in different ways to obtain the final results. To complete this task, we compared the list with Olink’s and its competitors’ protein catalogs, and identified diseases beyond Olink’s coverage and the proteins linked with these. We also created a scoring system to facilitate the detection of good biomarkers. From this, we have concluded that Olink should focus on proteins that the competitors have in their catalogs and proteins that can be found in many pathways and are linked with many diseases. From each of the methods used, we have been able to identify a number of proteins that we recommend that Olink investigate further.
36

Method for recognizing local descriptors of protein structures using Hidden Markov Models

Björkholm, Patrik January 2008
Being able to predict the sequence-structure relationship in proteins will extend the scope of many bioinformatics tools relying on structure information. Here we use Hidden Markov models (HMMs) to recognize and pinpoint the location in target sequences of local structural motifs (local descriptors of protein structure, LDPS). These substructures are composed of three or more segments of amino acid backbone structures that are in proximity with each other in space but not necessarily along the amino acid sequence. We were able to align descriptors to their proper locations in 41.1% of the cases when using models built solely from amino acid information. Using models that also incorporated secondary structure information, we were able to assign 57.8% of the local descriptors to their proper location. A further enhancement in performance was obtained when threading a profile through the Hidden Markov models together with the secondary structure; with this material we were able to assign 58.5% of the descriptors to their proper locations. Hidden Markov models were shown to be able to locate LDPS in target sequences, and the performance accuracy increased when secondary structure and the profile for the target sequence were used in the models.
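As a rough illustration of how an HMM can pinpoint a motif within a sequence, the sketch below runs Viterbi decoding on a toy two-state model. Everything in it is assumed for illustration: the states (B for background, M for motif), the two-letter alphabet (H for hydrophobic, P for polar) and all probabilities are hypothetical, and the models in the thesis, which cover multi-segment descriptors, are certainly more elaborate.

```python
import math


def viterbi(seq, states, start, trans, emit):
    """Return the most probable state path for a non-empty seq (log-space Viterbi)."""
    # Best log-probability of any path ending in each state after the first symbol.
    score = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for sym in seq[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + math.log(trans[p][s]))
            score[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][sym])
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


# Hypothetical toy model: the motif state prefers hydrophobic residues.
states = ["B", "M"]
start = {"B": 0.9, "M": 0.1}
trans = {"B": {"B": 0.9, "M": 0.1}, "M": {"B": 0.2, "M": 0.8}}
emit = {"B": {"H": 0.2, "P": 0.8}, "M": {"H": 0.9, "P": 0.1}}

path = viterbi("PPPHHHHPPP", states, start, trans, emit)
print(path)  # positions labelled M are where the toy motif is placed
```

The decoded path marks which positions of the target sequence are assigned to the motif state, which is the sense in which a descriptor is "aligned to its location".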
37

The role of RFX-target genes in neurodevelopmental and psychiatric disorders

Ganesan, Abhishekapriya January 2021
Neurodevelopmental disorders such as autism spectrum disorder (ASD) and psychiatric disorders such as schizophrenia (SCZ) represent a large spectrum of disorders that manifest through cognitive and behavioural problems. ASD and SCZ are both highly heritable, and some phenotypic similarities between ASD and SCZ have sparked an interest in understanding their genetic commonalities. The genetics of both disorders exhibit significant heterogeneity. Developments in genomics and systems biology continually increase the understanding of these disorders. Recently, pathogenic genetic variants in the regulatory factor X (RFX) family of transcription factors have been identified in a number of ASD cases. In this thesis, common genetic variants and expression patterns of genes identified as having a conserved promoter X-Box motif region, a binding site of RFX factors, are studied. Significant common variants identified through expression quantitative trait loci (eQTLs) and genome-wide association studies (GWAS) are mapped to the regulatory regions of these genes and analysed for putative enrichment. In addition, single-cell RNA sequencing data is used to examine enrichment of cell types with high X-Box gene expression in the developing human cortex. Through the study, genes that have eQTLs or SNPs in the genomic regulatory regions of the X-Box genes have been identified. While there were no eQTLs or GWAS SNPs in the X-Box motifs, some common variants were found in the X-Box promoter regions. Hypergeometric distribution testing and the resulting p-values indicate that all of these distributions are statistically under-enriched. Further, major cell types in the cortical region with increased expression of the X-Box genes, and the most expressed genes among these enriched cell types, have been identified. Among the 11 cell types, seven were found to be enriched for X-Box genes, and many of the most expressed genes in these cell types were similar.
A further study into the cell types and genes identified, along with additional systems biological data analysis, could reveal a larger list of X-Box genes involved in ASD and SCZ and the specific roles of these genes.
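The hypergeometric test mentioned in this abstract can be sketched as follows. Under-enrichment corresponds to the lower tail, that is, the probability of seeing k or fewer hits by chance. The function names and the numbers in the example are hypothetical and are not taken from the thesis.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) when drawing n items without replacement from N items, K of which are hits."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def under_enrichment_p(k: int, N: int, K: int, n: int) -> float:
    """Lower-tail p-value P(X <= k): small values suggest fewer hits than expected."""
    lowest = max(0, n - (N - K))  # smallest number of hits that is possible at all
    return sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))


def over_enrichment_p(k: int, N: int, K: int, n: int) -> float:
    """Upper-tail p-value P(X >= k): small values suggest more hits than expected."""
    highest = min(K, n)  # largest number of hits that is possible at all
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, highest + 1))


# Hypothetical example: 20 regions in total, 5 carry a variant, and a set of
# 5 regions of interest contains none of them.
p_under = under_enrichment_p(0, N=20, K=5, n=5)
print(f"P(X <= 0) = {p_under:.4f}")
```

In an analysis like the one described, N would be the size of the background set of regions, K the number carrying a significant variant, n the number of X-Box regions examined, and k the number of those carrying a variant.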
38

Clustering biological data using a hybrid approach : Composition of clusterings from different features

Keller, Jens January 2008
Clustering of data is a well-researched topic in computer science. Many approaches have been designed for different tasks. In biology, many of these approaches are hierarchical and the result is usually represented in dendrograms, e.g. phylogenetic trees. However, many non-hierarchical clustering algorithms are also well established in biology. The approach in this thesis is based on such common algorithms. The algorithm implemented as part of this thesis uses a non-hierarchical graph clustering algorithm to compute a hierarchical clustering in a top-down fashion. It performs the graph clustering iteratively, with a previously computed cluster as the input set. The innovation is that it focuses on another feature of the data in each step and clusters the data according to this feature. Common hierarchical approaches in biology cluster, for example, a set of genes according to the similarity of their sequences. The clustering then reflects a partitioning of the genes according to their sequence similarity. The approach introduced in this thesis uses many features of the same objects. These features can vary; in biology they may be, for instance, similarities of the sequences, of gene expression or of motif occurrences in the promoter region. As part of this thesis, not only was the algorithm itself implemented and evaluated, but also a complete software package that provides a graphical user interface. The software was implemented as a framework providing the basic functionality, with the algorithm as a plug-in extending the framework. The software is meant to be extended in the future, integrating a set of algorithms and analysis tools related to the process of clustering and analysing data not necessarily related to biology. The thesis deals with topics in biology, data mining and software engineering and is divided into six chapters. The first chapter gives an introduction to the task and the biological background.
It gives an overview of common clustering approaches and explains the differences between them. Chapter two presents the idea behind the new clustering approach and points out differences and similarities between it and common clustering approaches. The third chapter discusses the aspects concerning the software, including the algorithm. It illustrates the architecture and analyses the clustering algorithm. After the implementation, the software was evaluated, which is described in the fourth chapter, pointing out observations made through the use of the new algorithm. Furthermore, this chapter discusses differences and similarities to related clustering algorithms and software. The thesis ends with the last two chapters, namely conclusions and suggestions for future work. Readers who are interested in repeating the experiments made as part of this thesis can contact the author via e-mail to obtain the relevant data for the evaluation, scripts or source code.
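The top-down idea described in this abstract (cluster once per level, each time on a different feature and within a previously computed cluster) can be sketched as below. The graph clustering step here is a simple stand-in, connected components of a similarity-threshold graph, chosen only to keep the example short; the abstract does not name the graph clustering algorithm actually used. The function names, the threshold and the toy data are all hypothetical.

```python
def connected_components(items, similar):
    """Stand-in graph clustering: components of the graph linking similar items."""
    clusters = []
    unseen = list(items)
    while unseen:
        stack = [unseen.pop(0)]
        component = []
        while stack:
            x = stack.pop()
            component.append(x)
            neighbours = [y for y in unseen if similar(x, y)]
            for y in neighbours:
                unseen.remove(y)
            stack.extend(neighbours)
        clusters.append(sorted(component))
    return clusters


def top_down_clustering(items, features, threshold=1.0):
    """Split items on features[0], then split each cluster on features[1], and so on.

    Each feature is a dict mapping an item to a number; two items are linked when
    their values differ by at most `threshold`. Returns a nested list (a tree).
    """
    if not features or len(items) <= 1:
        return sorted(items)
    feature, remaining = features[0], features[1:]

    def similar(a, b):
        return abs(feature[a] - feature[b]) <= threshold

    return [top_down_clustering(cluster, remaining, threshold)
            for cluster in connected_components(items, similar)]


# Hypothetical toy data: four genes, two features per gene.
sequence_score = {"a": 0.0, "b": 0.5, "c": 5.0, "d": 5.2}
expression = {"a": 1.0, "b": 9.0, "c": 3.0, "d": 3.1}

tree = top_down_clustering(["a", "b", "c", "d"], [sequence_score, expression])
print(tree)
```

In this toy run, the first level separates the genes by sequence score and the second level splits one of those clusters further by expression, which mirrors the per-level change of feature described above.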
39

Timing of chromosomal alterations during tumour development

Viklund, Björn January 2017
During cancer development, tumour cells accumulate many somatic point mutations and copy number alterations. It is not unusual for affected genes to have a copy number that differs from the usual two. Due to the loss of DNA repair mechanisms, the cells can mutate independently of each other, which gives rise to different subclones within the tumour. A tumour cell that gains an advantage in cell division speed over its competing neighbours will, together with its future daughter cells, eventually make up a large portion of the tumour. All the mutations that the subclone’s most recent common ancestor acquired before the expansion will be shared across the subclone. In this project, we have developed a method that uses the mutation frequencies from publicly available whole genome sequencing data to quantify the number of competing subclones in a sample and to determine the time to its copy number duplications. This method could be further developed into an extension of regular copy number analysis. A heterogeneous tumour can grow faster and be more resistant to treatment. Therefore, it is important to learn more about cancer development and gain a greater understanding of the order in which copy number alterations occur.
40

A comparative validation of the human variant simulator SIMdrom

Ånäs, Sofia January 2017
The past decade’s progress in next generation sequencing has drastically decreased the price of whole genome and exome sequencing, making it available as a clinical tool for diagnosing patients with genetic disease. However, finding a disease-causing mutation among millions of non-pathogenic variants in a patient’s genome is not an easy task. Therefore, algorithms for finding variants relevant for clinicians to investigate more closely are needed and are constantly being developed. To test these algorithms, a software tool called SIMdrom has been developed to simulate test data. In this project, the simulated data is validated through comparison with real genetic data to ensure that it is suitable for use as test data. By ensuring the data’s reliability and finding possible improvements, the development of algorithms for finding disease-causing mutations can be facilitated. This in turn could lead to better diagnostic possibilities for clinicians. When simulated data is visualized together with real genomes using principal component analysis, it clusters near its real counterpart. This shows that the simulated data resembles the real genomes. Simulated exomes also performed well when used as part of one of three training sets for the classifier in the Prioritization of Exome Data by Image Analysis study, where they performed second best after an in-house data set consisting of real exomes. To conclude, the SIMdrom simulated data performs well in both parts of this project. Additional tests of its validity should include testing against larger real data sets; a possible improvement would be to implement a simulation option for spiking in noise.
