141

Nanometa Live : A real-time metagenomic analysis pipeline and interface for species classification and pathogen characterization

Sandås, Kristoffer January 2023
Metagenomics studies the totality of genomes of all species in a microbial community. It is a young, growing field with medical, industrial, and ecological applications. Abundant metagenomic data is being produced today, but there is a lack of interpretation and visualization tools. The aim of this project was to create Nanometa Live: a user-friendly, real-time data processing pipeline and graphical user interface that enables visualization of the general species content in a sample, as well as detection of a set of predetermined pathogens. The pipeline was created using Snakemake, with classification by Kraken 2 and sequence validation by BLAST; its input is fastq batch files from an Oxford Nanopore sequencer. The interface was coded in Python using the Dash framework and uses the data produced by the pipeline to visualize results. A Sankey plot and a list of the most abundant taxa display the general species content, while a separate table and a gauge, colored to show the pathogenicity of the sample, display the user-determined pathogens that the program looks for. Further exploration of the species composition is enabled by a sunburst plot and an icicle chart. Nanometa Live is a fully functioning prototype and can be considered on par with existing tools when it comes to analysis speed, computer requirements, and general user-friendliness. Its strengths are ease of interpretation and flexibility in visualizations; its weaknesses are missing functionality, such as antibiotic resistance detection, and imperfections in code, structure, and packaging.
142

Allelic Diversity and Signs of Natural Selection Associated with Environmental Adaptation in European Aspen

Huser, Linn Zetterberg January 2023
No description available.
143

Image Processing and other bioinformatic tools for Neurobiology / Bildbearbeitung und andere bioinformatische Werkzeuge für die Neurobiologie

Prada Salcedo, Juan Pablo January 2018
Neurobiology is widely supported by bioinformatics. Because of the large amount of data generated on the biological side, a computational approach is required. This thesis presents four different cases of bioinformatic tools applied in the service of neurobiology. The first two tools presented belong to the field of image processing. In the first case, we make use of an algorithm based on the wavelet transformation to assess calcium activity events in cultured neurons. We designed an open source tool to assist neurobiology researchers in the analysis of calcium imaging videos. Such analysis is usually done manually, which is time consuming and highly subjective. Our tool speeds up the work and offers the possibility of an unbiased detection of the calcium events. Even more important is that our algorithm detects not only the neuron spiking activity but also local spontaneous activity, which is normally discarded because it is considered irrelevant. We showed that this activity is a determinant of the calcium dynamics in neurons and that it is involved in important functions like signal modulation, memory, and learning. The second project is a segmentation task. In our case we are interested in segmenting the neuron nuclei in electron microscopy images of C. elegans. Marking these structures is necessary in order to reconstruct the connectome of the organism. C. elegans is a great study case due to the simplicity of its nervous system (only 502 neurons). This worm, despite its simplicity, has taught us a lot about neuronal mechanisms. There is still a lot of information we can extract from C. elegans; therein lies the importance of reconstructing its connectome. There is a current version of the C. elegans connectome, but it was done by hand and on a single subject, which leaves considerable room for error. By automating the segmentation of the electron microscopy images we guarantee an unbiased approach and we will be able to verify the connectome on several subjects.
For the third project we moved from image processing applications to biological modeling. Because of the high complexity of even small biological systems, it is necessary to analyze them with the help of computational tools. The term in silico was coined to refer to such computational models of biological systems. We designed an in silico model of the TNF (tumor necrosis factor) ligand and its two principal receptors. This biological system is of high relevance because it is involved in the inflammation process. Inflammation is of utmost importance as a protection mechanism, but it can also lead to complicated diseases (e.g. cancer). Chronic inflammation processes can be particularly dangerous in the brain. In order to better understand the dynamics that govern the TNF system we created a model using the BioNetGen language. This is a rule-based language that allows one to simulate systems where multiple agents are governed by a single rule. Using our model we characterized the TNF system and hypothesized about the relation of the ligand with each of the two receptors. Our hypotheses can later be used to define drug targets in the system or possible treatments for chronic inflammation or lack of the inflammatory response. The final project deals with the protein folding problem. In our organism proteins are folded all the time, because only in their folded conformation are proteins capable of doing their job (with very few exceptions). This folding process presents a great challenge for science because it has been shown to be an NP problem. NP stands for nondeterministic polynomial time. This basically means that this kind of problem cannot be efficiently solved. Nevertheless, somehow the body is capable of folding a protein in just milliseconds. This phenomenon puzzles not only biologists but also mathematicians.
In mathematics NP problems have been studied for a long time and it is known that given the solution to one NP problem we could solve many of them (i.e. NP-complete problems). If we manage to understand how nature solves the protein folding problem then we might be able to apply this solution to many other problems. Our research intends to contribute to this discussion. Unfortunately, not to explain how nature solves the protein folding problem, but to explain that it does not solve the problem at all. This seems contradictory since I just mentioned that the body folds proteins all the time, but our hypothesis is that the organisms have learned to solve a simplified version of the NP problem. Nature does not solve the protein folding problem in its full complexity. It simply solves a small instance of the problem. An instance which is as simple as a convex optimization problem. We formulate the protein folding problem as an optimization problem to illustrate our claim and present some toy examples to illustrate the formulation. If our hypothesis is true, it means that protein folding is a simple problem. So we just need to understand and model the conditions of the vicinity inside the cell at the moment the folding process occurs. Once we understand this starting conformation and its influence in the folding process we will be able to design treatments for amyloid diseases such as Alzheimer's and Parkinson's. In summary this thesis project contributes to the neurobiology research field from four different fronts. Two are practical contributions with immediate benefits, such as the calcium imaging video analysis tool and the TNF in silico model. The neuron nuclei segmentation is a contribution for the near future. A step towards the full annotation of the c.elegans connectome and later for the reconstruction of the connectome of other species. 
And finally, the protein folding project is a first impulse to change the way we conceive the protein folding process in nature. We try to point future research in a novel direction, where the amino code is not the most relevant characteristic of the process but the conditions within the cell. / Neurobiologie wird durch Bioinformatik unterstützt, aufgrund der großen Datenmengen, die von biologischer Seite her anfallen, bedarf es eines rechnerischen Ansatzes, um diese Daten sinnvoll zu interpretieren. Im Rahmen der vorliegenden Dissertation werden vier Werkzeuge aus dem Bereich der Bioinformatik für die Anwendung in der Neurobiologie vorgestellt. Die ersten beiden Werkzeuge gehören zum Bereich der digitalen Bildverarbeitung. Das erste Werkzeug nutzt einen Algorithmus basierend auf der Wavelet-Transformation, um Calciumaktivität in Neuronenkulturen zu bewerten. Hierzu wurde Open-Source-Software entwickelt, die Neurobiologen bei der Analyse von Videoaufnahmen unterstützt. Diese Analyse wird herkömmlicherweise manuell vorgenommen, sodass der Prozess zeitintensiv und sehr subjektiv ist. Die entwickelte Software beschleunigt den Arbeitsprozess und ermöglicht eine unverzerrte Detektion der Ereignisse in Bezug auf Calcium. Von noch größerer Bedeutsamkeit ist die Tatsache, dass der entwickelte Algorithmus nicht nur neuronale Spiking-Aktivität detektiert, sondern auch lokale Spontanaktivität, die herkömmlicherweise als irrelevant betrachtet und daher verworfen wird. Wir konnten zeigen, dass diese Spontanaktivität hohe Relevanz für die Dynamik von Calcium in den Neuronen besitzt und wahrscheinlich an wichtigen Funktionen beteiligt ist, wie der Signalmodulation, Lernen und Gedächtnis. Beim zweiten Projekt handelt es sich um eine Segmentierungsaufgabe. Wir sind daran interessiert, die neuronalen Zellkerne in elektromikroskopischen Aufnahmen des C.elegans zu segmentieren. Die Kennzeichnung dieser Struktur ist notwendig, um das Konnektom dieses Organismus zu rekonstruieren. 
Als Studienobjekt eignet sich C.elegans aufgrund der Simplizität seines Nervensystems (er besteht lediglich aus 502 Neuronen). Trotz der Simplizität des Nervensystems dieses Wurms konnten wichtige Erkenntnisse im Hinblick auf neuronale Mechanismen durch die Untersuchung dieses Modellorganismus gewonnen werden. Daher ist die Bestimmung des Konnektoms bedeutsam. Es existiert bereits eine Version des Konnektoms, doch diese wurde händig für lediglich ein Subjekt rekonstruiert und ist daher möglicherweise fehlerbehaftet. Die automatisierte Segmentierung der elektronenmikroskopischen Aufnahmen ermöglicht einen weniger verzerrten Ansatz, der zudem die Verifizierung an mehreren Subjekten gestattet. Das dritte Projekt dieser Dissertation ist ein Projekt zur Modellierung und Simulation eines biologischen Systems. Aufgrund der hohen Komplexität selbst kleinster biologischer Systeme ist die computergestützte Analyse notwendig. Der Begriff in silico wurde für die computergestützte Simulation biologischer Systeme geprägt. Wir haben ein in silico Modell des TNF (Tumornekrosefaktor) Ligand und seiner zwei Hauptrezeptoren entwickelt. Dieses biologische System ist von hoher Bedeutsamkeit, da es am Entzündungsprozess beteiligt ist, der höchste Wichtigkeit als Schutzmechanismus hat, aber es kann auch komplizierte Erkrankungen auslösen (beispielsweise Krebs), falls es zu einer chronischen Entzündungsreaktion kommt. Derartige Entzündungsprozesse können besonders gefährlich im Gehirn sein. Das System muss eine schwierige Balance zwischen protektiver Funktion und möglicher Krankheitsursache behalten. Um die Dynamiken besser zu verstehen, die das TNF System leiten, haben wir ein Modell mittels der BioNetGen Sprache erstellt. Diese regelbasierte Sprache ermöglicht es ein System zu simulieren, in dem multiple Agenten geleitet werden von einer Regel. 
Mithilfe unseres Modells charakterisieren wir das TNF System und stellen Hypothesen über die Beziehung des Liganden mit den beiden Rezeptoren auf. Diese Hypothesen können später genutzt werden, um mögliche Ziele im System für Arzneimittel, mögliche Behandlungen für chronische Entzündungen oder das Fehlen einer Entzündungsreaktion zu bestimmen. Im abschießenden Projekt wird das Proteinfaltungsproblem behandelt. In unserem Organismus werden ständig Proteine gefaltet, denn nur im gefalteten Zustand können sie ihrer Aufgabe nachkommen (mit sehr wenigen Ausnahmen). Dieser Faltungsprozess stellt eine große Herausforderung für die Wissenschaft dar, weil gezeigt wurde, dass der Faltungsprozess ein NP Problem ist. NP steht dabei für nichtdeterministisch polynomielles Zeitproblem. Dies bedeutet im Grunde, dass es nicht effizient gelöst werden kann. Nichtsdestotrotz ist der Körper in der Lage, ein Protein in Millisekunden zu falten. Dieses Phänomen stellt nicht nur Biologen sondern auch Mathematiker vor Rätsel. In der Mathematik wurde diese Probleme schon lange studiert und es ist bekannt, dass die Kenntnis der Lösung eines NP Problems die Lösung vieler bedeuten würde (insbesondere NP-kompletter Probleme). Daher ist die Idee, dass viele Probleme gelöst werden könnten, durch das Verständnis davon, wie die Natur das Problem löst. Unsere Forschung zielt darauf ab, zu dieser Diskussion beizutragen, allerdings nicht durch die Erklärung davon, wie die Natur das Problem löst, sondern durch die Erklärung, dass die Natur das Problem nicht löst. Dies scheint zunächst widersprüchlich, da der Körper ständig Proteine faltet. Unsere Hypothese besagt jedoch, dass der Organismus gelernt hat, eine vereinfachte Version des NP Problems zu lösen. Die Natur löst das Problem nicht in seiner vollen Komplexität, sondern nur eine kleine Instanz davon. Eine Instanz, die ein konvexes Optimierungsproblem darstellt. 
Wir formulieren das Proteinfaltungsproblem als konvexes Optimierungsproblem und zur Illustrierung unserer Behauptung nutzen wir theoretische Beispiele. Wenn die Hypothese zutrifft, bedeutet dies, dass das Proteinfaltungsproblem ein einfaches ist und wir müssen lediglich die Ausgangskonstellation der Umgebung in der Zelle verstehen und modellieren, in dem Moment in dem die Faltung passiert. Sobald wir die Ausgangskonstellation und den Einfluss auf den Faltungsprozess verstehen, können wir Behandlungen für Amyloid-Krankheiten, wie Alzheimer-Demenz und Morbus Parkinson entwickeln. Zusammenfassend trägt die vorliegende Dissertation zu neurobiologischer Forschung durch vier Ansätze bei. Zwei sind praktische Beiträge mit sofortigem Nutzen für die Forschung, dazu zählen das Videoanalyse Tool für Calcium Aufnahmen und das TNF in silico Modell. Die neuronale Zellkernsegmentierung ist ein Beitrag für die nahe Zukunft – ein Schritt zur Vervollständigung des Konnektoms des C.elegans und langfristig zur Rekonstruktion der Konnektome anderer Spezies. Und schließlich ist das Proteinfaltungsprojekt ein erster Impuls den Proteinfaltungsprozess anders zu denken. Wir versuchen zukünftige Forschung in eine andere Richtung zu lenken, wobei nicht der Aminosäurecode das relevanteste Charakteristikum des Prozesses ist, sondern vielmehr die Bedingungen innerhalb der Zelle.
144

A mathematical optimal control based approach to pharmacological modulation with regulatory networks and external stimuli / Ein auf mathematischer Optimalkontrolle basierender Ansatz für pharmakologische Modulation mit regulatorischen Netzwerken und externen Stimuli

Breitenbach, Tim January 2019
In this work, models for molecular networks consisting of ordinary differential equations are extended by terms that include the interaction of the corresponding molecular network with the environment in which it is embedded. These terms model the effects of external stimuli on the molecular network. The usability of this extension is demonstrated with a model of a circadian clock that is extended with certain terms and reproduces data from several experiments at the same time. Once the model including external stimuli is set up, a framework is developed to calculate external stimuli that have a predefined desired effect on the molecular network. For this purpose, the task of finding appropriate external stimuli is formulated as a mathematical optimal control problem, for which many mathematical solution methods are available. Several methods are discussed and worked out in order to calculate a solution for the corresponding optimal control problem. The application of the framework to find pharmacological intervention points or effective drug combinations is pointed out and discussed. Furthermore, the framework is related to existing network analysis tools, and their combination for network analysis in order to find dedicated external stimuli is discussed. The total framework is verified with biological examples by comparing the calculated results with data from the literature. For this purpose, platelet aggregation is investigated based on a corresponding gene regulatory network, and associated receptors are detected. Furthermore, a transition from one type of T-helper cell to another is analyzed in a tumor setting, where missing agents are calculated to induce the corresponding switch in vitro. Next, a gene regulatory network of a myocardiocyte is investigated, where it is shown how the presented framework can be used to quantitatively compare different treatment strategies with respect to their beneficial effects and side effects.
Moreover a constitutively activated signaling pathway, which thus causes maleficent effects, is modeled and intervention points with corresponding treatment strategies are determined that steer the gene regulatory network from a pathological expression pattern to physiological one again. / In dieser Arbeit werden Modelle für molekulare Netzwerke bestehend aus gewöhnlichen Differentialgleichungen durch Terme erweitert, die die Wechselwirkung zwischen dem entsprechenden molekularen Netzwerk und der Umgebung berücksichtigen, in die das molekulare Netzwerk eingebettet ist. Diese Terme modellieren die Effekte von externen Stimuli auf das molekulare Netzwerk. Die Nutzbarkeit dieser Erweiterung wird mit einem Modell der circadianen Uhr demonstriert, das mit gewissen Termen erweitert wird und Daten von mehreren verschiedenen Experimenten zugleich reproduziert. Sobald das Modell einschließlich der externen Stimuli aufgestellt ist, wird eine Grundstruktur entwickelt um externe Stimuli zu berechnen, die einen gewünschten vordefinierte Effekt auf das molekulare Netzwerk haben. Zu diesem Zweck wird die Aufgabe, geeignete externe Stimuli zu finden, als ein mathematisches optimales Steuerungsproblem formuliert, für welches, um es zu lösen, viele mathematische Methoden zur Verfügung stehen. Verschiedene Methoden werden diskutiert und ausgearbeitet um eine Lösung für das entsprechende optimale Steuerungsproblem zu berechnen. Auf die Anwendung dieser Grundstruktur pharmakologische Interventionspunkte oder effektive Wirkstoffkombinationen zu finden, wird hingewiesen und diese diskutiert. Weiterhin wird diese Grundstruktur in Bezug zu existierenden Netzwerkanalysewerkzeugen gesetzt und ihre Kombination für die Netzwerkanalyse diskutiert um zweckbestimmte externe Stimuli zu finden. Die gesamte Grundstruktur wird mit biologischen Beispielen verifiziert, indem man die berechneten Ergebnisse mit Daten aus der Literatur vergleicht. 
Zu diesem Zweck wird die Blutplättchenaggregation untersucht basierend auf einem entsprechenden genregulatorischen Netzwerk und damit assoziierte Rezeptoren werden detektiert. Weiterhin wird ein Wechsel von einem T-Helfer Zelltyp in einen anderen in einer Tumorumgebung analysiert, wobei fehlende Agenzien berechnet werden um den entsprechenden Wechsel in vitro zu induzieren. Als nächstes wird ein genregulatorisches Netzwerk eines Myokardiozyten untersucht, wobei gezeigt wird wie die präsentierte Grundstruktur genutzt werden kann um verschiedene Behandlungsstrategien in Bezug auf ihre nutzbringenden Wirkungen und Nebenwirkungen quantitativ zu vergleichen. Darüber hinaus wird ein konstitutiv aktivierter Signalweg, der deshalb unerwünschte Effekte verursacht, modelliert und Interventionspunkte mit entsprechenden Behandlungsstrategien werden bestimmt, die das genregulatorische Netzwerk wieder von einem pathologischen Expressionsmuster zu einem physiologischen steuern.
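The core idea of this abstract, an ordinary differential equation extended by a term for an external stimulus, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: a single hypothetical species with linear decay, a constant stimulus, and explicit Euler integration. It is not the thesis's circadian clock model or its optimal control framework.

```python
def simulate(x0, decay, stimulus, dt=0.01, steps=1000):
    """Integrate dx/dt = -decay * x + stimulus(t) with the explicit Euler method.

    `stimulus` is a hypothetical external input u(t); the network itself is
    reduced to one species with linear decay for illustration only.
    """
    x, t = x0, 0.0
    trajectory = [x]
    for _ in range(steps):
        x += dt * (-decay * x + stimulus(t))
        t += dt
        trajectory.append(x)
    return trajectory


# Without a stimulus the species decays towards zero; with a constant
# stimulus u it settles near the steady state u / decay.
unforced = simulate(1.0, decay=1.0, stimulus=lambda t: 0.0)
forced = simulate(1.0, decay=1.0, stimulus=lambda t: 0.5)
```

In this toy setting, choosing the stimulus so that the trajectory reaches a desired end state is the kind of question that an optimal control formulation makes precise.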
145

Prediction of hub miRNAs and their associated pathways in Alzheimer's disease with miRNA-mRNA-TF network using bioinformatic tools

Kumari, Monika January 2021
The prevalence of Alzheimer’s in Europe has increased strikingly over the past decade. Addressing this neurodegenerative disorder can be arduous, as the underlying cause is rarely reversible. Past research has mainly focused on identifying degenerated genes and miRNAs, as their interrelation is often useful for describing a medical condition and provides indispensable information that can be further exploited to devise a diagnostic plan or a therapy. These degenerated genes indeed act as useful diagnostic and prognostic biomarkers. Though their expression and manifestation vary from patient to patient, it is often helpful to understand their dysregulation. The current research employed bioinformatics tools and software to determine the deregulated genes and transcription factors. This information was used to create a complex network that indicated the impact of a specific gene or transcription factor(s) on the corresponding transcription factor(s) or genes. From previous literature, their possible associations with neurons, neurodegeneration, memory, cognition, and associated biological processes were gathered to establish their association with Alzheimer’s.
146

Expanding the application of a novel proteomics tool for drug target and mechanism identification

Yuan Andersson, Linnéa January 2022
In drug discovery and development, characterization of the drug targets and mechanisms of action is an essential step. ProTargetMiner is a publicly available proteome signature library of anticancer molecules and its automated bioinformatics platform can be used for drug target and mechanism deconvolution. The possibility of expanding ProTargetMiner to treatments that are non-anticancer is investigated in this project. A new proteome signature library was built for 15 versatile drugs with diverse indications, e.g. against allergies, hypertension, and depression. To comprehensively cover the proteome response to these treatments, deep expression profiling was performed in human fibroblast, breast cancer MCF7, and neuron-like SHSY5Y cells using multiplexed LC-MS/MS analysis at an optimized duration of 48h. Here, each collected proteome signature is contrasted against other signatures using OPLS-DA models to deconvolute drug targets, similar to the approach devised in the original ProTargetMiner platform. Furthermore, the drugs are further profiled by a validation technique called Proteome Integral Solubility Alteration (PISA) assay to identify the protein targets that are directly engaged by the molecules. Several known targets and mechanistic proteins are identified in the deep expression profiling experiment and are further verified by the PISA assay. Further testing and literature research could uncover novel targets for the treatments. This platform is expandable to novel drugs and provides a resource for target deconvolution of compounds in preclinical and clinical testing.
147

Comprehensive Analysis of lncRNA and circRNA Mediated ceRNA network in Psoriasis

Imran, Saima January 2022
Evidence is accumulating that noncoding RNAs and circRNA are involved in psoriasis; however, the competing endogenous RNA (ceRNA) mediated regulatory mechanisms in psoriasis are rarely reported. This study aimed to comprehensively investigate the differences in the expression levels of circular RNA (circRNA), long non-coding RNA (lncRNA), microRNA (miRNA/miR), and mRNA in psoriasis. In addition, key lncRNA/circRNA-miRNA-mRNA-ceRNA interactions were screened using the GSE145305 microarray dataset from the Gene Expression Omnibus database. After data preprocessing, differentially expressed circRNAs (DECs), lncRNAs (DELs), miRNAs (DEMs), and genes (DEGs) were identified between patients with psoriasis and normal controls using the linear models for microarray data method. A protein-protein interaction (PPI) network was constructed for the DEGs based on protein databases, followed by a module analysis. The ceRNA network was constructed based on the interactions between miRNAs and mRNAs and between lncRNAs/circRNAs and miRNAs. The present study identified 10 significantly down-regulated and 86 significantly up-regulated mRNAs, and 48 significantly down-regulated and 75 significantly up-regulated miRNAs, between patients with psoriasis and controls. miRNA, mRNA, lncRNA, and circRNA target predictions were made. A combined ceRNA network was then constructed using mRNA-miRNA-lncRNA and mRNA-miRNA-circRNA interactions. Bioinformatics tools and software were used to determine the hub module and PPI network. Taken together, these identified ceRNA interactions may be crucial targets for the treatment of psoriasis.
148

Identifying prognostic biomarkers for severe sepsis disease and 28 days mortality

Massoud, Gaprielle January 2022
Sepsis is a complex, deadly, and difficult-to-diagnose disease characterized by life-threatening failure of multiple organs caused by an improper host response to an infecting organism such as bacteria, fungi, or viruses. Patient characteristics such as age and immunologic state, infection factors, and environmental factors such as nutritional status affect sepsis prognosis, which makes prognosis difficult and sepsis a common cause of mortality. This project aimed to identify sepsis prognostic biomarkers by identifying significantly differentially expressed biomarkers across patient groups, then developing and evaluating a classification model that can help predict patients' prognosis. The project used input data consisting of 368 protein measurements represented as Normalized Protein Expression. These data were preprocessed, split, and analyzed using the Wilcoxon rank-sum test to identify the significantly expressed biomarkers in each patient subgroup, one in the ICU admission and six in the non-survived subgroups. These significantly expressed biomarkers were visualized in volcano plots and then integrated into different supervised and unsupervised multivariate statistical models. The best prognosis models for ICU admission were the KNN models based solely on either procalcitonin or C-reactive protein (CRP), with AUCs of 1.00 (95% CI: 1.00-1.00). The best prognosis model for 28-day mortality was the KNN model of tenascin-C, with an AUC of 1.00 (95% CI: 1.00-1.00). However, further studies with a larger sample size are suggested in order to lessen the likelihood of bias. Some of the identified significantly expressed biomarkers, such as procalcitonin and CRP, could generate KNN models with high AUC that can be used to predict ICU admission or 28-day mortality due to sepsis.
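The abstract relies on both the Wilcoxon rank-sum test and the AUC. The two are closely related: the AUC equals the rank-sum (Mann-Whitney) U statistic divided by the number of case-control pairs. The sketch below shows that relationship only; the function name and the marker values are hypothetical and are not taken from the thesis.

```python
def rank_sum_auc(positives, negatives):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    This equals U / (n_pos * n_neg), where U is the Mann-Whitney U statistic
    of the positive group; tied pairs count as half a win.
    """
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up marker levels: every case exceeds every control, so the AUC is 1.0.
perfect = rank_sum_auc([3.0, 4.0, 5.0], [1.0, 2.0])

# Overlapping groups: only one of the four pairs is ranked correctly.
overlapping = rank_sum_auc([1.0, 3.0], [2.0, 4.0])
```

An AUC of exactly 1.00, as reported in the abstract, therefore means that the two groups did not overlap at all on the model's score in the evaluated sample, which is one reason a larger sample is worth the follow-up the author suggests.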
149

Genomic comparison of shiga toxin-producing E. coli O157:H7 from ruminants and humans

Good, Linnéa January 2022
Shiga toxin-producing E. coli (STEC) are zoonotic pathogens that frequently colonise ruminants without them showing any symptoms. In humans, STEC cause diarrhoeal disease and occasionally lead to the life-threatening haemolytic-uraemic syndrome (HUS). In this study, the aim is to identify any genomic differences between Swedish STEC O157:H7 isolates that have caused HUS and isolates that did not, as well as between isolates taken from animals and isolates taken from humans. I constructed a pan-genome analysis pipeline and performed statistical analyses to find genes that differed between these groups. I also constructed a phylogenetic analysis pipeline to visualise any clustering of isolates based on different categories. The results from the phylogenetic analysis showed that the isolates tended not to form clear clusters based on their category. When comparing isolates from animals to isolates from humans, an elastic net regression analysis yielded a list of 23 genes that differed between them, while a statistical analysis using Scoary found 1854 genes. The genes found by the regression analysis consist largely of genes associated with metabolism, with other notable genes being transposases as well as two genes from the prp operon. Gene ontology analysis of the genes from Scoary showed that no particular molecular functions or biological processes stand out when compared to the background frequency of gene ontology terms. When comparing isolates that caused HUS against isolates that did not, no genes were found to be statistically significant. In order to find more conclusive results about the genomic differences between STEC in animals and humans, as well as between STEC that lead to HUS and STEC that do not, further studies are needed.
150

Siamese Neural Networks for Regression: Similarity-Based Pairing and Uncertainty Quantification

Zhang, Yumeng January 2022
Here we present a similarity-based pairing method for generating compound pairs to train a Siamese Neural Network. In comparison with the conventional exhaustive pairing of N²/2 pairs (N being the size of the training set), this method results in N-1 pairs, significantly reducing the training time. It consistently exhibits better prediction performance on the three physicochemical property datasets, using a multilayer perceptron with the ECFP4 fingerprint. We further incorporate into the Siamese Neural Network the pre-trained Chemformer, which extracts task-specific chemical features from the input SMILES strings. With n-shot learning, we propose a means to measure the prediction uncertainty. Our results demonstrate that higher accuracy is indeed associated with lower prediction uncertainty. In addition, we discuss implications of the similarity principle in machine learning.
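The difference in pair counts can be made concrete with a small sketch. The abstract does not say how the N-1 pairs are chosen, so the rule below (pair each compound with its most similar earlier compound, using Tanimoto similarity on sets of fingerprint bits) is purely an assumption for illustration, and the fingerprints are made up.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def exhaustive_pairs(n):
    """All unordered pairs of n items: n * (n - 1) / 2, roughly n^2 / 2."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def similarity_pairs(fingerprints):
    """Hypothetical pairing rule: link each compound except the first to the
    most similar compound that precedes it, giving exactly N - 1 pairs."""
    pairs = []
    for i in range(1, len(fingerprints)):
        best = max(range(i), key=lambda j: tanimoto(fingerprints[i], fingerprints[j]))
        pairs.append((best, i))
    return pairs


# Toy fingerprints as sets of bit indices (not real ECFP4 output).
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}, {1, 2, 3, 4}]
sparse = similarity_pairs(fps)       # 4 pairs for 5 compounds
dense = exhaustive_pairs(len(fps))   # 10 pairs for 5 compounds
```

The gap widens quickly with training set size: for 1,000 compounds the exhaustive scheme produces 499,500 pairs, while a rule of this kind produces 999.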
