1 |
Enhancements to the Microbial Source Tracking Process Through the Utilization of Clustering and K-nearest Clusters Algorithm
Lai, Tram B
01 March 2018
Bacterial contamination in water sources is a serious health risk, and the sources of the bacterial strains must be identified to keep people safe. This project is the result of a collaborative effort at Cal Poly to develop a new library-dependent Microbial Source Tracking (MST) method for determining sources of fecal contamination in the environment. The library used in this study is the Cal Poly Library of Pyroprints (CPLOP). Building CPLOP requires students to collect fecal samples from a multitude of sources in the San Luis Obispo area. A novel method developed by biologists at Cal Poly, called pyroprinting, is then applied to the two intergenic regions of the E. coli isolates from these samples to obtain their fingerprints, which are stored in the CPLOP database. In our study, we consider any E. coli samples whose fingerprints match above a certain threshold to belong to the same bacterial strain. However, no final MST method has yet achieved an acceptable level of accuracy. In this thesis, we propose a two-step MST classifier that combines two previous works, pyro-DBSCAN and k-RAP, both of which were developed specifically for CPLOP. We call our classifier HAP (Hybrid Algorithm for Pyroprints). The classifier works as follows. Given an unknown isolate, the first step clusters the known isolates in the library and compares the unknown isolate against the resulting clusters. If the isolate falls into a cluster, it is classified as the dominant species of that cluster. Otherwise, we apply the k-Nearest Clusters algorithm to the isolate to determine its final classification. Ultimately, HAP provides a set of 16 decision strategies that identify the host species of an unknown sample with high accuracy.
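The two-step procedure described in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the clusters are assumed to be precomputed (pyro-DBSCAN itself is not reproduced), pyroprints are assumed to be plain numeric vectors compared by Pearson correlation, "falls into a cluster" is assumed to mean that the mean similarity to the cluster's members reaches a hypothetical threshold, and the fallback is a simple majority vote over the k most similar clusters. The names `classify`, `threshold`, and `k` are illustrative only.

```python
from collections import Counter
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def dominant_species(cluster):
    """Most common host species among a cluster's (pyroprint, species) pairs."""
    return Counter(species for _, species in cluster).most_common(1)[0][0]


def mean_similarity(isolate, cluster):
    """Average similarity of the isolate to every member of the cluster."""
    return sum(pearson(isolate, p) for p, _ in cluster) / len(cluster)


def classify(isolate, clusters, threshold=0.99, k=3):
    """Two-step classification (assumed membership rule and vote, see lead-in)."""
    ranked = sorted(clusters, key=lambda c: mean_similarity(isolate, c), reverse=True)
    # Step 1: if the isolate falls into the closest cluster, use its dominant species.
    if mean_similarity(isolate, ranked[0]) >= threshold:
        return dominant_species(ranked[0])
    # Step 2: otherwise vote among the dominant species of the k nearest clusters.
    votes = Counter(dominant_species(c) for c in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy library: each cluster is a list of (pyroprint, host species).
clusters = [
    [([1, 2, 3, 4], "cow"), ([1, 2, 3, 4.1], "cow"), ([1.1, 2, 3, 4], "dog")],
    [([4, 3, 2, 1], "human"), ([4, 3, 2, 1.2], "human")],
    [([1, 3, 1, 3], "dog")],
]
in_cluster = classify([1, 2, 3, 4], clusters)          # resolved in step 1
fallback = classify([1, 2, 2, 4], clusters, k=1)       # resolved in step 2
```

The membership rule and the voting scheme are the main design choices here; the abstract mentions 16 decision strategies, so the single strategy shown should be read as one possible instance rather than the method itself.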
|
2 |
An Assessment of Potential False Positive E. coli Pyroprints in the CPLOP Database
Gordon, Skyler A
01 February 2017
The genetic information found in each species of organism is unique and can be used as a tool to differentiate organisms at the molecular level. This has made rapid genotyping methods the cornerstone of a new area of research that relies on reading the genome as a form of identification. One of these identification methods, known as pyroprinting, relies on the small variation of DNA sequences within the same species to develop a unique, reproducible fingerprint. By simultaneously pyrosequencing multiple polymorphic loci within the ribosomal operons, known as the intergenic transcribed spacers (ITS), a reproducible output called a pyroprint is obtained, which can be used like a fingerprint to identify the organism. This section of the genome differs not only between species but also between isolated bacteria within a species, allowing for the differentiation of species subtypes, referred to as strains. While this is a viable method for generating reproducible fingerprints from individual strains, it may be possible to obtain identical fingerprints from non-identical organisms. This report uses direct sequence comparison and in silico pyrosequencing of E. coli isolates with matching pyroprints, housed in the Center for Applications in Biotechnology at California Polytechnic State University, San Luis Obispo, to show that it is possible to obtain near-identical pyroprints from non-identical sequences of intergenic transcribed spacers. Although the exact likelihood and cause of this false positive result remain undetermined due to limitations in the sequencing method, its existence calls into question the accuracy of using pyroprints of the ITS regions as a method of strain classification.
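How non-identical sequences can yield the same pyroprint can be seen in a toy in silico simulation. The sketch below is a simplification under stated assumptions, not the procedure used in the report: each template string is treated as the strand being synthesized, each dispensed nucleotide is incorporated by every template whose next base or bases match it, and the peak height is the total number of bases incorporated across all templates. The function name `pyrogram`, the sequences, and the dispensation order are hypothetical.

```python
def pyrogram(templates, dispensations):
    """Idealized peak heights when several loci are pyrosequenced together."""
    positions = [0] * len(templates)  # next unread base in each template
    peaks = []
    for nucleotide in dispensations:
        height = 0
        for i, seq in enumerate(templates):
            # A template incorporates the whole run of matching bases at once.
            while positions[i] < len(seq) and seq[positions[i]] == nucleotide:
                positions[i] += 1
                height += 1
        peaks.append(height)
    return peaks


# Two hypothetical pairs of loci whose sequences differ from each other.
isolate_a = ["ACG", "CT"]
isolate_b = ["ACT", "CG"]

print_a = pyrogram(isolate_a, "ACGT")
print_b = pyrogram(isolate_b, "ACGT")
same_print = print_a == print_b  # True: the summed signal hides the difference
```

Because the signal from all loci is summed at each dispensation, a base missing from one locus can be compensated by the same base in another, which is one way a false positive match could arise in this idealized model.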
|