611 |
The impact of affective computing in raising awareness of Subjective Well-Being and its influence on adherence and Quality of Life: An experience among patients suffering from Alpha-1 Antitrypsin Deficiency-Associated COPD / Stachel, Richard D. 21 October 2016
<p> Nearly half of all Americans are living with a chronic condition, and one in four have Multiple Chronic Conditions (MCC). Chronic Obstructive Pulmonary Disease (COPD) is one such chronic disorder. Affecting 15 million Americans, COPD is the third leading cause of death in the United States, and treatment for COPD costs the healthcare system in excess of $32 billion annually. One significant factor leading to the high cost of care is non-adherence to medication and lifestyle recommendations. In an effort to keep chronic-disease sufferers healthier longer, thereby ameliorating the issue of avoidable costs, public health policy leaders, researchers, healthcare systems, and clinicians are attempting to discover ways to help individuals maintain appropriate adherence levels to medication prescriptions and lifestyle recommendations. In addition, previous research found connections between positivity and improved health and health outcomes. This study investigated one potential method of helping chronic-disease sufferers stay adherent and maintain their overall health. This research studied a mobile and on-line affective-computing tool that utilized Ecological Momentary Assessment (EMA) in an effort to raise the cognitive awareness of users’ subjective well-being and positivity. The study’s objective was to determine if use of this tool led to improvement in adherence, Subjective Well-Being (SWB), positivity, and overall Quality of Life (QoL). This study used an embedded mixed methods approach and involved 96 respondents diagnosed with Alpha-1 Antitrypsin Deficiency-Associated (AATD) COPD. Alpha-1 Antitrypsin Deficiency is a rare, genetic disease affecting approximately 100,000 Americans. The most significant complications for those suffering with Alpha-1 are lung or liver diseases. This study included only those diagnosed with lung disease. Participants used the affective-computing tool over a two-month period. 
The research measured their levels of positivity and quality of life before and after their use of the system. This study also measured participants’ use of the affective-computing tool, including frequency of response to push messages and response times. It then compared these variables for users who engaged with the system through email with those for users who participated by text messaging (Short Message Service, SMS).</p><p> Results indicated a small, statistically non-significant increase in adherence rates, as well as improved but statistically non-significant QoL scores between the pretest and posttest periods. However, the analyses indicated a significant increase in subjective well-being scores between the two periods. They also revealed a 91.3% average compliance rate with the study push messages over the two-month period. While those using text messaging complied faster, there was no significant difference in compliance rates between those answering by text messaging and those answering by email.</p><p> While the results indicated that an EMA-based system designed to raise awareness of SWB is one way of improving the overall well-being and health of chronically ill individuals, they more importantly revealed areas for further study: other disease states, longer study periods, and larger sample sizes.</p>
|
612 |
Algorithms for integrated analysis of glycomics and glycoproteomics by LC-MS/MS / Klein, Joshua Adam 01 August 2019
The glycoproteome is an intricate and diverse component of a cell, and it plays a key role in the definition of the interface between that cell and the rest of its world. Methods for studying the glycoproteome have been developed for released glycan glycomics and site-localized bottom-up glycoproteomics using liquid chromatography-coupled mass spectrometry and tandem mass spectrometry (LC-MS/MS), which is itself a complex problem.
Algorithms for interpreting these data are necessary to extract biologically meaningful information in a high-throughput, automated context. Several existing solutions have been proposed but may be found lacking for larger glycopeptides, for complex samples, for different experimental conditions, for different instrument vendors, or even because they simply ignore fundamentals of glycobiology. I present a series of open algorithms that approach the problem in an instrument-vendor-neutral, cross-platform fashion to address these challenges, and that integrate key concepts from the underlying biochemical context into the interpretation process.
In this work, I created a suite of deisotoping and charge state deconvolution algorithms for processing raw mass spectra at an LC scale from a variety of instrument types. These tools performed better than previously published algorithms by enforcing the underlying chemical model more strictly, while maintaining a higher degree of signal fidelity. From this summarized, vendor-normalized data, I composed a set of algorithms for interpreting glycan profiling experiments that can be used to quantify glycan expression. From this I constructed a graphical method to model the active biosynthetic pathways of the sample glycome and dig deeper into those signals than would be possible from the raw data alone. Lastly, I created a glycopeptide database search engine from these components which is capable of identifying the widest array of glycosylation types available, and demonstrated a learning algorithm that tunes the model to the process of glycopeptide fragmentation under specific experimental conditions, outperforming a simpler model by between 10% and 15%. This approach can be further augmented with sample-wide or site-specific glycome models to increase depth-of-coverage for glycoforms consistent with prior beliefs.
|
613 |
Support vector machine prediction of HIV-1 drug resistance using The Viral Nucleotide patterns / Araya, Seare Tesfamichael 23 February 2007
Student Number : 0213068F - MSc Dissertation - School of Computer Science - Faculty of Science
Drug resistance of HIV, due to its fast replication and error-prone mutation, is a key factor in the failure to combat the HIV epidemic. For this reason, performing pre-therapy drug resistance testing and administering appropriate drugs or combinations of drugs accordingly is very useful. There are two approaches to HIV drug resistance testing: phenotypic (clinical) and genotypic (based on the particular virus’s DNA). Genotyping tests HIV drug resistance by detecting specific mutations known to confer drug resistance. It is cheaper and can be computerised. However, it requires being able to know or learn what mutations confer drug resistance. Previous research using pattern recognition techniques has been promising, but the performance needs to be improved. It is also important to have techniques that can quickly learn new rules when faced with new mutations or drugs.
A relatively recent addition to these techniques is the Support Vector Machine (SVM). SVMs have proved very successful in many benchmark applications such as face recognition and text recognition, and have also performed well in many computational biology problems where the number of features targeted is large compared to the number of available samples. This paper explores the use of SVMs in predicting the drug resistance of an HIV strain extracted from a patient, based on the genetic sequence of those parts of the viral DNA encoding the two enzymes, Reverse Transcriptase and Protease, which are critical for the replication of HIV. In particular, it is the aim of this research to design the model without incorporating the biological knowledge at hand, to enable the resulting classifier to accommodate new drugs and mutations.
To evaluate the performance of SVMs we used cross-validation to obtain an unbiased estimate on 2045 data points. The accuracy of classification and the area under the receiver operating characteristic curve (AUC) were used as performance measures. Furthermore, to compare the performance of our SVM model, we also developed other prediction models based on popular classification algorithms, namely neural networks, decision trees and logistic regression.
The results show that SVMs are a highly successful classifier and out-perform the other techniques, with accuracy ranging from 94.13% to 96.33% and AUC ranging from 81.26% to 97.49%. Decision trees were rated second and logistic regression performed the worst.
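As an illustration of the two performance measures named in this abstract, the following minimal sketch computes classification accuracy and AUC from a classifier's outputs. It is not the dissertation's code: the function names and the pairwise formulation of AUC (the probability that a resistant sample is scored above a susceptible one, ties counted as one half) are assumptions chosen for clarity.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0);
    tied scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

In a cross-validation setting, these functions would be applied to the held-out fold of each split and the results averaged; the pairwise loop is quadratic in the number of samples, which is acceptable for a dataset of a few thousand points.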
|
614 |
Error-Informed Likelihood Calculations for More Realistic Genetic Analyses / Unknown Date
Next-generation sequencing can rapidly analyze entire genomes in just hours. However, due to the nature of the sequencing process, errors may arise which limit the accuracy of the reads obtained. Fortunately, modern sequencing technologies associate with each read a quality score, derived from the sequencing procedure, which represents our confidence in each nucleotide in the sequence. Currently, these quality scores are used as a criterion for the removal or modification of reads in the data set. These methods result in the loss of information contained in those sequences and rely on parameters that are somewhat arbitrary; this may lead to a biased sample and inaccurate analyses. I propose an alternative method that incorporates the error of the sequences without discarding poor-quality reads, by including the error probabilities of the reads in the likelihood calculations used for sequence analysis. It was found that, despite introducing variability, the error-informed likelihood method improved analyses compared with those that ignored the error altogether. While this method will likely result in analyses with less definite results than those in which the data were treated with a preprocessing technique, these results will utilize all of the provided data and will be more grounded in reality, as we take into account the uncertainty that we have in our sequenced samples. / A Thesis submitted to the Department of Scientific Computing in partial fulfillment of the requirements for the degree of Master of Science. / Fall Semester 2015. / November 6, 2015. / error, likelihood, ngs, sequencing / Includes bibliographical references. / Peter Beerli, Professor Directing Thesis; Anke Meyer-Baese, Committee Member; Alan Lemmon, Committee Member.
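The idea of folding read error probabilities into a likelihood can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's method: it assumes Phred-scaled quality scores (error probability 10^(-Q/10)), assumes a miscalled base is equally likely to be any of the three other nucleotides, and uses hypothetical function names.

```python
import math
from typing import Dict, Sequence

BASES = "ACGT"


def phred_to_error(q: int) -> float:
    """Convert a Phred quality score Q into an error probability 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def base_likelihoods(observed: str, q: int) -> Dict[str, float]:
    """P(observed base | true base) for each candidate true base.

    Assumption: an erroneous call is uniformly distributed over the
    three nucleotides other than the true one.
    """
    e = phred_to_error(q)
    return {b: (1.0 - e) if b == observed else e / 3.0 for b in BASES}


def read_log_likelihood(observed: str, qualities: Sequence[int], candidate: str) -> float:
    """Log-likelihood of an observed read given a candidate true sequence.

    Sites are treated as independent. Quality scores must be positive,
    since Q = 0 would give a zero probability for a matching base.
    """
    if not (len(observed) == len(qualities) == len(candidate)):
        raise ValueError("read, qualities and candidate must have equal length")
    return sum(
        math.log(base_likelihoods(o, q)[c])
        for o, q, c in zip(observed, qualities, candidate)
    )
```

Under this sketch a mismatch at a low-quality position lowers the likelihood far less than a mismatch at a high-quality position, so poor reads stay in the data set but carry correspondingly less weight.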
|
615 |
Utilities for Off-Target DNA Mining in Non-Model Organisms and Querying for Phylogenetic Patterns / Unknown Date
High-throughput sequencing data are rich in information and contain many off-target sequences (reads) that are often ignored but may be biologically relevant. Seed extension, a combination of reference-based and de novo assembly methods, can be used to extract the information, but it is time-consuming to implement because it requires that multiple seeds (sequences from one or many closely related species) be gathered in advance. A new tool is presented here, SeedSQrrL, that can automatically crawl the web to gather the seeds from the closest taxonomic relative for each gene and store them in a relational database. The seeds can then be used to create multiple seed extensions, which are later combined into a reference or used for downstream phylogenetic analysis. Patterns in the resulting gene trees can be searched for using the traditional methods of tree comparison (Robinson-Foulds topological distance and branch-length comparison methods). Currently, no open source tree pattern matching program exists that allows the user to modify algorithms and create their own custom pattern matching functions. I have worked on such a tool, called Treematcher, and it will be made available in the ETE Toolkit (a Python Environment for Tree Exploration). Three biological case studies will be included to demonstrate the capabilities of the two programs: 1) a custom function in Treematcher to perform a regular expression-like query, 2) SeedSQrrL will be used to isolate mitochondrial genes from snakes and chloroplast genes from angiosperms, and 3) a large case study of animals will be assembled. / A Dissertation submitted to the Department of Scientific Computing in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / April 2, 2018.
/ Automated Gene Reference Collection, Gene Tree Pattern Matching, High Throughput Sequence Analysis, NCBI Taxonomy, Open Source Software for Bioinformatics, Python / Includes bibliographical references. / Alan Lemmon, Professor Directing Dissertation; Michelle Arbeitman, University Representative; Anke Meyer-Baese, Committee Member; Peter Beerli, Committee Member; Dennis Slice, Committee Member.
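The Robinson-Foulds topological distance mentioned in this abstract can be illustrated with a short sketch. It is not the Treematcher or ETE Toolkit API: the nested-tuple tree representation and function names are hypothetical, and the rooted (clade-based) variant of the distance is assumed, which counts the clades present in one tree but not the other.

```python
def clades(tree):
    """Return (non-trivial clades, all leaves) for a nested-tuple tree.

    Leaves are strings; internal nodes are tuples of children. Each
    clade is a frozenset of the leaf names below an internal node.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return found, all_leaves


def robinson_foulds(tree_a, tree_b):
    """Rooted Robinson-Foulds distance: size of the symmetric
    difference between the two trees' clade sets."""
    clades_a, leaves_a = clades(tree_a)
    clades_b, leaves_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must be defined on the same leaf set")
    return len(clades_a ^ clades_b)
```

For example, `robinson_foulds((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")))` compares two four-leaf trees that share no clade. A distance of zero means the two rooted topologies are identical; branch lengths are ignored entirely.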
|
616 |
Bioinformatic mining and analysis of genetic elements in genomes. / CUHK electronic theses & dissertations collection / January 2013
In the post-genomic era, it is a huge challenge to detect the functional elements in the "ocean" of data and provide meaningful biological inferences. Here, many interesting functional elements have been characterized and analyzed among targeted genomes. / First, by compiling more than 70,000 experimentally determined posttranslational modification (PTM) events from 7 eukaryotic organisms, the features and functions of proteins regulated by multiple types of PTMs (Mtp-Proteins) are detected and analyzed by comparison with proteins harboring no known PTM target site. (1) The Mtp-Proteins are found to be significantly enriched in protein complexes, to have more protein partners, and to act preferentially as hubs in the protein-protein interaction network. (2) Mtp-Proteins also possess a distinct function focus and biased subcellular locations. (3) Overall, about 80% of the analyzed PTM events are embedded in intrinsically disordered regions (IDRs), and most Mtp-Proteins have more IDRs than proteins without PTM sites. This suggests that IDRs may largely explain why some proteins can harbor so many extraordinary functions. (4) Interestingly, some particular Mtp-Proteins whose PTMs are located preferentially in ordered regions are mainly related to "protein-DNA complex assembly". (5) We further evaluated the energetic effects of PTMs on the stability of PPIs and found that only a small fraction of single PTM events change the binding energy by more than 2 kcal/mol, but combinations of PTM types, e.g. phosphorylation combined with acetylation, can change the binding energy dramatically. / In the second part, the different components of the ubiquitination system (ubiquitin, E1, E2, E3 and the substrates of E3) are identified and analyzed comparatively across 74 fungal genomes. The main results are: (1) the ubiquitin number is significantly higher in the mushroom-forming genomes than in other basidiomycota genomes. (2) The number of E1, with an average of 2.92, is consistent among most genomes. However, the number of E2 differs between mushroom-forming genomes and other basidiomycota genomes. (3) For the E3 candidates, the number of Paracaspase and F-box domains is significantly higher in the mushroom-forming genomes than in the other basidiomycota genomes. These results suggest that the ubiquitination system may play a vital role in the divergence of fungal morphogenesis, especially in the formation of mushrooms. / Then, the focus shifts to genomic islands (GIs). Compared to the whole genome, transcription initiation positions are found, for the first time, to be highly enriched in GI regions. Based on this heterogeneous transcriptional regulatory signal, a novel procedure for genomic island detection, GIST (Genome-island Identification by Signals of Transcription), is designed. Interestingly, our method demonstrates higher sensitivity in detecting genomic islands harboring genes with biased GI-like function, preferred subcellular localization, skewed GC property and shorter gene length. Finally, using GIST, many interesting GIs are detected and analyzed in the German outbreak strain TY-2482 for the first time. / In summary, this work not only considerably expands our understanding of several functional genetic elements, such as genomic islands and proteins regulated by combinational multiple PTMs, but also provides important tools and clues, such as GIST and the potential E3 expansion in mushroom-forming fungi, for further related studies. / Detailed summary in vernacular field only. / Huang, Qianli. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 161-186). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts also in Chinese.
/ Abstract / Abstract in Chinese / Abbreviations / Acknowledgements / Declaration / Table of Contents / List of Figures / List of Tables
/ Chapter 1: Literature Review / 1.1 General introduction / 1.2 Post-translational modification / 1.2.1 Combinational multiple types of post-translational modification / 1.3 Genomic islands / 1.3.1 Brief introduction / 1.3.2 Bioinformatic tools and database for identification of Genomic islands / 1.4 Objectives and significance
/ Chapter 2: Systematic analysis on features and functions of proteins regulated by combinational multiple types of post-translational modifications / 2.1 Introduction / 2.2 Materials and Methods / 2.2.1 Annotation of PTM pattern and analyses on target residues / 2.2.2 Classification of Human Proteins / 2.2.3 Dataset of human protein-protein interactions (PPIs) and Construction of PPI network / 2.2.4 Calculation of Binding Energy / 2.2.5 Functional characterization and subcellular localization analysis / 2.2.6 Annotating IDR regions / 2.2.7 Statistical analyses / 2.3 Results / 2.3.1 Combinational interactions of multiple PTM types are undergoing evolutionary selection / 2.3.2 Evolutionary profile of modified amino acid residues / 2.3.3 Mtp-Proteins are enriched in the protein complex / 2.3.4 Multiple PTMs enable target protein function as hub or super-hub in PPI network / 2.3.5 Energetic effect of PTMs on the Stability of protein-protein binding / 2.3.6 Mtp-Proteins demonstrate distinct function focus / 2.3.7 Mtp-Proteins: located preferedly in Cytoplasm and Nucleus / 2.3.8 Why Mtp-Proteins possess so many special features : importance of IDR / 2.4 Discussion / 2.4.1 The hints from the features of Mtp-Proteins / 2.4.2 The implication of combinational interaction between two different functional PTM categories: biased locating in IDRs and ordered regions respectively
/ Chapter 3: Genome-wide comparative analyses of ubiquitome among basidiomycota and other typical fungi genomes / 3.1 Introduction / 3.2 Materials and Methods / 3.2.1 Genome sequences and annotation acquirement / 3.2.2 Bioinformatic prediction of components in ubiquitome / 3.3 Results / 3.3.1 Identification of ubiquitin candidates among 74 fungi genomes / 3.3.2 Detection of potential E1 and E2 among all considered genomes / 3.3.3 Prediction and comparative analysis of different types of E3 / 3.3.4 The possible substrates of E3 / 3.4 Discussion
/ Chapter 4: Genomic islands Identification by Signals of Transcription / 4.1 Introduction / 4.2 Materials and Methods / 4.2.1 Genome sequence and annotation data / 4.2.2 Transcription start points (TSPs) scanning / 4.2.3 Genomic island dataset construction / 4.2.4 GIST: Genomic-island Identification by Signal of Transcription / 4.2.5 Functional characterization and subcellular localization analysis / 4.2.6 Codon usage, GC content and gene length / 4.2.7 Statistical analyses / 4.3 Results / 4.3.1 High-density transcriptional initiation signals associated with GIs / 4.3.2 Predict the potential novel GIs through GIST: Genomic-island Identification by Signal of Transcription / 4.3.3 Comparative Analysis: Distribution of gene function categories / 4.3.4 Comparative Analysis: Divergence of subcellular locations / 4.3.5 Comparative Analysis: GC property and gene length / 4.3.6 Hints of "non-optimal" codon usage bias / 4.3.7 Application of GIST to analyze GIs in the German E. coli O104:H4 outbreak strain / 4.4 Discussion
/ Chapter 5: Concluding remarks / References
|
617 |
Functional and evolutionary implications of in silico gene deletions / Jacobs, Christopher 12 February 2016
Understanding how genetic modifications, individually or in combination, affect organismal fitness or other phenotypes is a challenge common to several areas of biology, including human health & genetics, metabolic engineering, and evolutionary biology. The importance of a gene can be quantified by measuring the phenotypic impact of its associated genetic perturbations "here and now", e.g. the growth rate of a mutant microbe. However, each gene also carries a historical record of its cumulative importance over millions of years of natural selection, in the form of its degree of sequence conservation along phylogenetic branches. This thesis focuses on whether and how the phenotypic and evolutionary importance of genes are related to each other.
Towards this goal, I developed a new approach for characterizing the phenotypic consequences of genetic modifications in genome-scale biochemical networks using constraint-based computational models of metabolism. In particular, I investigated the impact of gene loss events on fitness in the model organism Saccharomyces cerevisiae, and found that my new metric for estimating the cost of gene deletion correlates with gene evolutionary rate. I found that previous failures to uncover this correlation using similar techniques may have been the result of an incorrect assumption about how isoenzyme deletions affect the reaction they catalyze.
I next hypothesized that the improvement my metric showed in predicting the cost of isoenzyme loss could translate into an improved capacity to predict the impact of pairs of gene deletions involving isoenzymes. Studies of such pair-wise genetic perturbations are important, because the extent to which a genetic perturbation modifies any given phenotype is often dependent on the genetic background upon which it has been performed. This lack of independence within sets of perturbations is termed epistasis. My results showed that, indeed, the new metric displays an increased capacity to predict epistatic interactions between pairs of genes.
In addition to shedding light on the relationship between the functional and evolutionary importance of genes, further developments of our approach may lead to better prediction of gene knockout phenotypes, with applications ranging from metabolic engineering to the search for gene targets for therapeutic applications.
|
618 |
Coverage Analysis in Clinical Next-Generation Sequencing / Odelgard, Anna January 2019
With the advent of next-generation sequencing (NGS), new tools had to be developed to work with new data formats, to handle data sizes larger than those of previous techniques, and to check the accuracy of the data. Coverage analysis is one important quality control for NGS data: the coverage indicates how many times each base pair has been sequenced and thus how trustworthy each base call is. For clinical purposes every base of interest must be quality controlled, as one wrong base call could affect the patient negatively. Software that performs coverage analysis with enough accuracy and detail for clinical applications is scarce. Several programs, such as Samtools, can calculate coverage values but do not process this information further to produce a QC report for each base pair of interest. My master's thesis has therefore been to create a new coverage analysis report tool, named CAR tool, that extracts the coverage values from Samtools and uses these data to produce a report consisting of tables, lists and figures. CAR tool was created to replace the currently used tool, ExCID, at the Clinical Genomics facility at SciLifeLab in Uppsala and was developed to meet the needs of the bioinformaticians and clinicians. CAR tool is written in Python and launched from a terminal window. The main function of the tool is to display coverage breadth values for each region of interest and to extract all sub-regions below a chosen coverage depth threshold. The low-coverage regions are then reported together with region name, start and stop positions, length and mean coverage value. To make the tool useful to as many users as possible, several settings can be selected by passing different flags when calling the tool. These settings include generating pie charts of each region’s coverage values, filtering reads and bases by quality, and writing your own entry to be used for the coverage calculation by Samtools.
The tool has been shown to find these low-coverage regions very well. Most low-coverage regions it finds are also found by ExCID, the currently used tool; some differences did occur, however, and every such region was verified in IGV. The coverage values shown in IGV coincided with those found by CAR tool. CAR tool is written to find all low-coverage regions even if they are only one base pair long, while ExCID instead seems to generate larger low-coverage regions without taking very short ones into account. To read more about the functions and how to use CAR tool, I refer to the User instructions in the appendix and on GitHub at the repository anod6351.
|
619 |
Algorithmic Enhancements to Data Colocation Grid Frameworks for Big Data Medical Image Processing / Bao, Shunxing 19 April 2019
<p> Large-scale medical imaging studies to date have predominantly leveraged in-house, laboratory-based or traditional grid computing resources for their computing needs, where the applications often use hierarchical data structures (e.g., Network File System file stores) or databases (e.g., COINS, XNAT) for storage and retrieval. For laboratory-based approaches, performance is impeded by standard network switches, since typical processing can saturate network bandwidth during transfer from storage to processing nodes for even moderate-sized studies. On the other hand, the grid may be costly to use due to the dedicated resources used to execute the tasks and its lack of elasticity. With the increasing availability of cloud-based big data frameworks, such as Apache Hadoop, cloud-based services for executing medical imaging studies have shown promise.</p><p> Despite this promise, our studies have revealed that existing big data frameworks exhibit different performance limitations for medical imaging applications, which calls for new algorithms that optimize their performance and suitability for medical imaging. For instance, Apache HBase's data distribution strategy of region split and merge is detrimental to the hierarchical organization of imaging data (e.g., project, subject, session, scan, slice). Big data medical image processing applications involving multi-stage analysis often exhibit significant variability in processing times, ranging from a few seconds to several days. Because traditional software technologies and platforms execute the analysis stages sequentially, any errors in the pipeline are detected only at the later stages, even though the errors predominantly originate in the highly compute-intensive first stage. This wastes precious computing resources and incurs prohibitively high costs for re-executing the application. To address these challenges, this research proposes a framework, Hadoop & HBase for Medical Image Processing (HadoopBase-MIP), which develops a range of performance optimization algorithms and models system behavior for data storage, data access and data processing. We also describe how to build prototypes that support empirical verification of system behavior. Furthermore, we report a discovery made during the development of HadoopBase-MIP: a new type of contrast for enhancing deep brain structures in medical imaging. Finally, we show how to carry the Hadoop-based framework design forward into a commercialized big data / high-performance computing cluster with a cheap, scalable and geographically distributed file system.</p>
|
620 |
Dissecting Transcriptional Regulatory Networks with Systems Biology Approaches / Zhou, Xiang January 2011
In the past decade, technologies such as the DNA microarray and ChIP-on-chip have generated a large amount of high-throughput data for biologists. Although these data have provided us with systems-level information about gene regulation, a major challenge in systems biology is to derive methodologies that will infer the underlying dynamics and mechanisms of gene regulation. This thesis research is focused on understanding these mechanisms of transcriptional regulation using systems biology approaches. Transcription regulatory networks play an important role in mediating external stimuli and coordinating responses to changing environments. Different methods that infer regulatory interactions directly from microarray data have been developed in the recent past. However, the implicit assumption in these methods, that a transcription factor's (TF's) mRNA expression can be used as a proxy for its activity at the protein level, is not always correct, due to post-transcriptional and post-translational modifications of TFs. In this study, a method named iARACNe was developed. It uses inferred TF activities to estimate the regulatory activity between TFs and their targets. The study demonstrated that the accuracy of the networks inferred using this method was greatly improved. Two additional methods, OmniMiner and coEDGi, which allow a better understanding of the physical interactions between TFs and target genes, were developed in this thesis research. OmniMiner detects and predicts the potential binding sites for the TFs of interest, while coEDGi enables identification of common enhancers upstream of co-regulated genes. Compared to other approaches, which only allow isolated analyses, the systems biology approaches developed in this research provide an opportunity for biologists to study transcriptional regulation from both functional genomics and regulatory sequence perspectives simultaneously.
|