Global ETD Search

11	Integration of Biological Data Jakonienė, Vaida January 2006 Data integration is an important procedure underlying many research tasks in the life sciences, as often multiple data sources have to be accessed to collect the relevant data. The data sources vary in content, data format, and access methods, which often vastly complicates the data retrieval process. As a result, the task of retrieving data requires a great deal of effort and expertise on the part of the user. To alleviate these difficulties, various information integration systems have been proposed in the area. However, a number of issues remain unsolved and new integration solutions are needed. The work presented in this thesis considers data integration at three different levels. 1) Integration of biological data sources deals with integrating multiple data sources from an information integration system point of view. We study properties of biological data sources and existing integration systems. Based on the study, we formulate requirements for systems integrating biological data sources. Then, we define a query language that supports queries commonly used by biologists. Also, we propose a high-level architecture for an information integration system that meets a selected set of requirements and that supports the specified query language. 2) Integration of ontologies deals with finding overlapping information between ontologies. We develop and evaluate algorithms that use life science literature and take the structure of the ontologies into account. 3) Grouping of biological data entries deals with organizing data entries into groups based on the computation of similarity values between the data entries. We propose a method that covers the main steps and components involved in similarity-based grouping procedures. The applicability of the method is illustrated by a number of test cases.
Further, we develop an environment that supports comparison and evaluation of different grouping strategies. The work is supported by the implementation of: 1) a prototype for a system integrating biological data sources, called BioTRIFU, 2) algorithms for ontology alignment, and 3) an environment for evaluating strategies for similarity-based grouping of biological data, called KitEGA. Computer science integration grouping databases ontologies biological data bioinformatics KitEGA Engineering and Technology
12	Analysis of large-scale molecular biological data using self-organizing maps Wirth, Henry 06 December 2012 Modern high-throughput technologies such as microarrays, next generation sequencing and mass spectrometry provide huge amounts of data per measurement and challenge traditional analyses. New strategies of data processing, visualization and functional analysis are therefore needed. This thesis presents an approach which applies a machine learning technique known as self-organizing maps (SOMs). SOMs enable the parallel sample- and feature-centered view of molecular phenotypes combined with strong visualization and second-level analysis capabilities. We developed a comprehensive analysis and visualization pipeline based on SOMs. The unsupervised SOM mapping projects the initially high number of features, such as gene expression profiles, to meta-feature clusters of similar and hence potentially co-regulated single features. This reduction of dimension is attained by the re-weighting of primary information and, in contrast to simple filtering approaches, does not entail a loss of primary information. The meta-data provided by the SOM algorithm is visualized in terms of intuitive mosaic portraits. Sample-specific and common properties shared between samples emerge as a handful of localized spots in the portraits collecting groups of co-regulated and co-expressed meta-features. These characteristic color patterns reflect the data landscape of each sample and promote immediate identification of (meta-)features of interest. It will be demonstrated that SOM portraits transform large and heterogeneous sets of molecular biological data into an atlas of sample-specific texture maps which can be directly compared in terms of similarities and dissimilarities. Spot-clusters of correlated meta-features can be extracted from the SOM portraits in a subsequent step of aggregation.
This spot-clustering effectively enables reduction of the dimensionality of the data in two subsequent steps towards a handful of signature modules in an unsupervised fashion. Furthermore, we demonstrate that analysis techniques provide enhanced resolution if applied to the meta-features. The improved discrimination power of meta-features in downstream analyses such as hierarchical clustering, independent component analysis or pairwise correlation analysis is ascribed to essentially two facts: firstly, the set of meta-features better represents the diversity of patterns and modes inherent in the data, and secondly, it also possesses better signal-to-noise characteristics than a comparable collection of single features. In addition to the pattern-driven feature selection in the SOM portraits, we apply statistical measures to detect significantly differential features between sample classes. Implementation of scoring measurements supplements the basal SOM algorithm. Further, two variants of functional enrichment analyses are introduced which link sample-specific patterns of the meta-feature landscape with biological knowledge and support functional interpretation of the data based on the ‘guilt by association’ principle. Finally, case studies selected from different ‘OMIC’ realms are presented in this thesis. In particular, molecular phenotype data derived from expression microarrays (mRNA, miRNA), sequencing (DNA methylation, histone modification patterns) or mass spectrometry (proteome), and also genotype data (SNP-microarrays) are analyzed. It is shown that the SOM analysis pipeline offers strong application capabilities and covers a broad range of potential purposes ranging from time series and treatment-vs.-control experiments to discrimination of samples according to genotypic, phenotypic or taxonomic classifications. info:eu-repo/classification/ddc/000 ddc:000
13	Um algoritmo para a construção de vetores de sufixo generalizados em memória externa / External memory generalized suffix array construction algorithm Louza, Felipe Alves da 17 December 2013 The suffix array is an important data structure used in several string processing problems. In the literature, several approaches have been proposed to deal with external memory suffix array construction. However, these approaches are not specifically designed to index sets of strings, that is, they do not consider generalized suffix arrays. This limitation motivates this master's thesis, which presents eGSA, the first external memory algorithm developed to construct generalized suffix arrays enhanced with the longest common prefix array (LCP) and the Burrows-Wheeler transform (BWT). We especially focus on the context of bioinformatics, as recent technological advances have increased the volume of biological data available, which are stored as strings. The eGSA algorithm was validated through performance tests with real data comprising large sequences, such as DNA, and small sequences, such as proteins. Regarding performance tests with sets of large DNA strings, we compared our algorithm with the most efficient related suffix array construction algorithm in the literature, which was adapted to construct generalized arrays. The results demonstrated that our algorithm reduced the time spent by a factor of 3.2 to 8.3 and consumed 50% less memory. For sets of small protein strings, tests were performed only with eGSA, since, to the best of our knowledge, there is no related work that can be adapted. Compared to the average time spent to index sets of large strings, eGSA obtained competitive times to index sets of small strings. Therefore, the performance tests demonstrated that the proposed algorithm can be applied efficiently to index both sets of large strings and sets of small strings. Biological data External memory Generalized suffix array Genome assembly Indexing
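To make the data structures named in entry 13 concrete, the following is a minimal in-memory sketch of a generalized suffix array together with its LCP array and BWT. It is not the eGSA algorithm, which works in external memory; it simply sorts all suffixes naively. The function names, the "$" terminator, and the rule that equal suffixes are ordered by string index are assumptions made for illustration.

```python
TERMINATOR = "$"  # assumed not to occur in any input string and to sort before all other symbols


def common_prefix_length(a, b):
    """Length of the shared prefix of a and b, never counting a terminator."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n] and a[n] != TERMINATOR:
        n += 1
    return n


def build_generalized_suffix_array(strings):
    """Naive in-memory construction; returns (gsa, lcp, bwt)."""
    suffixes = []
    for i, s in enumerate(strings):
        t = s + TERMINATOR
        for j in range(len(t)):
            # Keep the suffix text plus (string index, offset) so that
            # equal suffixes are ordered by the string they come from.
            suffixes.append((t[j:], i, j))
    suffixes.sort()

    # Each entry says: this suffix starts at offset j of string i.
    gsa = [(i, j) for _, i, j in suffixes]

    # lcp[k] is the common prefix length of the k-th and (k-1)-th suffix.
    lcp = [0] * len(suffixes)
    for k in range(1, len(suffixes)):
        lcp[k] = common_prefix_length(suffixes[k - 1][0], suffixes[k][0])

    # bwt[k] is the symbol preceding the k-th suffix in its own string,
    # or the terminator when the suffix is the whole string.
    bwt = "".join(
        (strings[i] + TERMINATOR)[j - 1] if j > 0 else TERMINATOR
        for i, j in gsa
    )
    return gsa, lcp, bwt


gsa, lcp, bwt = build_generalized_suffix_array(["aba", "ba"])
print(gsa)  # [(0, 3), (1, 2), (0, 2), (1, 1), (0, 0), (0, 1), (1, 0)]
print(lcp)  # [0, 0, 0, 1, 1, 0, 2]
print(bwt)  # aabb$a$
```

Sorting full suffix copies costs quadratic memory in the worst case, which is exactly why dedicated construction algorithms are needed for large collections; the sketch is only meant to show what the three outputs contain.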
15	Reversed Voodoo Dolls: An exploration of physical visualizations of biological data / Omvända voodoodockor: en undersökning av fysisk visualisering av biologisk data Rodriguez Palacios, Miguel Andres January 2015 Physical visualizations are artifacts that materialize abstract data. They take advantage of natural human abilities to interact with information in the physical world. These visualizations present an opportunity for use in new application domains. To discover whether physical visualizations can support remote monitoring of biological data, a technology probe is presented in the form of a reversed voodoo doll. This probe uses the natural affordance of an anthropomorphic figure to represent a person and reverses the concept of voodoo dolls in a playful way. Safety is selected as the scenario for testing physical visualizations of bio-data. Two measurements from the human body, heart rate and motion, are chosen as a lightweight way to monitor a person’s condition remotely. During the study, a group of six participants was exposed to the technology probe and their interactions with it were observed. The study reports on the users’ interpretations of the data and the uses they made of the probe’s alternative modalities. The results suggest that mapping the data to the object’s body parts was effective for conveying meaning. Additionally, the results confirm that the use of multiple modalities in physical visualizations offers an opportunity to present information in situated contexts in the real world. The degree of physicality achieved by the reversed voodoo doll and the effects of the selected metaphors are discussed. In conclusion, it is argued that the responses and interpretations from the users indicate that the reversed voodoo doll served as a means in its own right to transmit information for monitoring of bio-data. Physical data visualization technology probes ambient displays data sculptures biological data data physicalizations Human Computer Interaction
16	Towards Accurate and Efficient Cell Tracking During Fly Wing Development Blasse, Corinna 05 December 2016 Understanding the development, organization, and function of tissues is a central goal in developmental biology. With modern time-lapse microscopy, it is now possible to image entire tissues during development and thereby localize subcellular proteins. A particularly productive area of research is the study of single-layer epithelial tissues, which can be simply described as a 2D manifold. For example, the apical band of cell adhesions in epithelial cell layers forms a 2D manifold within the tissue and provides a 2D outline of each cell. The Drosophila melanogaster wing has become an important model system because its 2D cell organization has the potential to reveal mechanisms that create the final fly wing shape. Other examples include structures that naturally localize at the surface of the tissue, such as the ciliary components of planarians. Data from these time-lapse movies typically consist of mosaics of overlapping 3D stacks. This is necessary because the surface of interest exceeds the field of view of today's microscopes. To quantify cellular tissue dynamics, these mosaics need to be processed in three main steps: (a) extracting, correcting, and stitching individual stacks into a single, seamless 2D projection per time point, (b) obtaining cell characteristics that occur at individual time points, and (c) determining cell dynamics over time. It is therefore necessary that the applied methods are capable of handling large amounts of data efficiently while still producing accurate results. This task is made especially difficult by the low signal-to-noise ratios that are typical in live-cell imaging.
In this PhD thesis, I develop algorithms that cover all three processing tasks mentioned above and apply them in the analysis of polarity and tissue dynamics in large epithelial cell layers, namely the Drosophila wing and the planarian epithelium. First, I introduce an efficient pipeline that preprocesses raw image mosaics. This pipeline accurately extracts the stained surface of interest from each raw image stack and projects it onto a single 2D plane. It then corrects uneven illumination, aligns all mosaic planes, and adjusts brightness and contrast before finally stitching the processed images together. This preprocessing not only significantly reduces the data quantity, but also simplifies downstream data analyses. Here, I apply this pipeline to datasets of the developing fly wing as well as a planarian epithelium. I additionally address the problem of determining cell polarities in chemically fixed samples of planarians. Here, I introduce a method that automatically estimates cell polarities by computing the orientation of rootlets in motile cilia. With this technique one can for the first time routinely measure and visualize how tissue polarities are established and maintained in entire planarian epithelia. Finally, I analyze cell migration patterns in the entire developing wing tissue in Drosophila. At each time point, cells are segmented using a progressive merging approach with merging criteria that take typical cell shape characteristics into account. The method enforces biologically relevant constraints to improve the quality of the resulting segmentations. For cases where full cell tracking is desired, I introduce a pipeline using a tracking-by-assignment approach. This allows me to link cells over time while considering critical events such as cell divisions or cell death.
This work presents a very accurate large-scale cell tracking pipeline and opens up many avenues for further study, including several in-vivo perturbation experiments as well as biophysical modeling. The methods introduced in this thesis are examples of computational pipelines that catalyze biological insights by enabling the quantification of tissue-scale phenomena and dynamics. I not only provide detailed descriptions of the methods, but also show how they perform on concrete biological research projects. Image Analysis Image Processing Biological data Cell segmentation Cell tracking ddc:004 rvk:ST 330
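The tracking-by-assignment idea mentioned in entry 16 can be illustrated with a small sketch: cells detected in one frame are linked to cells in the next frame by choosing the one-to-one assignment with the smallest total centroid distance. This is not the thesis's pipeline. The function name `link_cells`, the use of centroid distance as the cost, the brute-force search, and the `max_distance` cutoff are all assumptions chosen for illustration, and divisions and cell death are not modeled beyond leaving cells unlinked.

```python
from itertools import permutations
from math import dist


def link_cells(prev, curr, max_distance=float("inf")):
    """Link centroids of two frames by a minimum-total-distance assignment.

    prev and curr are lists of (x, y) centroids. Returns a sorted list of
    (prev_index, curr_index) pairs. Brute force: only for a handful of cells.
    """
    # Always enumerate assignments of the smaller set into the larger one.
    swap = len(prev) > len(curr)
    small, large = (curr, prev) if swap else (prev, curr)

    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(large)), len(small)):
        cost = sum(dist(small[i], large[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm

    # Drop links that are too long; such cells stay unlinked, which a real
    # tracker would interpret as appearance, disappearance, or division.
    links = []
    for i, j in enumerate(best_perm):
        if dist(small[i], large[j]) <= max_distance:
            links.append((j, i) if swap else (i, j))
    return sorted(links)


frame_a = [(0.0, 0.0), (10.0, 0.0)]
frame_b = [(9.0, 1.0), (1.0, 0.0), (50.0, 50.0)]
print(link_cells(frame_a, frame_b, max_distance=5.0))  # [(0, 1), (1, 0)]
```

The exhaustive search grows factorially with the number of cells, so a practical implementation would replace it with a polynomial-time assignment solver; the cutoff is applied after the optimal assignment is found, which keeps the sketch short but is a simplification.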
17	Análise metadimensional em inferência de redes gênicas e priorização / Metadimensional analysis in gene network inference and prioritization Marchi, Carlos Eduardo January 2017 Advisor: Prof. Dr. David Corrêa Martins Júnior / Dissertation (master's) - Universidade Federal do ABC, Graduate Program in Computer Science, 2017. GENE PRIORITIZATION GENE NETWORKS INFERENCE BIOLOGICAL DATA INTEGRATION
18	Towards Accurate and Efficient Cell Tracking During Fly Wing Development Blasse, Corinna 23 September 2016 Understanding the development, organization, and function of tissues is a central goal in developmental biology. With modern time-lapse microscopy, it is now possible to image entire tissues during development and thereby localize subcellular proteins. A particularly productive area of research is the study of single-layer epithelial tissues, which can be simply described as a 2D manifold. For example, the apical band of cell adhesions in epithelial cell layers forms a 2D manifold within the tissue and provides a 2D outline of each cell. The Drosophila melanogaster wing has become an important model system because its 2D cell organization has the potential to reveal mechanisms that create the final fly wing shape. Other examples include structures that naturally localize at the surface of the tissue, such as the ciliary components of planarians. Data from these time-lapse movies typically consist of mosaics of overlapping 3D stacks. This is necessary because the surface of interest exceeds the field of view of today's microscopes. To quantify cellular tissue dynamics, these mosaics need to be processed in three main steps: (a) extracting, correcting, and stitching individual stacks into a single, seamless 2D projection per time point, (b) obtaining cell characteristics that occur at individual time points, and (c) determining cell dynamics over time. It is therefore necessary that the applied methods are capable of handling large amounts of data efficiently while still producing accurate results. This task is made especially difficult by the low signal-to-noise ratios that are typical in live-cell imaging.
In this PhD thesis, I develop algorithms that cover all three processing tasks mentioned above and apply them in the analysis of polarity and tissue dynamics in large epithelial cell layers, namely the Drosophila wing and the planarian epithelium. First, I introduce an efficient pipeline that preprocesses raw image mosaics. This pipeline accurately extracts the stained surface of interest from each raw image stack and projects it onto a single 2D plane. It then corrects uneven illumination, aligns all mosaic planes, and adjusts brightness and contrast before finally stitching the processed images together. This preprocessing not only significantly reduces the data quantity, but also simplifies downstream data analyses. Here, I apply this pipeline to datasets of the developing fly wing as well as a planarian epithelium. I additionally address the problem of determining cell polarities in chemically fixed samples of planarians. Here, I introduce a method that automatically estimates cell polarities by computing the orientation of rootlets in motile cilia. With this technique one can for the first time routinely measure and visualize how tissue polarities are established and maintained in entire planarian epithelia. Finally, I analyze cell migration patterns in the entire developing wing tissue in Drosophila. At each time point, cells are segmented using a progressive merging approach with merging criteria that take typical cell shape characteristics into account. The method enforces biologically relevant constraints to improve the quality of the resulting segmentations. For cases where full cell tracking is desired, I introduce a pipeline using a tracking-by-assignment approach. This allows me to link cells over time while considering critical events such as cell divisions or cell death.
This work presents a very accurate large-scale cell tracking pipeline and opens up many avenues for further study, including several in-vivo perturbation experiments as well as biophysical modeling. The methods introduced in this thesis are examples of computational pipelines that catalyze biological insights by enabling the quantification of tissue-scale phenomena and dynamics. I not only provide detailed descriptions of the methods, but also show how they perform on concrete biological research projects. info:eu-repo/classification/ddc/004 ddc:004
19	Vizualizace značených buněk modelového organismu / Visualization of Marked Cells of a Model Organism Kubíček, Radek Unknown Date This master's thesis focuses on volumetric data rendering and on highlighting and visualizing selected cells of model organisms. The data are captured by a confocal deconvolution microscope. The input data form one large volumetric block composed of separate slices. This data block is rendered by a suitable method, and the cells marked by GFP (Green Fluorescent Protein) or by chlorophyll fluorescence are then identified and visualized. The principal aim of this work is to find an effective, ideally optimal, method for this highlighting, preferably one that works without manual checking. Because of the structure of the data, this ambition seems hardly realizable, so it suffices to find a method that relies on manual work. The last step is to embed the results of this work into the FluorCam application, a visualizer for confocal deconvolution microscope data.
20	Protein Interaction networks and their applications to protein characterization and cancer genes prediction Aragüés Peleato, Ramón 13 July 2007 The importance of understanding cellular processes prompted the development of experimental approaches that detect protein-protein interactions. Here, we describe a software platform called PIANA (Protein Interactions And Network Analysis) that integrates interaction data from multiple sources and automates the analysis of protein interaction networks. Moreover, we describe a method that delineates interacting motifs by relying on the observation that proteins with common interaction partners tend to interact with these partners through the same interacting motif. We find that highly connected proteins (i.e., hubs) with multiple interacting motifs are more likely to be essential for cellular viability than hubs with one or two interacting motifs. Furthermore, we present a method that predicts cancer genes by integrating protein interaction networks, differential expression studies and structural, functional and evolutionary properties. For a sensitivity of 1%, the positive predictive value is 71%, which outperforms the use of any of the methods independently. differential expression studies cancer gene prediction essential proteins hub proteins interacting motifs PIANA biological data integration protein Interaction Networks 575 576 616
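Entry 20 reports a positive predictive value at a given sensitivity. As a reminder of how these two measures are defined, here is a short sketch using the standard formulas PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN). The function names are hypothetical, and the counts in the usage example are invented for illustration; they are not taken from the thesis.

```python
def positive_predictive_value(true_pos, false_pos):
    """Fraction of predicted positives that are real positives."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos, false_neg):
    """Fraction of real positives that the method recovers."""
    return true_pos / (true_pos + false_neg)


# Invented counts: 7 genes predicted, 5 of them correct,
# out of 500 genes that are truly positive.
tp, fp, fn = 5, 2, 495
print(round(positive_predictive_value(tp, fp), 3))  # 0.714
print(sensitivity(tp, fn))                          # 0.01
```

The example shows how a method can be highly precise while recovering only a small share of the true positives: it makes few predictions, most of which are right.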
