Analyses of Unorthodox Overlapping Gene Segments in Oxytricha Trifallax. Stich, Shannon, 21 March 2019
Ciliates form a phylum of protozoa (Ciliophora) whose cells contain two types of nuclei, macronuclei and micronuclei, and an organism may have more than one nucleus of each type [1]. The macronucleus is the transcriptionally active nucleus that supports protein synthesis and cell metabolism [1]. The micronucleus stores the germ-line genetic information and is mobilized during a sexual reproduction process called conjugation [1]. During conjugation, the somatic macronucleus (MAC) develops from the germ-line micronucleus (MIC) through genome rearrangement [6, 8]. The segments of the MIC that form the MAC are called macronuclear destined sequences (MDSs) [8]. Each MDS is annotated with coordinates marking where it begins and ends in the MIC. The orientation of an MDS in the MIC is either positive or negative: it is positive if the direction of the MDS in the MIC agrees with its direction in the MAC, and negative otherwise. In this thesis we analyze various aspects of gene assembly during the rearrangement process of the ciliate Oxytricha trifallax, whose genome was recently sequenced [15]. The properties analyzed include overlapping MDSs, orientation, the starting and ending positions of MDSs in the MIC, and the gaps of overlapping MDS pairs. The gap of an overlapping MDS pair is the difference in order between two MDSs of a particular MAC contig that overlap in the MIC contig. We use the 120 MAC contigs from [15] in which at least two of the contig's own MDSs overlap; these 120 MAC contigs make up the data set we call D4.
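The definitions above (orientation, overlap in the MIC, and gap) can be sketched in Python, the language the thesis reports using. This is a minimal illustration under assumptions the source does not state: each MDS is a hypothetical `MDS` record holding its order in the MAC contig, inclusive MIC start and end coordinates, and an orientation flag, and two MDSs are taken to overlap when their MIC intervals share at least one base pair. The coordinates are made up.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class MDS:
    # Hypothetical record; field names are illustrative, not from the thesis.
    order: int      # 1-based position of the MDS within its MAC contig
    mic_start: int  # first MIC coordinate covered (inclusive, assumed)
    mic_end: int    # last MIC coordinate covered (inclusive); mic_start <= mic_end
    positive: bool  # True if the MIC direction agrees with the MAC direction

def overlap_bp(a: MDS, b: MDS) -> int:
    """Number of MIC base pairs shared by two MDSs (0 if they are disjoint)."""
    return max(0, min(a.mic_end, b.mic_end) - max(a.mic_start, b.mic_start) + 1)

def overlapping_pairs(mdss):
    """List (order_a, order_b, gap, shared_bp) for every MDS pair overlapping in the MIC."""
    result = []
    for a, b in combinations(sorted(mdss, key=lambda m: m.order), 2):
        shared = overlap_bp(a, b)
        if shared > 0:
            # Gap = difference in MAC order between the two overlapping MDSs.
            result.append((a.order, b.order, abs(a.order - b.order), shared))
    return result

# Hypothetical toy contig (made-up coordinates, not taken from D4).
contig = [
    MDS(1, 100, 150, True),
    MDS(2, 140, 200, True),
    MDS(3, 500, 560, False),
    MDS(4, 190, 230, False),
]
print(overlapping_pairs(contig))                     # [(1, 2, 1, 11), (2, 4, 2, 11)]
print([m.order for m in contig if not m.positive])   # [3, 4]: negatively oriented MDSs
```

Here MDS 1 and MDS 2 share MIC positions 140 to 150 (gap 1), and MDS 2 and MDS 4 share 190 to 200 (gap 2).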
We explore the patterns of overlapping MDSs in the MIC in D4. To quantify these patterns, we associate to each MAC contig $A_n$ a vector $V(A_n) = (v_1(A_n), v_2(A_n), v_3(A_n)) \in \mathbb{R}^3$. The first entry is the number of overlapping MDS pairs divided by the number of MDSs. The second entry is the sum of the gaps of overlapping MDS pairs divided by the sum of all possible gaps. The third entry is the total number of overlapping base pairs divided by the total length of the MAC contig. We then computed the distance matrix $M = (d_{ij})$, where $d_{ij}$ is the Euclidean distance between $V(A_i)$ and $V(A_j)$. The MAC contig vectors and $M$ were computed in Python.
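A sketch of how $V(A_n)$ and $M$ might be computed in plain Python. The helper names `contig_vector` and `distance_matrix` are hypothetical, and because the text does not spell out "the sum of all possible gaps", the code assumes it is the sum of $|i - j|$ over all pairs of MDSs in the contig, which equals $(n-1)n(n+1)/6$ for $n$ MDSs. The input numbers are invented for illustration.

```python
import math

def contig_vector(n_mds, gaps, overlap_bps, contig_length):
    """Return V(A_n) = (v1, v2, v3) for one MAC contig.

    n_mds:         number of MDSs in the contig (assumed >= 2)
    gaps:          gap of each overlapping MDS pair
    overlap_bps:   overlapping base pairs of each overlapping MDS pair
    contig_length: length of the MAC contig in base pairs
    ASSUMPTION: "sum of all possible gaps" = sum of |i - j| over all MDS pairs,
    i.e. (n - 1) * n * (n + 1) / 6.
    """
    all_gaps = (n_mds - 1) * n_mds * (n_mds + 1) // 6
    v1 = len(gaps) / n_mds                  # overlapping pairs per MDS
    v2 = sum(gaps) / all_gaps               # normalized total gap
    v3 = sum(overlap_bps) / contig_length   # fraction of overlapping base pairs
    return (v1, v2, v3)

def distance_matrix(vectors):
    """M = (d_ij): pairwise Euclidean distances between contig vectors."""
    return [[math.dist(u, w) for w in vectors] for u in vectors]

# Two hypothetical contigs with made-up summary numbers (not from D4).
V = [
    contig_vector(4, gaps=[1, 2], overlap_bps=[11, 11], contig_length=300),
    contig_vector(3, gaps=[1], overlap_bps=[5], contig_length=200),
]
M = distance_matrix(V)
print(V[0])  # (0.5, 0.3, 0.07333...)
```

Each entry of $V(A_n)$ lies in $[0, 1]$ for typical contigs, so no single ratio dominates the Euclidean distance.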
To analyze D4 we applied Topological Data Analysis (TDA), which uses topological constructs to assess the shape of data [3, 12]. From the entries of the distance matrix $M = (d_{ij})$ we built a Vietoris-Rips filtration and computed the barcodes of its persistent homology in dimensions 0, 1 and 2. The 0-dimensional persistence barcode reveals clusters in the data, while the 1-dimensional barcode represents non-trivial loops in the simplicial complex [3, 13]. Applied to Oxytricha trifallax, TDA identified ten MAC contig clusters in D4 at $\epsilon = 0.1$, as well as several loops that persisted across two or three values of $\epsilon$. Other TDA methods can be applied to the Vietoris-Rips filtration to identify which MAC contigs belong to each cluster.
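The 0-dimensional part of this analysis can be reproduced without a TDA library: for a Vietoris-Rips filtration, the finite $H_0$ bars die exactly at the edge lengths accepted by Kruskal's minimum-spanning-tree algorithm, and the clusters at scale $\epsilon$ are the connected components of the graph with an edge whenever $d_{ij} \le \epsilon$. The sketch below assumes that convention (some authors use $2\epsilon$) and a made-up distance matrix. It does not compute the 1- and 2-dimensional barcodes, for which a dedicated library such as GUDHI or Ripser would normally be used. The function names are hypothetical; `clusters_at` also lists which contigs fall in each cluster.

```python
import math

def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def h0_barcode(D):
    """0-dimensional barcode of the Vietoris-Rips filtration of distance matrix D.

    Every point is born at 0. Edges enter in order of length (Kruskal); an edge that
    joins two components kills one bar at its length d_ij. One bar never dies.
    """
    n = len(D)
    parent = list(range(n))
    edges = sorted((D[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    bars = []
    for d, i, j in edges:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, d))
    bars.append((0.0, math.inf))
    return bars

def clusters_at(D, eps):
    """Connected components of the Rips complex at scale eps (edge i-j iff D[i][j] <= eps)."""
    n = len(D)
    parent = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if D[i][j] <= eps:
                parent[_find(parent, i)] = _find(parent, j)
    groups = {}
    for i in range(n):
        groups.setdefault(_find(parent, i), []).append(i)
    return sorted(groups.values())

# Hypothetical 4-contig distance matrix (made-up values).
D = [
    [0.0, 0.05, 1.0, 1.08],
    [0.05, 0.0, 0.95, 1.03],
    [1.0, 0.95, 0.0, 0.08],
    [1.08, 1.03, 0.08, 0.0],
]
print(h0_barcode(D))        # [(0.0, 0.05), (0.0, 0.08), (0.0, 0.95), (0.0, inf)]
print(clusters_at(D, 0.1))  # [[0, 1], [2, 3]]
```

The number of clusters at $\epsilon$ equals the number of bars whose death value exceeds $\epsilon$, which is how a cluster count such as "ten clusters at $\epsilon = 0.1$" is read off an $H_0$ barcode.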
The Persistent Topology of Geometric Filtrations. Wang, Qingsong, 06 September 2022
No description available.
Persistence, Metric Invariants, and Simplification. Okutan, Osman Berat, 02 October 2019
No description available.
Contributions to Persistence Theory. Du, Dong, 27 June 2012
No description available.
Topological inference from measures / Inférence topologique à partir de mesures. Buchet, Mickaël, 01 December 2014
The amount of available data has never been so large. Asking the right questions, that is, questions that are both relevant and answerable, is difficult. Topological data analysis tries to sidestep this problem by not asking an overly precise question and instead searching for a structure underlying the data. Such a structure is interesting in itself, but it can also guide the analyst's inquiry toward relevant questions. One of the most widely used tools in this field is persistent homology. By analyzing the data at all scales simultaneously, persistence avoids choosing a particular scale. Moreover, its stability properties provide a natural way to pass from discrete data to continuous objects. However, persistent homology faces two obstacles: its construction generally produces data structures that are too large for work in high dimensions, and its robustness does not extend to outlier noise, that is, to the presence of points uncorrelated with the underlying structure. In this thesis, I start from these two observations and first make the computation of persistent homology robust to outlier noise by using the distance to a measure. Using an approximation of the persistent homology computation for the distance to a measure, I provide a complete algorithm for applying persistent homology to the topological analysis of data with small intrinsic dimension that may nevertheless be embedded in high-dimensional spaces. Persistent homology has also previously been used to analyze scalar fields. Here again, outlier noise limited its use, and I propose a method derived from the distance to a measure that makes it robust to such noise. This involves introducing new noise conditions and a new regression operator, both of which are studied in detail. The work carried out in this thesis now makes it possible to use persistent homology in real high-dimensional applications, whether for topological inference or for scalar field analysis.
Massive amounts of data are now available for study. Asking questions that are both relevant and possible to answer is a difficult task. One can instead look for something other than the answer to a precise question: topological data analysis looks for structure in point cloud data, which can be informative by itself but can also suggest directions for further questioning. A common challenge in this area is choosing the right scale at which to process the data. One widely used tool in this domain is persistent homology. By processing the data at all scales, it does not rely on a particular choice of scale. Moreover, its stability properties provide a natural way to go from discrete data to an underlying continuous structure. Finally, it can be combined with other tools, such as the distance to a measure, which makes it possible to handle unbounded noise. The main caveat of this approach is its high computational complexity. In this thesis, we introduce topological data analysis and persistent homology, then show how approximation can reduce the computational complexity. We provide an approximation scheme for the distance to a measure and a sparsification method for weighted Vietoris-Rips complexes, in order to approximate persistence diagrams with practical complexity. We detail the specific properties of these constructions. Persistent homology was previously shown to be useful for scalar field analysis. We provide a way to combine it with the distance to a measure in order to handle a wider class of noise, especially data with unbounded errors.
Finally, we discuss interesting opportunities opened by these results to study data where parts are missing or erroneous.
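For readers unfamiliar with the distance to a measure, the following sketch evaluates its standard empirical form: for the uniform measure on $n$ sample points and mass parameter $m = k/n$, it is the root mean squared distance from a query point to its $k$ nearest samples. This is only the textbook definition, not Buchet's approximation scheme or his sparsified weighted Vietoris-Rips construction; the function name `dtm` and the toy points are hypothetical.

```python
import math

def dtm(x, points, k):
    """Empirical distance to a measure at query point x (hypothetical helper).

    For the uniform measure on `points` with mass parameter m = k / len(points),
    this is the root mean squared distance from x to its k nearest sample points.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= len(points)")
    sq = sorted(math.dist(x, p) ** 2 for p in points)  # squared distances, ascending
    return math.sqrt(sum(sq[:k]) / k)

# Hypothetical sample: three nearby points and one outlier at (10, 10).
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
print(dtm((10.0, 10.0), pts, 1))  # 0.0: with k = 1 this is the plain distance to the sample
print(dtm((10.0, 10.0), pts, 3))  # about 10.98: the lone outlier no longer pulls the value to 0
```

With k = 1 the function reduces to the ordinary distance to the sample, so a single outlier creates a spurious zero at its location; averaging over k neighbors removes that effect, which is why the distance to a measure tolerates outlier noise.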