Global ETD Search

1	The impact of genotype on the cellular architecture of dilated and arrhythmogenic cardiomyopathies Lindberg, Eric Lars-Helge 12 May 2023 Heart failure is a clinical syndrome and a leading cause of death worldwide, caused by functional and structural abnormalities of the heart. Dilated cardiomyopathy (DCM), defined by left ventricular enlargement, and arrhythmogenic cardiomyopathy (ACM), defined by right ventricular dysfunction, are leading causes of heart failure. Despite previous efforts to characterise molecular changes in the failing heart, little is known about cell-type-specific abundance and expression changes under pathological conditions, or about how individual cell types interact during heart failure and cardiac remodelling. To address these questions, a protocol for the isolation of intact nuclei was first established to perform robust single-nucleus RNA sequencing in the heart. Next, the cell-type composition of the healthy adult human heart was characterised. Here my focus was on the fibroblast niche: characterising fibroblast states, their composition, and their atria- and ventricle-specific expression patterns. The cell type and state annotations were then used to characterise the transcriptomes of roughly 900,000 nuclei from 61 failing, non-ischemic human hearts with distinct pathogenic variants in DCM and ACM genes or with idiopathic disease, which were compared with 18 healthy donor hearts. This dataset revealed distinct responses of the right and left ventricles, with differentially regulated genes and pathways and compositional changes across cell types and states. To independently confirm genotype-specific responses, machine learning approaches were applied, which predicted the genotype subgroup of a patient with high accuracy. Taken together, the findings published in this thesis upend the prevalent dogma that heart failure results in a final common pathway.
Heart; Molecular Biology; Single cell sequencing; Heart failure; Cardiomyopathy; 006 Special computer methods; 570 Biology; ddc:006 ddc:005 ddc:570
2	Frequent itemset mining on multiprocessor systems Schlegel, Benjamin 08 May 2014 Frequent itemset mining is an important building block in many data mining applications such as market basket analysis, recommendation, web mining, fraud detection, and gene expression analysis. In many of them, the datasets being mined can easily grow to hundreds of gigabytes or even terabytes. Hence, efficient algorithms are required to process such large amounts of data. Many frequent-itemset mining algorithms have been proposed in recent years, but they (1) often have high memory requirements and (2) do not exploit the high degree of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, however, these data structures force the algorithms to go out-of-core, i.e., to access secondary memory, which leads to serious performance degradation. Exploiting available parallelism is also required to mine large datasets because the serial performance of processors has almost stopped increasing. Algorithms should therefore exploit the large number of available threads as well as other kinds of parallelism (e.g., vector instruction sets). In this work, we tackle the high memory requirements of frequent itemset mining in two ways: we (1) compress the datasets being mined, because they must be kept in main memory across several mining invocations, and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that compress well on a wide variety of realistic datasets, reducing dataset size by up to 6.4x. The encodings can also be applied directly while loading the dataset from disk or network.
Since encoding and decoding are repeatedly required for loading and mining the datasets, we reduce their cost by providing parallel encodings that achieve high throughput for both tasks. For a memory-efficient representation of the mining algorithms’ intermediate data, we propose compact data structures and also employ explicit compression. Together, both methods reduce the size of the intermediate data by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined. To cope with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. The hot spots, which form basic building blocks of these algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For all of them, we discuss how to exploit available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep the sequential fraction of the algorithms as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Even when run single-threaded, our algorithms are often up to an order of magnitude faster than existing highly optimized algorithms, and they scale almost linearly on a large 32-core multiprocessor system. Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms that mine other types of itemsets.
Data mining; Association rule mining; Multiprocessor systems; Parallel mining; SIMD; Compression; Apriori; Eclat; FP-growth; ddc:004; rvk:ST 530; Data processing; Computer science; Computer programming, programs, data; Special computer methods; Algorithms; Multithreading; Data compression
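As an illustration of hot spot (4) in the abstract above, intersecting lists of sorted integer values, the following is a minimal single-threaded sketch of Eclat-style mining over a vertical layout (item to sorted list of transaction ids). It is not the thesis's implementation: the function names `intersect_sorted` and `eclat` and the toy data layout are assumptions made for this example, and none of the compression, SIMD, or multithreading work described in the abstract is shown.

```python
from typing import Dict, List, Sequence, Tuple


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Merge-style intersection of two ascending integer lists."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def eclat(transactions: Sequence[Sequence[str]],
          min_support: int) -> Dict[Tuple[str, ...], int]:
    """Return every itemset contained in at least min_support transactions."""
    # Vertical layout: item -> ascending list of transaction ids (tid-list).
    tidlists: Dict[str, List[int]] = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tidlists.setdefault(item, []).append(tid)

    frequent: Dict[Tuple[str, ...], int] = {}

    def extend(prefix: Tuple[str, ...],
               candidates: List[Tuple[str, List[int]]]) -> None:
        for idx, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)  # support = length of the tid-list
            # Extend only with later items so each itemset is built once.
            suffix = []
            for other, other_tids in candidates[idx + 1:]:
                shared = intersect_sorted(tids, other_tids)
                if len(shared) >= min_support:
                    suffix.append((other, shared))
            if suffix:
                extend(itemset, suffix)

    start = sorted((item, tids) for item, tids in tidlists.items()
                   if len(tids) >= min_support)
    extend((), start)
    return frequent
```

Calling `eclat(transactions, min_support)` with a list of item lists returns a dictionary that maps each frequent itemset (as a tuple of items in sorted order) to its support count. Nearly all of the running time is spent in `intersect_sorted`, which is why the abstract singles out list intersection as a building block worth parallelizing.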
3	Frequent itemset mining on multiprocessor systems Schlegel, Benjamin 30 May 2013 Frequent itemset mining is an important building block in many data mining applications such as market basket analysis, recommendation, web mining, fraud detection, and gene expression analysis. In many of them, the datasets being mined can easily grow to hundreds of gigabytes or even terabytes. Hence, efficient algorithms are required to process such large amounts of data. Many frequent-itemset mining algorithms have been proposed in recent years, but they (1) often have high memory requirements and (2) do not exploit the high degree of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, however, these data structures force the algorithms to go out-of-core, i.e., to access secondary memory, which leads to serious performance degradation. Exploiting available parallelism is also required to mine large datasets because the serial performance of processors has almost stopped increasing. Algorithms should therefore exploit the large number of available threads as well as other kinds of parallelism (e.g., vector instruction sets). In this work, we tackle the high memory requirements of frequent itemset mining in two ways: we (1) compress the datasets being mined, because they must be kept in main memory across several mining invocations, and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that compress well on a wide variety of realistic datasets, reducing dataset size by up to 6.4x. The encodings can also be applied directly while loading the dataset from disk or network.
Since encoding and decoding are repeatedly required for loading and mining the datasets, we reduce their cost by providing parallel encodings that achieve high throughput for both tasks. For a memory-efficient representation of the mining algorithms’ intermediate data, we propose compact data structures and also employ explicit compression. Together, both methods reduce the size of the intermediate data by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined. To cope with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. The hot spots, which form basic building blocks of these algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For all of them, we discuss how to exploit available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep the sequential fraction of the algorithms as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Even when run single-threaded, our algorithms are often up to an order of magnitude faster than existing highly optimized algorithms, and they scale almost linearly on a large 32-core multiprocessor system. Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms that mine other types of itemsets.
info:eu-repo/classification/ddc/004 ddc:004
4	Measuring coselectional constraint in learner corpora: A graph-based approach Shadrova, Anna Valer'evna 24 July 2020 This corpus-linguistic thesis analyzes the acquisition of coselectional constraints by learners of German as a second language in a quasi-longitudinal design based on the Kobalt corpus. Supplemented by a number of statistical analyses, the thesis primarily develops a graph-based analysis that makes use of Louvain modularity. This graph metric is computed for a range of subcorpora selected by various criteria. Extensive internal validation is performed using a number of sampling techniques. The results robustly indicate a dependency of modularity on language acquisition progress, higher modularity in L1 than in L2 speakers, lower modularity in Belarusian than in Chinese learners, and a U-shaped learning trajectory in Belarusian, but not in Chinese, learners. Group differences are discussed from typological, cognitive, discursive-cultural, and register perspectives. Finally, future applications of graph-based modeling in core-linguistic research are outlined. In addition, some gaps in the theoretical discussion of coselection phenomena (phraseology, idiomaticity, collocation) in usage-based linguistics are identified, and a multidimensional functional model is proposed as an alternative.
usage-based linguistics; coselection; idiom principle; second language acquisition; quantitative linguistics; corpus linguistics; collocation; 410 Linguistics; 430 German and related languages; 006 Special computer methods; GB 3026; ddc:410 ddc:430 ddc:006
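The graph metric named in the abstract above can be illustrated with a short sketch. Louvain modularity is the modularity score Q of the community partition found by the Louvain optimization method; the code below computes only Q for a given partition, using the standard definition Q = Σ_c [ w_c / m − (d_c / 2m)² ], where m is the total edge weight, w_c the edge weight inside community c, and d_c the summed degree of its nodes. The Louvain search itself is omitted, and the way the thesis builds its graphs from the Kobalt corpus is not stated in the abstract, so the edge list format and the function name `modularity` are assumptions for this example.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable, float]


def modularity(edges: Iterable[Edge],
               community: Dict[Hashable, Hashable]) -> float:
    """Modularity Q of a partition of an undirected, weighted graph.

    edges: (u, v, weight) triples, each undirected edge listed once.
    community: maps every node to its community label.
    """
    m = 0.0                          # total edge weight
    degree = defaultdict(float)      # weighted degree per node
    internal = defaultdict(float)    # edge weight inside each community
    for u, v, w in edges:
        m += w
        degree[u] += w
        degree[v] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    if m == 0:
        return 0.0

    total = defaultdict(float)       # summed degree per community
    for node, d in degree.items():
        total[community[node]] += d

    return sum(internal[c] / m - (total[c] / (2 * m)) ** 2 for c in total)
```

A partition whose communities are densely connected internally and sparsely connected to each other yields a higher Q, while placing all nodes in a single community yields Q = 0. In a hypothetical word co-occurrence graph, nodes would be word forms or lemmas and edge weights co-occurrence counts.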
