Global ETD Search

11. Frequent itemset mining on multiprocessor systems
Schlegel, Benjamin, 30 May 2013
Frequent itemset mining is an important building block in many data mining applications such as market basket analysis, recommendation, web mining, fraud detection, and gene expression analysis. In many of them, the datasets being mined can easily grow to hundreds of gigabytes or even terabytes. Hence, efficient algorithms are required to process such large amounts of data. In recent years, many frequent-itemset mining algorithms have been proposed; however, they (1) often have high memory requirements and (2) do not exploit the large degree of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, however, the use of these data structures forces the algorithms to go out-of-core, i.e., they have to access secondary memory, which leads to serious performance degradation. Exploiting available parallelism is also required to mine large datasets because the serial performance of processors has almost stopped increasing. Algorithms should therefore exploit the large number of available threads as well as other kinds of parallelism (e.g., vector instruction sets) besides thread-level parallelism. In this work, we tackle the high memory requirements of frequent itemset mining in two ways: we (1) compress the datasets being mined, because they must be kept in main memory during several mining invocations, and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that achieve good compression on a wide variety of realistic datasets, i.e., the size of the datasets is reduced by up to 6.4x. The encodings can also be applied directly while loading the dataset from disk or network. Since encoding and decoding are repeatedly required for loading and mining the datasets, we reduce their costs by providing parallel encodings that achieve high throughput for both tasks. For a memory-efficient representation of the mining algorithms' intermediate data, we propose compact data structures and even employ explicit compression. Both methods together reduce the size of the intermediate data by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined. To cope with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. The hot spots, which form basic building blocks of these algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For all of them, we discuss how to exploit available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep the sequential fraction of the algorithms as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Even single-threaded, our algorithms are often up to an order of magnitude faster than existing highly optimized algorithms, and they scale almost linearly on a large 32-core multiprocessor system. Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms used for mining other types of itemsets.
Classification: info:eu-repo/classification/ddc/004, ddc:004

12. Collective Information Processing and Criticality, Evolution and Limited Attention.
Klamser, Pascal, 23 August 2021
In the first part, I focus on self-organization to criticality (here an order-disorder phase transition) and investigate whether evolution is a possible self-tuning mechanism. Does a simulated cohesive swarm that tries to avoid a pursuing predator self-tune by evolution to the critical point in order to optimize avoidance? It turns out that (i) the best group avoidance is at criticality, but (ii) this is due not to an enhanced response but to structural changes (fundamentally linked to criticality); (iii) the group optimum is not an evolutionarily stable state; in fact, (iv) it is an evolutionary accelerator, because maximal spatial self-sorting of individuals causes spatial selection. In the second part, I model experimentally observed differences in the collective behavior of fish groups subjected to multiple generations of different types of size-dependent selection. The real-world analogs of this experimental evolution are recreational fishing (small fish are released, large ones are consumed) and commercial fishing with large net widths (small/young individuals can escape). The results suggest that harvesting large fish reduces the cohesion and risk taking of individuals. I show that both findings can be explained mechanistically by an attention trade-off between social and environmental information. Furthermore, I numerically analyze how differently size-harvested groups perform in a natural-predator scenario and in a fishing scenario. In the last part of the thesis, I quantify collective information processing in the field. The study system is a fish species adapted to sulfidic water conditions, with a collective escape behavior from aerial predators that manifests as repeated collective escape dives. These fish measure about 2 centimeters, but the collective wave spreads across meters in dense shoals at the surface. I find that wave speed increases weakly with polarization, is fastest at an optimal density, and depends on its direction relative to shoal orientation.
Keywords: collective behavior, numerical simulations, agent-based models, artificial selection, phase transition, criticality, evolution, predator-prey
Classification: 530 Physics, 570 Biology, 576 Genetics and evolution, WH 2500, WH 5000, WT 2027, WT 3827, WT 2527, ddc:530, ddc:570, ddc:576, ddc:005

13. Computational mapping of regulatory domains of human genes
Patarčić, Inga, 02 November 2021
The human genome contains millions of regulatory elements (enhancers) that quantitatively regulate gene expression. Multiple experimental and computational approaches have been developed to associate enhancers with their gene targets. Despite the tremendous progress in understanding how enhancers tune gene expression, the field still lacks an approach that is systematic, integrative, and accessible for discovering and documenting cis-regulatory relationships across the genome. We developed a novel computational approach, reg2gene, that models and integrates gene expression ~ enhancer activity. reg2gene consists of three main steps: 1) data quantification, 2) data modelling and significance assessment, and 3) data integration, all of which are bundled in the reg2gene R package. As a result, we identified two sets of enhancer-gene associations (EGAs): the flexible set of ~230K EGAs (flexibleC) and the stringent set of ~60K EGAs (stringentC). We identified major differences across previously published computational models of enhancer-gene associations, mostly in the location, number, and properties of the defined enhancer regions and EGAs. We performed detailed benchmarking of seven sets of computationally modelled EGAs, but showed that none of the currently available benchmark datasets could be used as a "gold-standard" benchmark dataset. To account for that observation, we defined an additional benchmark set of positive and negative EGAs, with which we showed that the stringentC model had the highest positive predictive value (PPV) across all analyzed computational models. We reviewed the influence of EGA sets on the functional analysis of risk SNPs and demonstrated the potential of EGAs to identify gene targets of non-coding SNPs. Lastly, we performed a functional analysis to detect novel gene targets, enhancer pleiotropy, and mechanisms of enhancer activity. Altogether, this work advances our understanding of enhancer-mediated gene expression regulation in health and disease.
Keywords: gene expression regulation, enhancer, computational modelling, enhancer-gene associations, human genome, reg2gene
Classification: 570 Biology, 576 Genetics and evolution, WC 7700, WG 7000, WG 1940, ST 250 R, ddc:570, ddc:005, ddc:576

14. Timeout Reached, Session Ends? / A Methodological Framework for Evaluating the Impact of Different Session-Identification Approaches
Dietz, Florian, 14 December 2022
The identification of sessions as a means of understanding user behaviour is a common research area of web usage mining. Different definitions and concepts have been discussed for over 20 years. Research shows that session identification is not an arbitrary task. There is a questionable tendency towards simplistic mechanical sessions instead of more complex logical segmentations. This dissertation aims to show how differing session-identification approaches lead to diverging results and interpretations. The overarching research question asks: will different session-identification approaches affect analysis results and machine-learning tasks? A comprehensive methodological framework for implementing, comparing, and evaluating sessions is given. The dissertation provides implementation guidelines for 135 session-identification approaches, using a complete year (2018) of traffic data from a German price-comparison e-commerce platform. The implementation includes mechanical concepts, logical constructs, and the combination of multiple methods. It shows how logical sessions were constructed from user sequences by applying embedding algorithms to interaction logs, taking a novel approach to logical session identification that uses the topical proximity of interactions instead of search queries alone. All approaches are compared and quantitatively described. Their application in three machine-learning tasks (such as recommendation) is intended to show that using different sessions as input data has a marked impact on the outcome. The main contribution of this dissertation is a comprehensive comparison of session-identification algorithms. The research provides a methodology to implement, analyse, and compare a wide variety of mechanics, allowing a better understanding of user behaviour and the effects of session modelling. The main results show that differently structured input data may drastically change the results of algorithms or analyses.
Keywords: session identification, session modelling, session evaluation, user behaviour modelling, task extraction, session analysis
Classification: AN 77500, ST 530, ddc:020, ddc:005, ddc:000

15. One-to-One Marketing in Grocery Retailing
Gabel, Sebastian, 28 June 2019
Research on one-to-one marketing with a focus on retailing is scarce in the academic literature. The two main reasons are that the target marketing approaches proposed by researchers do not scale to the size of typical retail applications and that data regarding one-to-one marketing remain locked within retailers and marketing solution providers. This dissertation develops new descriptive, predictive, and prescriptive marketing models for automated target marketing that are based on representation learning and deep learning, and it studies the models' impact in real-life applications. First, this thesis shows that representation learning is capable of analyzing market structures at scale. The proposed approach to visualizing market structures is fully automated and superior to existing mapping methods that are based on the same input data. The thesis then proposes a scalable, nonparametric model that predicts product choice for the entire assortment of a large retailer. The deep neural network outperforms benchmark methods for predicting customer purchases. Coupon policies based on the proposed model lead to substantially higher revenue lifts than policies based on the benchmark models. The remainder of the thesis studies a real-time offer engine that is based on the proposed models. The comparison of personalized promotions with non-targeted promotions shows that one-to-one marketing increases redemption rates, revenues, and profits. A study of customer responses to personalized price promotions within the retailer's loyalty program reveals that personalized marketing also increases loyalty program usage. This illustrates how targeted price promotions can be integrated smoothly into loyalty programs. In summary, this thesis is relevant for both researchers and practitioners. The new deep learning models facilitate more scalable and efficient one-to-one marketing. In addition, this research offers pertinent implications for promotion management and one-to-one marketing in retailing.
Keywords: target marketing, deep learning, machine learning, market structure analysis, product choice model, cross-category choice, coupon optimization, real-time offer engines, recommender systems, retailing, big data, loyalty programs, loyalty program rewards
Classification: 004 Data processing and computer science, 330 Economics, QP 612, QP 621, QP 611, QP 680, ddc:004, ddc:005, ddc:330
