1.
ECLAT - Das Stuttgarter Festival für Neue Musik in Geschichte, Kontext und Dramaturgie / ECLAT - Stuttgart's New Music Festival in its history, context and dramaturgy. Standke, Sarah Laila. January 2022.
This dissertation takes the ECLAT Festival Neue Musik Stuttgart as its subject and examines it comprehensively from several perspectives. Founded in 1980 as the Tage für Neue Musik Stuttgart and renamed ECLAT in 1998, the festival is today one of the most important festivals for contemporary music in Germany and enjoys an international reputation. It has, however, not yet been taken up in scholarly research. This thesis therefore closes a gap, both as a contribution to the history of musical institutions and through its application of dramaturgy analysis, a concept used here to explore the dramatic form and structure of a festival for contemporary music. The dissertation deals in detail with the history of the festival from its founding to the present, and with the surrounding circumstances, people, and institutions in Stuttgart. Particular attention is paid to the broadcaster SDR, today's SWR, as the festival's long-standing cooperation partner.
In addition to this methodological perspective and source analysis in the tradition of historical musicology, the applied musicological focus lies on a dramaturgical and listening analysis in a broader, interdisciplinary context. This analysis was carried out exemplarily for the individual festival editions between 1998 and 2013, working out each year's focal points and themes and placing them in context. The period extends from the renaming of the festival from Tage für Neue Musik Stuttgart to ECLAT, which was accompanied by a repositioning of its content, to the end of Hans-Peter Jahn's sole artistic directorship, which had lasted more than two decades.
2.
Scalable frequent itemset mining on many-core processors. Schlegel, Benjamin; Karnagel, Thomas; Kiefer, Tim; Lehner, Wolfgang. 19 September 2022.
Frequent-itemset mining is an essential part of the association rule mining process, which has many application areas. It is a computation- and memory-intensive task with many opportunities for optimization. Many efficient sequential and parallel algorithms have been proposed in recent years; most of the parallel algorithms, however, cannot cope with the huge number of threads provided by large multiprocessor or many-core systems. In this paper, we provide mcEclat, a highly parallel version of the well-known Eclat algorithm. It runs on both multiprocessor systems and many-core coprocessors, and it scales well up to a very large number of threads (244 in our experiments). To evaluate mcEclat's performance, we conducted extensive experiments on realistic datasets. mcEclat achieves speedups of up to 11.5x on a 12-core multiprocessor system and up to 100x on a 61-core Xeon Phi many-core coprocessor, and it is competitive with highly optimized existing frequent-itemset mining implementations taken from the FIMI repository.
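For orientation, the following is a minimal, single-threaded sketch of the Eclat idea that mcEclat parallelizes: transactions are turned into a vertical layout (one tid-list per item) and frequent itemsets are mined depth-first by intersecting tid-lists. The function names and toy dataset are illustrative; the paper's actual implementation (data layouts, bitmaps, thread scheduling) is far more elaborate.

```python
from collections import defaultdict

def eclat(transactions, min_support):
    """Minimal Eclat sketch: mine frequent itemsets via tid-list intersection."""
    # Build the vertical layout: item -> set of transaction ids.
    tidlists = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidlists[item].add(tid)

    frequent = {}

    def recurse(prefix, items):
        # Depth-first: extend the current prefix with each remaining item.
        for i, (item, tids) in enumerate(items):
            support = len(tids)
            if support < min_support:
                continue
            itemset = prefix + (item,)
            frequent[itemset] = support
            # Candidate extensions share the prefix; intersect tid-lists.
            suffix = [(other, tids & other_tids)
                      for other, other_tids in items[i + 1:]]
            recurse(itemset, suffix)

    recurse((), sorted(tidlists.items()))
    return frequent

# Example: four market-basket transactions.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
print(eclat(db, min_support=2))
```

The recursion is what makes Eclat attractive for many-core systems: independent subtrees of the search space can be mined by different threads with no shared state beyond the input tid-lists.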
3.
Frequent itemset mining on multiprocessor systems. Schlegel, Benjamin. 8 May 2014.
Frequent itemset mining is an important building block in many data mining applications such as market basket analysis, recommendation, web mining, fraud detection, and gene expression analysis. In many of these, the datasets being mined can easily grow to hundreds of gigabytes or even terabytes of data, so efficient algorithms are required to process such large amounts of data. Many frequent-itemset mining algorithms have been proposed in recent years, but they (1) often have high memory requirements and (2) do not exploit the large degrees of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, these data structures force the algorithms to go out of core, i.e., to access secondary memory, which leads to serious performance degradation. Exploiting the available parallelism is also necessary for mining large datasets, because the serial performance of processors has almost stopped increasing. Algorithms should therefore exploit the large number of available threads as well as the other kinds of parallelism (e.g., vector instruction sets) besides thread-level parallelism.
In this work, we tackle the high memory requirements of frequent itemset mining in two ways: we (1) compress the datasets being mined, because they must be kept in main memory across several mining invocations, and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that compress well on a wide variety of realistic datasets, reducing dataset size by up to 6.4x. The encodings can also be applied directly while loading the dataset from disk or network. Since encoding and decoding are required repeatedly for loading and mining the datasets, we reduce their costs by providing parallel encodings that achieve high throughput for both tasks. For a memory-efficient representation of the mining algorithms' intermediate data, we propose compact data structures and additionally employ explicit compression. Together, both methods reduce the size of the intermediate data by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined.
To cope with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. These hot spots, which form basic building blocks of the algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For each of them, we discuss how to exploit the available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep their sequential fraction as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Even single-threaded, our algorithms are often up to an order of magnitude faster than existing highly optimized algorithms, and they scale almost linearly on a large 32-core multiprocessor system. Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms that mine other types of itemsets.
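As a concrete illustration of building block (4), here is the merge-based intersection of two sorted tid-lists, the hot loop that vertical mining algorithms execute constantly. This is the generic textbook version; the thesis is concerned with vectorized (SIMD) and multithreaded variants of exactly this kind of loop.

```python
def intersect_sorted(a, b):
    """Intersect two sorted lists of transaction ids (merge-style, O(|a|+|b|))."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result

print(intersect_sorted([1, 3, 4, 7, 9], [2, 3, 7, 8, 9]))  # [3, 7, 9]
```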
4.
Frequent itemset mining on multiprocessor systems. Schlegel, Benjamin. 30 May 2013.
Frequent itemset mining is an important building block in many data mining applications such as market basket analysis, recommendation, web mining, fraud detection, and gene expression analysis. In many of these, the datasets being mined can easily grow to hundreds of gigabytes or even terabytes of data, so efficient algorithms are required to process such large amounts of data. Many frequent-itemset mining algorithms have been proposed in recent years, but they (1) often have high memory requirements and (2) do not exploit the large degrees of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, these data structures force the algorithms to go out of core, i.e., to access secondary memory, which leads to serious performance degradation. Exploiting the available parallelism is also necessary for mining large datasets, because the serial performance of processors has almost stopped increasing. Algorithms should therefore exploit the large number of available threads as well as the other kinds of parallelism (e.g., vector instruction sets) besides thread-level parallelism.
In this work, we tackle the high memory requirements of frequent itemset mining in two ways: we (1) compress the datasets being mined, because they must be kept in main memory across several mining invocations, and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that compress well on a wide variety of realistic datasets, reducing dataset size by up to 6.4x. The encodings can also be applied directly while loading the dataset from disk or network. Since encoding and decoding are required repeatedly for loading and mining the datasets, we reduce their costs by providing parallel encodings that achieve high throughput for both tasks. For a memory-efficient representation of the mining algorithms' intermediate data, we propose compact data structures and additionally employ explicit compression. Together, both methods reduce the size of the intermediate data by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined.
To cope with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. These hot spots, which form basic building blocks of the algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For each of them, we discuss how to exploit the available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep their sequential fraction as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Even single-threaded, our algorithms are often up to an order of magnitude faster than existing highly optimized algorithms, and they scale almost linearly on a large 32-core multiprocessor system. Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms that mine other types of itemsets.
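To make building block (3) concrete, below is the simplest textbook form of integer compression for tid-lists: delta-encode the gaps between sorted ids, then store each gap with variable-byte encoding. This is a generic sketch for illustration, not the specific encodings the thesis proposes.

```python
def vbyte_encode(numbers):
    """Delta- and variable-byte-encode a sorted list of non-negative integers."""
    out = bytearray()
    prev = 0
    for n in numbers:
        gap = n - prev          # gaps between sorted ids are small ...
        prev = n
        while gap >= 0x80:      # ... so most fit into a single byte
            out.append(gap & 0x7F)
            gap >>= 7
        out.append(gap | 0x80)  # high bit marks the last byte of a value
    return bytes(out)

def vbyte_decode(data):
    numbers, value, shift, prev = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:         # last byte of this value: emit and reset
            prev += value
            numbers.append(prev)
            value, shift = 0, 0
        else:
            shift += 7
    return numbers

ids = [3, 7, 120, 10000, 10003]
assert vbyte_decode(vbyte_encode(ids)) == ids
```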
5.
Datenzentrierte Bestimmung von Assoziationsregeln in parallelen Datenbankarchitekturen (Data-centric determination of association rules in parallel database architectures). Legler, Thomas. 15 August 2009.
This thesis addresses the everyday usability of modern mass data processing, in particular the problem of association rule analysis. Available data volumes are growing rapidly, but evaluating them is difficult for untrained users, so companies forgo information that is in principle available. Association rules reveal dependencies between the elements of a dataset, for example between products sold together. These rules can be annotated with interestingness measures that help the user recognise important relationships. The thesis presents approaches that make evaluating the data easier for the user, concerning both the robust operation of the methods and the simple interpretation of the resulting rules. Unlike other methods, the presented algorithms adapt themselves to the data being processed.
Association rule mining requires the extraction of frequent itemsets (EHK, from the German "Extraktion häufiger Kombinationen"). The thesis shows ways to adapt existing solution approaches to the properties of modern systems. As one approach, it describes methods for computing the N most frequent itemsets that, unlike known approaches, are easy to configure. Moreover, modern systems often compute in a distributed fashion: such clusters can process large volumes of data in parallel but must merge their local results. For distributed top-N itemset extraction on realistic partitionings, approaches with different properties are presented.
Association rules are then formed from the frequent itemsets, and preparing them for the user should likewise be straightforward. Many interestingness measures have been proposed in the literature; depending on the requirements, each corresponds to one subjective assessment, which is not necessarily the user's. The thesis therefore investigates how several interestingness measures can be unified into a global measure, which finds rules that appear important under multiple measures and lets the user narrow down the search goal from these suggestions. A second approach groups rules by the frequencies of their elements, which form the basis of interestingness measures; the rules within such a group are therefore similar with respect to many measures and can be evaluated together, reducing the user's manual effort.
This thesis thus shows ways to extend association rule mining to a broad range of users and to reach new audiences, simplifying it to the point where it can serve as an easy-to-use data analysis tool rather than a specialist application.
The importance of data mining is widely acknowledged today. Mining for association rules and frequent patterns is a central activity in data mining. Three main strategies are available for such mining: APRIORI, FP-tree-based approaches like FP-GROWTH, and algorithms based on vertical data structures and depth-first mining strategies like ECLAT and CHARM.
Unfortunately, most of these algorithms are only moderately suitable for many "real-world" scenarios, because usability and the special characteristics of the data are two aspects of practical association rule mining that still require further work.
All mining strategies for frequent patterns use a parameter called minimum support to define a minimum occurrence frequency for searched patterns. This parameter cuts down the number of patterns searched to improve the relevance of the results. In complex business scenarios, it can be difficult and expensive to define a suitable value for the minimum support because it depends strongly on the particular datasets. Users are often unable to set this parameter for unknown datasets, and unsuitable minimum-support values can extract millions of frequent patterns and generate enormous runtimes. For this reason, it is not feasible to permit ad-hoc data mining by unskilled users. Such users do not have the knowledge and time to define suitable parameters by trial-and-error procedures. Discussions with users of SAP software have revealed great interest in the results of association-rule mining techniques, but most of these users are unable or unwilling to set very technical parameters. Given such user constraints, several studies have addressed the problem of replacing the minimum-support parameter with more intuitive top-n strategies.
We have developed an adaptive mining algorithm to give untrained SAP users a tool to analyze their data easily, without the need for elaborate data preparation and parameter determination. Previous implementations of distributed frequent-pattern mining were expensive and time-consuming tasks for specialists. In contrast, we propose a method that accelerates and simplifies the mining process by using top-n strategies and by relaxing some requirements on the results, such as completeness. Unlike data-approximation techniques such as sampling, our algorithm always returns exact frequency counts. The only drawback is that the result set may fail to include some of the patterns up to a specific frequency threshold.
Another aspect of real-world datasets is the fact that they are often partitioned for shared-nothing architectures, following business-specific parameters like location, fiscal year, or branch office. Users may also want to conduct mining operations spanning data from different partners, even if the local data from the respective partners cannot be integrated at a single location for data security reasons or due to their large volume.
Almost every data mining solution is constrained by the need to hide complexity. As far as possible, the solution should offer a simple user interface that hides technical aspects like data distribution and data preparation. Given that BW Accelerator users have such simplicity and distribution requirements, we have developed an adaptive mining algorithm to give unskilled users a tool to analyze their data easily, without the need for complex data preparation or consolidation.
For example, Business Intelligence scenarios often partition large data volumes by fiscal year to enable efficient optimizations for the data used in actual workloads. For most mining queries, more than one data partition is of interest, and therefore, distribution handling that leaves the data unaffected is necessary.
The algorithms presented here have been developed to work with data stored in SAP BW. A salient feature of SAP BW Accelerator is that it is implemented as a distributed landscape that sits on top of a large number of shared-nothing blade servers. Its main task is to execute OLAP queries that require fast aggregation of many millions of rows of data, so the distribution of data over the dedicated storage is optimized for such workloads. Data mining scenarios use the same data from storage, but reporting takes precedence over data mining, and hence the data cannot be redistributed without massive costs. Distribution by special data semantics or user-defined selections can produce many partitions and very different partition sizes. Handling such real-world distributions for frequent-pattern mining is an important task, but it conflicts with the requirement of balanced partitions.
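A minimal sketch of the coordinator-side merge step that such distributed top-n mining needs is shown below, under the simplifying assumption that every partition reports exact local counts for its own frequent patterns. Names and structure are illustrative, not the BW Accelerator implementation; note how a pattern that falls below the local threshold on every partition can be undercounted, which is exactly the completeness caveat mentioned above.

```python
from collections import Counter
import heapq

def merge_top_n(partition_results, n):
    """Merge per-partition pattern counts and keep the n globally most frequent.

    partition_results: iterable of dicts mapping pattern -> exact local count.
    Patterns a partition did not report contribute no count there, so global
    completeness up to a frequency threshold cannot be guaranteed.
    """
    totals = Counter()
    for local_counts in partition_results:
        totals.update(local_counts)
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])

# Three partitions (e.g., fiscal years), each with local pattern counts:
p1 = {("milk",): 40, ("milk", "bread"): 25, ("beer",): 10}
p2 = {("milk",): 35, ("beer",): 30}
p3 = {("milk", "bread"): 20, ("beer",): 15}
print(merge_top_n([p1, p2, p3], n=2))
# [(('milk',), 75), (('beer',), 55)]
```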
6.
Contribuer volontairement au bien public en groupe élargi : évolution via le triptyque observation, explication, représentation sur fond d'un classique / Voluntarily contributing to the public good in a larger group: evolution through the triptych of observation, explanation, and modeling against the backdrop of a classic. Chambre, Damien. 23 March 2016.
We capture the decision to contribute voluntarily to a public good in small and large groups, building on a classic of experimental economics. Contrary to predictions, in large groups the decision depends negatively on the marginal per capita return (MPCR) of the public good; players appear to perceive the stakes as low. The decision depends positively, though too weakly, on the number of beneficiaries of the public good in small groups, and not at all in large groups. The decision always depends negatively on the progress of the game, but without convergence to the dominant strategy; the method used to calculate payment may play a role. Reciprocity and inequity aversion are absent throughout, which may be linked to the fact that the players are not representative of the wider population. In line with predictions, two properties of public investment emerge that are consistent with decisions observed in everyday life. The decision depends positively on the MPCR in small groups, in keeping with the logic of this incentive to contribute. Altruism is present in traces in small groups and disappears in large groups, disadvantaged by the dilution of giving. The warm glow of giving (éclat chaleureux du don) becomes more present in large groups, favoured by the changing nature of the gift. We model these results using the logit equilibrium: perturbed response functions comprising several components. The properties of these functions match the observed decision-making well and have the advantage of not reproducing certain empirical anomalies.
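The logit equilibrium mentioned at the end rests on a standard quantal-response choice rule. As a reference point, its generic form is given below; this is the textbook rule, not necessarily the thesis's exact specification of the perturbed response functions:

$$
P(c_i) = \frac{\exp\big(\lambda\, u(c_i)\big)}{\sum_{j} \exp\big(\lambda\, u(c_j)\big)}
$$

where the $c_i$ are the feasible contribution levels, $u(\cdot)$ is the player's expected payoff, and $\lambda \ge 0$ is a precision parameter: $\lambda = 0$ yields uniformly random contributions, while $\lambda \to \infty$ recovers the best response, i.e., the dominant strategy of free-riding in the classic linear public goods game.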
7.
Datenzentrierte Bestimmung von Assoziationsregeln in parallelen Datenbankarchitekturen (Data-centric determination of association rules in parallel database architectures). Legler, Thomas. 22 June 2009.
This thesis addresses the everyday usability of modern mass data processing, in particular the problem of association rule analysis. Available data volumes are growing rapidly, but evaluating them is difficult for untrained users, so companies forgo information that is in principle available. Association rules reveal dependencies between the elements of a dataset, for example between products sold together. These rules can be annotated with interestingness measures that help the user recognise important relationships. The thesis presents approaches that make evaluating the data easier for the user, concerning both the robust operation of the methods and the simple interpretation of the resulting rules. Unlike other methods, the presented algorithms adapt themselves to the data being processed.
Association rule mining requires the extraction of frequent itemsets (EHK, from the German "Extraktion häufiger Kombinationen"). The thesis shows ways to adapt existing solution approaches to the properties of modern systems. As one approach, it describes methods for computing the N most frequent itemsets that, unlike known approaches, are easy to configure. Moreover, modern systems often compute in a distributed fashion: such clusters can process large volumes of data in parallel but must merge their local results. For distributed top-N itemset extraction on realistic partitionings, approaches with different properties are presented.
Association rules are then formed from the frequent itemsets, and preparing them for the user should likewise be straightforward. Many interestingness measures have been proposed in the literature; depending on the requirements, each corresponds to one subjective assessment, which is not necessarily the user's. The thesis therefore investigates how several interestingness measures can be unified into a global measure, which finds rules that appear important under multiple measures and lets the user narrow down the search goal from these suggestions. A second approach groups rules by the frequencies of their elements, which form the basis of interestingness measures; the rules within such a group are therefore similar with respect to many measures and can be evaluated together, reducing the user's manual effort.
This thesis thus shows ways to extend association rule mining to a broad range of users and to reach new audiences, simplifying it to the point where it can serve as an easy-to-use data analysis tool rather than a specialist application.
The importance of data mining is widely acknowledged today. Mining for association rules and frequent patterns is a central activity in data mining. Three main strategies are available for such mining: APRIORI, FP-tree-based approaches like FP-GROWTH, and algorithms based on vertical data structures and depth-first mining strategies like ECLAT and CHARM.
Unfortunately, most of these algorithms are only moderately suitable for many "real-world" scenarios, because usability and the special characteristics of the data are two aspects of practical association rule mining that still require further work.
All mining strategies for frequent patterns use a parameter called minimum support to define a minimum occurrence frequency for searched patterns. This parameter cuts down the number of patterns searched to improve the relevance of the results. In complex business scenarios, it can be difficult and expensive to define a suitable value for the minimum support because it depends strongly on the particular datasets. Users are often unable to set this parameter for unknown datasets, and unsuitable minimum-support values can extract millions of frequent patterns and generate enormous runtimes. For this reason, it is not feasible to permit ad-hoc data mining by unskilled users. Such users do not have the knowledge and time to define suitable parameters by trial-and-error procedures. Discussions with users of SAP software have revealed great interest in the results of association-rule mining techniques, but most of these users are unable or unwilling to set very technical parameters. Given such user constraints, several studies have addressed the problem of replacing the minimum-support parameter with more intuitive top-n strategies.
We have developed an adaptive mining algorithm to give untrained SAP users a tool to analyze their data easily, without the need for elaborate data preparation and parameter determination. Previous implementations of distributed frequent-pattern mining were expensive and time-consuming tasks for specialists. In contrast, we propose a method that accelerates and simplifies the mining process by using top-n strategies and by relaxing some requirements on the results, such as completeness. Unlike data-approximation techniques such as sampling, our algorithm always returns exact frequency counts. The only drawback is that the result set may fail to include some of the patterns up to a specific frequency threshold.
Another aspect of real-world datasets is the fact that they are often partitioned for shared-nothing architectures, following business-specific parameters like location, fiscal year, or branch office. Users may also want to conduct mining operations spanning data from different partners, even if the local data from the respective partners cannot be integrated at a single location for data security reasons or due to their large volume.
Almost every data mining solution is constrained by the need to hide complexity. As far as possible, the solution should offer a simple user interface that hides technical aspects like data distribution and data preparation. Given that BW Accelerator users have such simplicity and distribution requirements, we have developed an adaptive mining algorithm to give unskilled users a tool to analyze their data easily, without the need for complex data preparation or consolidation.
For example, Business Intelligence scenarios often partition large data volumes by fiscal year to enable efficient optimizations for the data used in actual workloads. For most mining queries, more than one data partition is of interest, and therefore, distribution handling that leaves the data unaffected is necessary.
The algorithms presented here have been developed to work with data stored in SAP BW. A salient feature of SAP BW Accelerator is that it is implemented as a distributed landscape that sits on top of a large number of shared-nothing blade servers. Its main task is to execute OLAP queries that require fast aggregation of many millions of rows of data, so the distribution of data over the dedicated storage is optimized for such workloads. Data mining scenarios use the same data from storage, but reporting takes precedence over data mining, and hence the data cannot be redistributed without massive costs. Distribution by special data semantics or user-defined selections can produce many partitions and very different partition sizes. Handling such real-world distributions for frequent-pattern mining is an important task, but it conflicts with the requirement of balanced partitions.
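One simple way to unify several interestingness measures into a single global measure, as the abstract above describes, is rank aggregation: each measure ranks the rules, and the global score averages those ranks so that rules which look important under several measures rise to the top. The choice of measures and the averaging scheme below are assumptions for illustration, not the thesis's actual method.

```python
def global_measure(rules):
    """Rank-aggregate several interestingness measures into one global order.

    rules: list of dicts with per-rule 'support', 'confidence', and 'lift'.
    Returns rule indices sorted by average rank (best first).
    """
    measures = ("support", "confidence", "lift")
    n = len(rules)
    avg_rank = [0.0] * n
    for m in measures:
        # Rank rules under this measure (rank 0 = most interesting).
        order = sorted(range(n), key=lambda i: rules[i][m], reverse=True)
        for rank, i in enumerate(order):
            avg_rank[i] += rank / len(measures)
    return sorted(range(n), key=lambda i: avg_rank[i])

rules = [
    {"name": "bread=>butter", "support": 0.20, "confidence": 0.80, "lift": 1.6},
    {"name": "milk=>bread",   "support": 0.30, "confidence": 0.60, "lift": 1.1},
    {"name": "wine=>cheese",  "support": 0.05, "confidence": 0.90, "lift": 3.0},
]
for i in global_measure(rules):
    print(rules[i]["name"])   # wine=>cheese, bread=>butter, milk=>bread
```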