Global ETD Search

1	Quality-of-Service-Aware Data Stream Processing Schmidt, Sven 21 March 2007 (has links) (PDF) Data stream processing has grown steadily in importance in recent years, in industry as well as in academia. Consider the monitoring of industrial processes as an example: sensors are mounted to gather large amounts of data within a short time. Storing and post-processing these data may be useless or even impossible. On the one hand, only a small part of the monitored data is relevant, so to use storage capacity efficiently only a preselection of the data should be kept. On the other hand, the volume of incoming data may simply be too high to be stored in time; in other words, the technical effort required to store the data in time would be disproportionate. In the context of this thesis, processing data streams means applying database operations to the stream on the fly, without explicitly storing the data. The challenge lies in the limited resources available for processing potentially infinite data streams. Furthermore, data stream processing must be fast, and the results have to be disseminated as soon as possible. This thesis focuses on the latter issue. The goal is to provide a so-called Quality-of-Service (QoS) for the data stream processing task. To this end, adequate QoS metrics such as maximum output delay or minimum result data rate are defined. Next, a cost model is presented for deriving the required processing resources from the specified QoS. On that basis, the stream processing operations are scheduled. Depending on the required QoS and the available resources, the weight can be shifted among the individual resources and QoS metrics, respectively. Calculating and scheduling resources requires considerable expert knowledge about the characteristics of the stream operations and of the incoming data streams.
Often, this knowledge is based on experience, so the resource calculation and reservation must be revised from time to time. This leads to occasional interruptions of the continuous data stream processing, of the delivery of results, and thus of the negotiated Quality-of-Service. The proposed robustness concept supports the user and helps reduce the number of interruptions by providing more resources. data stream processing quality-of-service robustness Datenstromverarbeitung Qualität Robustheit ddc:004 rvk:ST 274 Datenstrom Datenverarbeitung Dienstgüte
2	Structural Graph-based Metamodel Matching Voigt, Konrad 17 January 2012 (has links) (PDF) Data integration has been, and still is, a challenge for applications that process multiple heterogeneous data sources. Across the domains of schemas, ontologies, and metamodels, it creates the need for mapping specifications, i.e., for discovering semantic correspondences between elements. Support for developing such mappings has been researched, producing matching systems that automatically propose mapping suggestions. However, especially in the context of metamodel matching, the result quality of state-of-the-art matching techniques leaves room for improvement. Although the traditional approach of pairwise element comparison works on smaller data sets, its quadratic complexity leads to poor runtime and memory performance, and eventually to the inability to match, when applied to real-world data. The work presented in this thesis seeks to address these shortcomings by taking advantage of the graph structure of metamodels. We derive a planar graph edit distance as a metamodel similarity metric and a mining-based matching that makes use of redundant information. We also propose a planar graph-based partitioning to cope with large-scale matching. These techniques are then evaluated using real-world mappings from SAP business integration scenarios and the MDA community. The results demonstrate improved quality and manageable runtime and memory consumption for large-scale metamodel matching. Metamodell Ähnlichkeit Graphen Planar Metamodel Matching Graph Planar ddc:004 rvk:ST 274 rvk:ST 230
3	Technologische Analysen im Umfeld Sozialer Netzwerke Schnitzler, Peter 11 November 2008 (has links) (PDF) This thesis analyzes the potential of data and contact aggregation in the context of social networks. It first categorizes the most important networks and frameworks. The functionality of eight social networks and five frameworks is then examined in detail using a previously developed evaluation matrix, with particular attention to the capabilities of their APIs. Building on the results of this analysis, a prototype for data and contact aggregation is designed, implemented, and evaluated. Finally, recommendations are given on the technologies used and on the design of future data and contact aggregation in the context of social networks. Soziale Netzwerke Datenaggregation Kontaktaggregation Frameworks Evaluierung Matrix social networks dataaggregation contactaggregation frameworks evaluation white label social networks matrix of evaluation ddc:004 rvk:ST 200 rvk:ST 274
4	Laufzeitadaption von zustandsbehafteten Datenstromoperatoren Wolf, Bernhard 04 December 2013 (has links) (PDF) Changing data stream queries at runtime is complicated in particular by stateful data stream operators. Because operator state is held in main memory and is lost on restart, migration techniques were developed in the past to preserve the internal operator state during a change. These techniques follow two different approaches, state transfer and parallel execution, but their implementations restrict them to centralized execution. As requirements on data volumes and response times grow, data stream systems are increasingly executed in a distributed fashion, for example in sensor networks or distributed IT systems. Existing migration strategies are unsuitable, or only partly suitable, for adapting queries at runtime in such settings. This thesis contributes to solving this problem and to optimizing migration in data stream systems. Using preventive maintenance strategies in factory environments as an example, requirements for data stream processing, and for migration in particular, are derived. The overall goal is accordingly a migration that is as fast as possible while results continue to be output. A detailed analysis of the existing migration strategies discusses their strengths and weaknesses with respect to these requirements. A general methodology for adapting running data stream queries is presented, which serves as the basis for the new strategies. This adaptation methodology supports two methods for determining migration configurations: a numerical method for periodic data streams and a heuristic method that can also be applied to aperiodic data streams.
A key feature for minimizing migration time is restricting the transfer to the state values that are actually needed, since in distributed environments the transmission time of the state transfer must be accounted for; existing techniques consider neither of these two aspects. Newly developed state transfer methods additionally make it possible to influence the order in which individual state values are transmitted. The concepts were implemented in an OSGi-based prototype and also analyzed by simulation. A comprehensive evaluation demonstrates that all components and concepts work. The performance comparison between the existing and the new migration strategies comes out clearly in favor of the new strategies, which are also able to meet all requirements. Datenstrom Datenstromverarbeitung Zustandstransfer Migration Zustandsmigration Zustandserhaltung data stream processing data stream management state transfer operator migration state migration ddc:004 rvk:ST 200 rvk:ST 274 Datenstrom Datenbank
5	Kollaborative Erstellung von Mind-Maps mit persönlichen Linsen an interaktiven Display Walls / Collaborative Creation of Mind-Maps on interactive Display Walls using personal Lenses Gräf, Maximilian 06 June 2018 (has links) (PDF) Mind mapping is an efficient and universal creativity technique that allows a group to break down a complex topic. Keywords, notes, and graphics are associated with a central concept, gradually producing a compact visualization of a subject area. Thanks to their size and interaction capabilities, large interactive display walls support the collaborative creation of mind maps: each user can create parts individually and add them to the shared mind map. This thesis presents suitable interaction concepts for creating mind maps on interactive display walls. The focus is on detecting the user's position in front of the display wall in order to give each user a personal lens, and thus access to personal tools, and on pen and touch interaction for creating and placing mind-map elements. To this end, related work from three areas is analyzed first: mind maps, collaboration and interaction with interactive display walls, and Magic Lenses. Based on a fundamental understanding of these three domains, concepts for the collaborative creation of mind maps on interactive display walls using personal lenses are presented. Selected concepts were implemented in a prototype and offered first promising insights into how mind mapping might be done in the future. In particular, the benefit of a personal lens as an individual visualization and interaction interface was recognized. Mind-Map Mind-mapping kollaborative Display Wall interaktiv Touch Interaktion Mind-Map Mind-mapping collaborative interactive Display Wall touch interaction ddc:004 rvk:ST 274
6	Sampling Algorithms for Evolving Datasets Gemulla, Rainer 24 October 2008 (has links) (PDF) Perhaps the most flexible synopsis of a database is a uniform random sample of the data; such samples are widely used to speed up the processing of analytic queries and data-mining tasks, to enhance query optimization, and to facilitate information integration. Most of the existing work on database sampling focuses on how to create or exploit a random sample of a static database, that is, a database that does not change over time. The assumption of a static database, however, severely limits the applicability of these techniques in practice, where data is often not static but continuously evolving. In order to maintain the statistical validity of the sample, any changes to the database have to be appropriately reflected in the sample. In this thesis, we study efficient methods for incrementally maintaining a uniform random sample of the items in a dataset in the presence of an arbitrary sequence of insertions, updates, and deletions. We consider instances of the maintenance problem that arise when sampling from an evolving set, from an evolving multiset, from the distinct items in an evolving multiset, or from a sliding window over a data stream. Our algorithms completely avoid any accesses to the base data and can be several orders of magnitude faster than algorithms that do rely on such expensive accesses. The improved efficiency of our algorithms comes at virtually no cost: the resulting samples are provably uniform and only a small amount of auxiliary information is associated with the sample. We show that the auxiliary information not only facilitates efficient maintenance, but it can also be exploited to derive unbiased, low-variance estimators for counts, sums, averages, and the number of distinct items in the underlying dataset. 
In addition to sample maintenance, we discuss methods that greatly improve the flexibility of random sampling from a system's point of view. More specifically, we initiate the study of algorithms that resize a random sample upwards or downwards. Our resizing algorithms can be exploited to dynamically control the size of the sample when the dataset grows or shrinks; they facilitate resource management and help to avoid under- or oversized samples. Furthermore, in large-scale databases with data being distributed across several remote locations, it is usually infeasible to reconstruct the entire dataset for the purpose of sampling. To address this problem, we provide efficient algorithms that directly combine the local samples maintained at each location into a sample of the global dataset. We also consider a more general problem, where the global dataset is defined as an arbitrary set or multiset expression involving the local datasets, and provide efficient solutions based on hashing. Uniform sampling incremental sample maintenance set sampling multiset sampling distinct-item sampling data stream sampling Einfache Zufallsstichproben inkrementelle Stichprobenwartung ddc:004 rvk:ST 274
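As a point of reference for the sampling abstract above, the following is a minimal sketch of classical reservoir sampling, which maintains a uniform random sample over an insert-only stream without revisiting the base data. It is not one of the thesis's algorithms: it does not handle updates, deletions, resizing, or merging, which are the cases the thesis addresses. The function name `reservoir_sample` and its parameters are hypothetical and chosen only for illustration.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an insert-only stream.

    Classical reservoir sampling; shown only as a baseline, not as the
    maintenance scheme described in the abstract.
    """
    rng = rng or random.Random()
    sample = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # The first k items fill the reservoir directly.
            sample.append(item)
        else:
            # The n-th item replaces a random slot with probability k / n.
            j = rng.randrange(n)
            if j < k:
                sample[j] = item
    return sample


if __name__ == "__main__":
    # Draw 5 items uniformly from a stream of 10,000 integers in one pass.
    print(reservoir_sample(range(10_000), 5, random.Random(7)))
```

Each arriving item is inspected once and the stream itself is never stored, which is the property that makes sample maintenance cheap compared with re-sampling the base data.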
7	Datenzentrierte Bestimmung von Assoziationsregeln in parallelen Datenbankarchitekturen Legler, Thomas 15 August 2009 (has links) (PDF) This thesis addresses the everyday usability of modern mass data processing, in particular the problem of association rule analysis. Data volumes are growing rapidly, but analyzing them is difficult for inexperienced users. As a result, companies forgo information that is in principle available. Association rules reveal dependencies between the elements of a dataset, for example between products sold. These rules can be annotated with interestingness measures, which help the user recognize important relationships. Approaches are presented that make it easier for the user to analyze the data. This concerns both the robust operation of the methods and the simple evaluation of the rules. The presented algorithms adapt to the data being processed, which distinguishes them from other methods. Association rule mining requires the extraction of frequent patterns. To this end, ways are shown to adapt solution approaches to the characteristics of modern systems. As one approach, methods for computing the N most frequent patterns are described, which, unlike known approaches, are easy to configure. Modern systems also often compute in a distributed fashion. Such clusters can process large data volumes in parallel but require local results to be merged. For distributed top-N frequent-pattern extraction on realistic partitionings, approaches with different properties are presented. Association rules are formed from the frequent patterns, and preparing these rules for the user should likewise be simple. Many measures have been proposed in the literature. Depending on the requirements, each corresponds to a subjective assessment, though not necessarily that of the user.
The thesis therefore investigates how several interestingness measures can be combined into one global measure. This identifies rules that appear important under several measures. The user can use these suggestions to narrow down the search goal. A second approach groups rules. The grouping is based on the frequencies of the rule elements, which form the basis of interestingness measures. The rules in such a group are therefore similar with respect to many interestingness measures and can be evaluated together. This reduces the user's manual effort. This thesis shows ways to open association rule mining to a broad range of users and to reach new users. Association rule mining is simplified to the point where it can be used as an easy-to-use data analysis tool rather than as a specialist application. / The importance of data mining is widely acknowledged today. Mining for association rules and frequent patterns is a central activity in data mining. Three main strategies are available for such mining: APRIORI, FP-tree-based approaches like FP-GROWTH, and algorithms based on vertical data structures and depth-first mining strategies like ECLAT and CHARM. Unfortunately, most of these algorithms are only moderately suitable for many “real-world” scenarios because their usability and the special characteristics of the data are two aspects of practical association rule mining that require further work. All mining strategies for frequent patterns use a parameter called minimum support to define a minimum occurrence frequency for searched patterns. This parameter cuts down the number of patterns searched to improve the relevance of the results. In complex business scenarios, it can be difficult and expensive to define a suitable value for the minimum support because it depends strongly on the particular datasets.
Users are often unable to set this parameter for unknown datasets, and unsuitable minimum-support values can extract millions of frequent patterns and generate enormous runtimes. For this reason, it is not feasible to permit ad-hoc data mining by unskilled users. Such users do not have the knowledge and time to define suitable parameters by trial-and-error procedures. Discussions with users of SAP software have revealed great interest in the results of association-rule mining techniques, but most of these users are unable or unwilling to set very technical parameters. Given such user constraints, several studies have addressed the problem of replacing the minimum-support parameter with more intuitive top-n strategies. We have developed an adaptive mining algorithm to give untrained SAP users a tool to analyze their data easily without the need for elaborate data preparation and parameter determination. Previously implemented approaches of distributed frequent-pattern mining were expensive and time-consuming tasks for specialists. In contrast, we propose a method to accelerate and simplify the mining process by using top-n strategies and relaxing some requirements on the results, such as completeness. Unlike such data approximation techniques as sampling, our algorithm always returns exact frequency counts. The only drawback is that the result set may fail to include some of the patterns up to a specific frequency threshold. Another aspect of real-world datasets is the fact that they are often partitioned for shared-nothing architectures, following business-specific parameters like location, fiscal year, or branch office. Users may also want to conduct mining operations spanning data from different partners, even if the local data from the respective partners cannot be integrated at a single location for data security reasons or due to their large volume. Almost every data mining solution is constrained by the need to hide complexity. 
As far as possible, the solution should offer a simple user interface that hides technical aspects like data distribution and data preparation. Given that BW Accelerator users have such simplicity and distribution requirements, we have developed an adaptive mining algorithm to give unskilled users a tool to analyze their data easily, without the need for complex data preparation or consolidation. For example, Business Intelligence scenarios often partition large data volumes by fiscal year to enable efficient optimizations for the data used in actual workloads. For most mining queries, more than one data partition is of interest, and therefore, distribution handling that leaves the data unaffected is necessary. The algorithms presented in this paper have been developed to work with data stored in SAP BW. A salient feature of SAP BW Accelerator is that it is implemented as a distributed landscape that sits on top of a large number of shared-nothing blade servers. Its main task is to execute OLAP queries that require fast aggregation of many millions of rows of data. Therefore, the distribution of data over the dedicated storage is optimized for such workloads. Data mining scenarios use the same data from storage, but reporting takes precedence over data mining, and hence, the data cannot be redistributed without massive costs. Distribution by special data semantics or user-defined selections can produce many partitions and very different partition sizes. The handling of such real-world distributions for frequent-pattern mining is an important task, but it conflicts with the requirement of balanced partitions. Assoziationsregel Mining Suche SAP SAP BW Accelerator BI Accelerator Frequent Patterns association rule mining frequent pattern mining sap bw accelerator bwa bia uneven partitions distributed partition eclat top-n top-k top-r ddc:004 rvk:ST 274
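The abstract above contrasts a minimum-support threshold with top-n selection. The sketch below illustrates only that contrast on a toy dataset. It counts itemsets by brute-force enumeration, so it is neither APRIORI, FP-GROWTH, or ECLAT, nor the adaptive algorithm of the thesis. All function names, the `max_size` limit, and the sample transactions are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size=2):
    """Count how many transactions contain each itemset of up to max_size items."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def frequent_by_min_support(counts, min_support):
    """Keep itemsets occurring in at least min_support transactions."""
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


def top_n(counts, n):
    """Return the n most frequent itemsets; ties are broken by itemset order."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "butter"],
        ["bread", "milk", "butter"],
        ["milk"],
    ]
    counts = count_itemsets(baskets)
    # With a threshold, the user must guess a suitable value for the dataset.
    print(frequent_by_min_support(counts, min_support=2))
    # With top-n, the user only states how many patterns are wanted.
    print(top_n(counts, n=2))
```

In both cases the reported frequencies are exact counts; the two interfaces differ only in which parameter the user has to supply.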
