Visual Data Analysis in Device Ecologies. Horak, Tom, 07 September 2021.
With the continued development towards a digitalized and data-driven world, the importance of visual data analysis is increasing as well. Visual data analysis enables people to interactively explore and reason about data through the combined use of multiple visualizations. This is relevant for a wide range of application domains, including personal, professional, and public ones. In parallel, a ubiquity of modern devices with very heterogeneous characteristics has emerged. These devices, such as smartphones, tablets, or digital whiteboards, can enable more flexible workflows during our daily work, for example, while on the go, in meetings, or at home. One way to enable flexible workflows is the combination of multiple devices in so-called device ecologies. This thesis investigates how such a combined usage of devices can facilitate the visual data analysis of multivariate data sets. For that, new approaches for both visualization and interaction are presented that make full use of the dynamic nature of device ecologies. So far, the literature on these aspects is limited and lacks a broader consideration of data analysis in device ecologies.
This doctoral thesis presents investigations in three main parts, each addressing one research question: (i) how visualizations can be adapted for heterogeneous devices, (ii) how device pairings can be used to support data exploration workflows, and (iii) how visual data analysis can be supported in fully dynamic device ecologies. For the first part, an extended analytical investigation of the notion of responsive visualization is contributed. This investigation is complemented by a novel matrix-based visualization approach that incorporates such responsive visualizations as local focus regions. For the other two parts, multiple conceptual frameworks are presented that combine visualization and interaction techniques in innovative ways. The second part examines two selected display pairings: the extension of smartwatches with display-equipped watchstraps, and the reverse combination of a smartwatch with a large display. For these device ensembles, it is investigated how analysis workflows can be facilitated. The third part explores how interactive mechanisms based on spatial arrangements can be used to flexibly combine and coordinate devices, and how the view distribution process can be supported by automated optimization. The thesis's extensive conceptual work is accompanied by the design of prototypical systems, qualitative evaluations, and reviews of the existing literature.
Von Chaos und Qualität ‐ die Ergebnisse des Projekts Collaborative Tagging (Of Chaos and Quality: Results of the Collaborative Tagging Project). Krätzsch, Christine, 19 January 2012.
In academia, extensive collections of user-generated metadata have emerged in social software applications such as Connotea, CiteULike, and BibSonomy. Compared to controlled vocabularies such as the Schlagwortnormdatei (the German subject headings authority file), this constitutes personalized and largely "chaotic" subject indexing. A DFG-funded project at Mannheim University Library investigated to what extent the potential of this kind of metadata can be harnessed for a better, more user-oriented presentation of information resources.
A core element of the study was the analysis of tag data from the BibSonomy system. It showed that not only the tags' lack of semantic structure but also their heterogeneous form limits their use in library subject indexing. Drawing on examples, this contribution offers insight into the qualitative and structural chaos of the examined tags and summarizes the project's results.
Feedback-Driven Data Clustering. Hahmann, Martin, 28 October 2013.
The acquisition of data and its analysis have become common yet critical tasks in many areas of modern economy and research. Unfortunately, the ever-increasing scale of datasets has long outgrown the capacities and abilities humans can muster to extract information from them and gain new knowledge. For this reason, research areas like data mining and knowledge discovery are steadily gaining importance. The algorithms they provide for the extraction of knowledge are mandatory prerequisites that enable people to analyze large amounts of information. Among the approaches offered by these areas, clustering is one of the most fundamental. By finding groups of similar objects inside the data, it aims to identify meaningful structures that constitute new knowledge. Clustering results are also often used as input for other analysis techniques like classification or forecasting.
As clustering extracts new and unknown knowledge, it obviously has no access to any form of ground truth. For this reason, clustering results have a hypothetical character and must be interpreted with respect to the application domain. This makes clustering very challenging and has led to an extensive and diverse landscape of available algorithms. Most of these are expert tools tailored to a single, narrowly defined application scenario. Over the years, this specialization has become a major trend, arising to counter the inherent uncertainty of clustering by building as many domain specifics as possible into the algorithms. While customized methods often improve result quality, they become ever more complicated to handle and lose versatility. This creates a dilemma, especially for amateur users, whose numbers grow as clustering is applied in more and more domains. While an abundance of tools is on offer, guidance is severely lacking, and users are left alone with critical tasks like algorithm selection, parameter configuration, and the interpretation and adjustment of results.
This thesis aims to solve this dilemma by structuring and integrating the necessary steps of clustering into a guided and feedback-driven process. In doing so, users are provided with a default modus operandi for the application of clustering. Two main components constitute the core of said process: the algorithm management and the visual-interactive interface. Algorithm management handles all aspects of actual clustering creation and the involved methods. It employs a modular approach for algorithm description that allows users to understand, design, and compare clustering techniques with the help of building blocks. In addition, algorithm management offers facilities for the integration of multiple clusterings of the same dataset into an improved solution. New approaches based on ensemble clustering not only allow the utilization of different clustering techniques, but also ease their application by acting as an abstraction layer that unifies individual parameters. Finally, this component provides a multi-level interface that structures all available control options and provides the docking points for user interaction.
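To make the ensemble idea concrete, the following Python sketch fuses several base clusterings through a co-association matrix. It is a generic ensemble-clustering pattern, assuming scikit-learn with k-means base runs, not the thesis's actual algorithm-management component; all function names and parameter choices are illustrative.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

def ensemble_clustering(X, n_runs=10, k_final=3, seed=0):
    """Fuse several base clusterings via a co-association matrix:
    entry (i, j) is the fraction of runs in which points i and j
    were assigned to the same cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    coassoc = np.zeros((n, n))
    for _ in range(n_runs):
        k = int(rng.integers(2, 8))  # vary k to obtain diverse base runs
        labels = KMeans(n_clusters=k, n_init=5,
                        random_state=int(rng.integers(1_000_000))).fit_predict(X)
        coassoc += labels[:, None] == labels[None, :]
    coassoc /= n_runs
    # 1 - co-association acts as a distance; average linkage extracts the
    # consensus clusters ("metric" was called "affinity" in older scikit-learn).
    final = AgglomerativeClustering(n_clusters=k_final,
                                    metric="precomputed", linkage="average")
    return final.fit_predict(1.0 - coassoc)
```

Note how the ensemble layer hides the individual k-means parameters behind a single interface, which is the abstraction effect described above.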
The visual-interactive interface supports users during result interpretation and adjustment. For this, the defining characteristics of a clustering are communicated via a hybrid visualization. In contrast to traditional data-driven visualizations, which tend to become overloaded and unusable as the volume and dimensionality of the data increase, this novel approach communicates the abstract aspects of cluster composition and the relations between clusters. This aspect orientation allows the use of easy-to-understand visual components and makes the visualization immune to scale-related effects of the underlying data. The visual communication is attuned to a compact and universally valid set of high-level feedback operations for modifying clustering results. Instead of technical parameters that indirectly change the whole clustering by influencing its creation process, users can employ simple commands like merge or split to adjust clusters directly.
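As a minimal sketch of how such high-level feedback could operate on a plain label vector (numpy and scikit-learn assumed; the 2-means re-clustering used for the split is an illustrative choice, not necessarily the thesis's method):

```python
import numpy as np
from sklearn.cluster import KMeans

def merge(labels, a, b):
    """'merge' feedback: absorb cluster b into cluster a."""
    out = labels.copy()
    out[out == b] = a
    return out

def split(X, labels, c):
    """'split' feedback: divide cluster c into two sub-clusters by
    re-clustering only its members (labels is a numpy integer array)."""
    out = labels.copy()
    members = np.where(labels == c)[0]
    sub = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[members])
    out[members[sub == 1]] = labels.max() + 1  # one half gets a fresh label
    return out
```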
The orchestrated cooperation of these two main components creates a modus operandi in which clusterings are no longer created and discarded as a whole until a satisfying result is obtained. Instead, users apply the feedback-driven process to iteratively refine an initial solution. Performance and usability of the proposed approach were evaluated in a user study. Its results show that the feedback-driven process enabled amateur users to easily create satisfying clustering results even from varying and suboptimal starting situations.
Jobzentrisches Monitoring in Verteilten Heterogenen Umgebungen mit Hilfe Innovativer Skalierbarer Methoden (Job-Centric Monitoring in Distributed Heterogeneous Environments Using Innovative Scalable Methods). Hilbrich, Marcus, 24 March 2015.
An increasing number of program executions (jobs) is an ongoing trend in scientific computing. Growing numbers of available compute cores and lower access barriers, based on portal systems, workflow systems, or services, drive this trend. At the same time, the abstraction layers that enable grid and cloud solutions pose challenges in observing job behaviour; observation and monitoring capabilities for large numbers of jobs are thus lacking. Job-centric monitoring offers a solution by presenting job executions in a transparent manner.
This dissertation presents methods for scalable infrastructures that handle the monitoring data of jobs in grid, cloud, and HPC (High Performance Computing) environments. A layer-based organisation of servers with a distributed storage scheme enables task sharing that respects network bandwidths and storage capacities. Additionally, three proposed automatic analysis techniques enable the evaluation of huge data quantities.
One of the developed algorithms is based on cross-correlation and uses a tree-based optimisation strategy to decrease both runtime and memory usage. Together, the three methods can reduce the number of jobs requiring manual analysis from many thousands to the few interesting ones that exhibit outlier behaviour during execution. Contributions of this thesis include the design, a prototype implementation, and an evaluation of the methods for mass analysis of job data, as well as of the scalable storage concept for such data.
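A naive Python sketch of the cross-correlation idea follows; it omits the dissertation's tree-based optimisation and therefore pays the full quadratic cost, and the Pearson formulation and the threshold value are illustrative assumptions:

```python
import numpy as np

def pairwise_correlation(jobs):
    """Pearson correlation between equal-length monitoring traces,
    e.g. one CPU-load time series per job (rows of a 2-D array)."""
    Z = jobs - jobs.mean(axis=1, keepdims=True)
    Z /= Z.std(axis=1, keepdims=True) + 1e-12
    return (Z @ Z.T) / jobs.shape[1]

def outlier_jobs(jobs, threshold=0.8):
    """Flag jobs whose trace correlates poorly with every other job;
    only these few candidates remain for manual inspection."""
    sim = pairwise_correlation(np.asarray(jobs, dtype=float))
    np.fill_diagonal(sim, -np.inf)   # ignore self-correlation
    return np.where(sim.max(axis=1) < threshold)[0]
```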
Database Support for 3D-Protein Data Set Analysis. Lehner, Wolfgang; Hinneburg, Alexander, 25 May 2022.
The progress in genome research demands an adequate infrastructure to analyze the resulting data sets. Database systems represent a key technology for organizing data and speeding up the analysis process. This paper discusses the role of a relational database system in the problem of finding frequent substructures in multi-dimensional protein databases. The specific problem consists of producing a set of association rules regarding frequent substructures with different lengths and gaps between the amino acid residues of a protein. From a database point of view, the process of finding association rules, which form the basis for a more in-depth analysis of the data, is split into two parts. The first part discretizes the conformational angle space of a single amino acid residue by computing the nearest neighbor among a given set of representatives. The second part adapts a well-known association rule algorithm to determine the frequent substructures. Both steps of this comprehensive analysis task require substantial support from the underlying database in order to reduce the programming overhead at the application level.
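The discretization step can be pictured with the following hypothetical numpy sketch; the representative conformations are made up for illustration, and a wrapped distance is used because (phi, psi) angles are circular:

```python
import numpy as np

def discretize_angles(phi_psi, representatives):
    """Map each residue's conformational angles (phi, psi) to the label of
    its nearest representative, turning a protein into a symbol sequence
    suitable for association rule mining. Angles are in degrees."""
    diff = phi_psi[:, None, :] - representatives[None, :, :]
    diff = (diff + 180.0) % 360.0 - 180.0        # wrap to [-180, 180)
    return np.argmin((diff ** 2).sum(axis=2), axis=1)

# Illustrative representatives for three conformational regions:
reps = np.array([[-60.0, -45.0],    # roughly alpha-helical
                 [-120.0, 130.0],   # roughly beta-sheet
                 [60.0, 45.0]])     # left-handed region
protein = np.array([[-58.0, -47.0], [-118.0, 125.0], [65.0, 40.0]])
print(discretize_angles(protein, reps))  # -> [0 1 2]
```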
Building a real data warehouse for market research. Lehner, Wolfgang; Albrecht, J.; Teschke, M.; Kirsche, T., 08 April 2022.
This paper reflects the results of the evaluation phase of building a data production system for the retail research division of GfK, Europe's largest market research company. The application-specific requirements, such as end-user needs and data volume, differ considerably from the data warehouses discussed in the literature, making this a 'real' data warehouse. In a case study, these requirements are compared with state-of-the-art solutions offered by leading software vendors, with each of the common architectures (MOLAP, ROLAP, HOLAP) represented by a product. The comparison shows that all systems have to be massively tailored to GfK's needs, especially to cope with metadata management and the maintenance of aggregations.
Transparent Forecasting Strategies in Database Management Systems. Fischer, Ulrike; Lehner, Wolfgang, 02 February 2023.
Whereas traditional data warehouse systems assume that data is complete or has been carefully preprocessed, increasingly more data is imprecise, incomplete, and inconsistent. This is especially true in the context of big data, where massive amounts of data arrive continuously and in real time from vast numbers of data sources. Moreover, modern data analysis involves sophisticated statistical algorithms that go well beyond traditional BI and is increasingly performed by non-expert users. Both trends require transparent data mining techniques that efficiently handle missing data and present a complete view of the database to the user. Time series forecasting estimates future, not yet available, data of a time series and represents one way of dealing with missing data. Moreover, it enables queries that retrieve a view of the database at any point in time: past, present, and future. This article presents an overview of forecasting techniques in database management systems. After discussing possible application areas for time series forecasting, we give a short mathematical background of the main forecasting concepts. We then outline various general strategies for integrating time series forecasting inside a database and discuss individual techniques from the database community. We conclude by introducing a novel forecasting-enabled database management architecture that natively and transparently integrates forecast models.
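As a toy illustration of this transparency, assuming a hand-rolled Holt smoothing model and hypothetical function names rather than any actual DBMS interface:

```python
import numpy as np

def holt_forecast(y, alpha=0.5, beta=0.3, horizon=1):
    """Holt's linear exponential smoothing: fit level and trend over the
    stored history, then extrapolate the requested future values."""
    level, trend = y[0], y[1] - y[0]
    for t in range(1, len(y)):
        prev_level = level
        level = alpha * y[t] + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend * np.arange(1, horizon + 1)

def query_range(history, upto):
    """Transparent range query: stored tuples are returned as-is;
    tuples beyond the present are filled in by the forecast model."""
    history = np.asarray(history, dtype=float)
    if upto < len(history):
        return history[:upto + 1]
    future = holt_forecast(history, horizon=upto - len(history) + 1)
    return np.concatenate([history, future])

# Four stored values plus three transparently forecast ones:
print(query_range([100, 104, 109, 113], upto=6))
```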
Designing Random Sample Synopses with Outliers. Lehner, Wolfgang; Rösch, Philipp; Gemulla, Rainer, 12 August 2022.
Random sampling is one of the most widely used means to build synopses of large datasets because random samples can be used for a wide range of analytical tasks. Unfortunately, the quality of the estimates derived from a sample is negatively affected by the presence of 'outliers' in the data. In this paper, we show how to circumvent this shortcoming by constructing outlier-aware sample synopses. Our approach extends the well-known outlier indexing scheme to multiple aggregation columns.
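A sketch of the basic construction follows; it uses a simple z-score rule to pick the outliers, which is only a stand-in for the paper's scheme, where the outlier set is chosen to minimize estimation error:

```python
import numpy as np

def outlier_aware_synopsis(values, sample_size, z=3.0, seed=0):
    """Keep extreme values exactly; draw a uniform sample of the rest."""
    values = np.asarray(values, dtype=float)
    dev = np.abs(values - values.mean()) / (values.std() + 1e-12)
    outliers, rest = values[dev > z], values[dev <= z]
    k = min(max(sample_size - len(outliers), 1), len(rest))
    sample = np.random.default_rng(seed).choice(rest, size=k, replace=False)
    return outliers, sample, len(rest)

def estimate_sum(outliers, sample, n_rest):
    """Exact sum over the outliers plus the scaled-up sample estimate."""
    return outliers.sum() + sample.mean() * n_rest

data = np.concatenate([np.random.default_rng(1).normal(100, 5, 10_000),
                       [1e6, -5e5]])                 # two extreme values
out, smp, n_rest = outlier_aware_synopsis(data, sample_size=200)
print(estimate_sum(out, smp, n_rest), data.sum())    # estimate vs. truth
```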
Clustering Uncertain Data with Possible Worlds. Lehner, Wolfgang; Volk, Peter Benjamin; Rosenthal, Frank; Hahmann, Martin; Habich, Dirk, 16 August 2022.
The topic of managing uncertain data has been explored in many ways, and different methodologies for data storage and query processing have been proposed. As the availability of management systems grows, research on analytics over uncertain data is gaining in importance. Similar to the challenges faced in data management, algorithms for mining uncertain data suffer a high performance degradation compared with their counterparts for certain data. To overcome this degradation, the MCDB approach was developed for uncertain data management based on the possible-worlds scenario. As this methodology shows significant performance and scalability enhancements, we adopt it for mining uncertain data. In this paper, we introduce a clustering methodology for uncertain data and illustrate current issues with this approach within the field of clustering uncertain data.
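The possible-worlds idea can be sketched as follows, assuming Gaussian attribute uncertainty; the distribution choice and the aggregation via a co-occurrence matrix are illustrative assumptions rather than the paper's exact algorithm:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_possible_worlds(means, stds, n_worlds=20, k=3, seed=0):
    """Instantiate possible worlds by sampling every uncertain attribute,
    cluster each world independently, and aggregate the results into
    the probability that two objects end up in the same cluster."""
    rng = np.random.default_rng(seed)
    n = len(means)
    cooc = np.zeros((n, n))
    for w in range(n_worlds):
        world = rng.normal(means, stds)      # one concrete possible world
        labels = KMeans(n_clusters=k, n_init=5,
                        random_state=w).fit_predict(world)
        cooc += labels[:, None] == labels[None, :]
    return cooc / n_worlds                   # co-clustering probabilities
```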