11

Interactive Multiscale Visualization of Large, Multi-dimensional Datasets

Kühne, Kay January 2018
This thesis project set out to find and implement a comfortable way to explore vast, multidimensional datasets using interactive multiscale visualizations, to combat the ever-growing information overload that the digitized world is generating. Starting from the realization that, even for people not working in information visualization or data science, the size of interesting datasets often outgrows the capabilities of standard spreadsheet applications such as Microsoft Excel, the project established requirements for a system to overcome this problem. This thesis report describes existing solutions and related work, and finally the design and implementation of a working tool for initial data exploration that uses novel multiscale visualizations to make complex relationships comprehensible; the tool proved successful in a practical evaluation with two case studies.
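The abstract gives no implementation detail, so the following is only a rough sketch of the multiscale idea it describes: aggregate a large series at a granularity driven by the zoom level, so that an interactive view never renders more than a fixed budget of points. The function and its parameters are hypothetical, not taken from the thesis.

```python
import numpy as np
import pandas as pd

def multiscale_view(series: pd.Series, zoom: int, max_points: int = 500) -> pd.Series:
    """Return an aggregated rendering of `series` for one zoom level.

    zoom 0 shows the whole series; each further level halves the visible
    window (centred here for simplicity) before binning, so the display
    never receives more than `max_points` values.
    """
    window = max(1, len(series) // (2 ** zoom))
    start = (len(series) - window) // 2         # centre the zoomed window
    visible = series.iloc[start:start + window]
    if len(visible) <= max_points:
        return visible                          # fine enough to draw raw points
    bins = np.array_split(visible, max_points)  # fixed point budget per frame
    return pd.Series([b.mean() for b in bins])

# A million points are reduced to at most 500 at every zoom level.
data = pd.Series(np.random.randn(1_000_000).cumsum())
overview = multiscale_view(data, zoom=0)   # coarse, whole-series view
detail = multiscale_view(data, zoom=10)    # finer view of a central slice
```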
12

Interpretação de clusters gerados por algoritmos de clustering hierárquico / Interpreting clusters generated by hierarchical clustering algorithms

Jean Metz 04 August 2006
The Data Mining (DM) process consists of the automated extraction of patterns representing knowledge implicitly stored in large databases. In general, DM tasks can be classified into two categories: predictive and descriptive. Tasks in the first category, such as classification and prediction, perform inference on the data in order to make predictions, while tasks in the second category, such as clustering, characterize the general properties of the data. Unlike classification and prediction, which analyze class-labeled data objects, clustering analyzes data objects whose class label is not known in advance. Clusters are formed so that objects in the same cluster are highly similar to one another but very dissimilar to objects in other clusters. Clustering can also organize clusters into a hierarchy that groups similar events together; this taxonomy can simplify the interpretation of clusters. In this work, we propose and develop a module comprising hierarchical clustering algorithms and several cluster analysis tools, aiming to help the domain specialist interpret the clustering results. Since hierarchical clustering groups objects according to similarity measures and organizes the clusters into a hierarchy, the user/specialist can analyze and explore this hierarchy at different levels in order to discover the concepts described by the structure. The proposed module is integrated into a larger system, under development at the Computational Intelligence Laboratory (LABIC), that covers all steps of the DM process, from data pre-processing to knowledge post-processing. To evaluate the module and its use for discovering concepts from the hierarchical cluster structure, several experiments were carried out on natural datasets, as well as a case study using a real dataset. The results show the viability of the proposed methodology for cluster interpretation, although the complexity of the process depends on the characteristics of the dataset.
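The abstract names no specific algorithms or libraries; as a minimal sketch of the underlying workflow, the following cuts an agglomerative hierarchy at several levels with SciPy (my choice of tooling, not the LABIC module) and prints a per-cluster summary as one simple interpretation aid.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Unlabeled data with two obvious groups, standing in for a natural dataset.
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(5, 1, (50, 4))])

Z = linkage(X, method="ward")  # agglomerative hierarchy (the merge tree)

# Explore the taxonomy at different levels: coarse cuts first, then finer ones.
for k in (2, 4, 8):
    labels = fcluster(Z, t=k, criterion="maxclust")
    for c in np.unique(labels):
        members = X[labels == c]
        # A per-cluster summary (size and centroid) is one simple interpretation aid.
        print(f"k={k} cluster {c}: n={len(members)} centroid={members.mean(axis=0).round(2)}")
```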
13

Code duplication and reuse in Jupyter notebooks

Koenzen, Andreas Peter 21 September 2020
Reusing code can expedite software creation and the analysis and exploration of data. Expediency can be particularly valuable for users of computational notebooks, where duplication allows them to quickly test hypotheses and iterate over data without creating code from scratch. In this thesis, I explore code duplication and code-reuse behaviour in Jupyter notebooks, quantifying and describing snippets of code and exploring potential barriers to reuse. As part of this thesis I conducted two studies of Jupyter notebook use. In the first study, I mined GitHub repositories, quantifying and describing the code duplicates contained within repositories that held at least one Jupyter notebook. In the second study, I conducted an observational user study using contextual inquiry, in which participants solved specific tasks using notebooks while I observed and took notes. The work in this thesis is exploratory: both studies were aimed at generating hypotheses on which further studies can build. My contributions are two-fold: a thorough description of the code duplicates contained within GitHub repositories, and an exploration of the behaviour behind code reuse in Jupyter notebooks. It is my desire that others can build upon this work to provide new tools addressing some of the issues outlined in this thesis.
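The mining pipeline itself is not reproduced in this abstract; the sketch below shows one plausible way to count near-verbatim duplicate code cells across the notebooks of a repository. The hash-based definition of a duplicate and the whitespace normalization are my assumptions, not the thesis's actual methodology.

```python
import hashlib
import json
from collections import defaultdict
from pathlib import Path

def normalize(source: str) -> str:
    # Crude normalization: drop blank lines and edge whitespace, so trivially
    # reformatted copies of a snippet still hash to the same key.
    return "\n".join(line.strip() for line in source.splitlines() if line.strip())

def find_duplicate_cells(repo_dir: str) -> dict:
    """Map the hash of each normalized code cell to the notebooks containing it."""
    seen = defaultdict(list)
    for nb_path in Path(repo_dir).rglob("*.ipynb"):
        nb = json.loads(nb_path.read_text(encoding="utf-8"))
        for cell in nb.get("cells", []):
            if cell.get("cell_type") != "code":
                continue
            src = normalize("".join(cell.get("source", [])))
            if src:
                seen[hashlib.sha1(src.encode()).hexdigest()].append(str(nb_path))
    # Keep only snippets occurring more than once, i.e. the duplicates.
    return {h: paths for h, paths in seen.items() if len(paths) > 1}

duplicates = find_duplicate_cells(".")  # run over any checked-out repository
print(f"{len(duplicates)} duplicated snippets found")
```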
14

Defining Underlying Factors Affecting Fault Reports within Residential Real Estate / Faktorer som Påverkar Felanmälningar inom Bostadsfastigheter

Djurestål, Li, Leander, David January 2023
Several studies were reviewed to understand where the real-estate industry, and more specifically facility management, stands with regard to digitalization. Digitalization of the real-estate industry has been researched as a possibility for some time, and the industry needs to make better use of the data generated by these digitalized solutions. This thesis takes a quantitative approach, using a data analysis of the underlying factors that affect fault reports, with data from three Swedish residential real-estate companies. The analysis considers fault reports in general, non-digitalized fault reports, and digitalized fault reports. The results imply that several variables have a statistically demonstrated effect on the number of fault reports made. The outcome is then discussed, arguing for possible reasons behind it and relating it to the literature reviewed on the subject.
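The abstract does not state which statistical model was used; a Poisson regression is one conventional choice for testing whether variables affect a count such as the number of fault reports. The sketch below uses statsmodels with entirely hypothetical column names and data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: fault-report counts per building plus candidate factors.
df = pd.DataFrame({
    "reports":       [12, 4, 9, 20, 3, 15, 7, 11],
    "building_age":  [40, 10, 25, 55, 5, 48, 18, 30],
    "apartments":    [80, 30, 60, 120, 20, 100, 45, 70],
    "digital_share": [0.2, 0.9, 0.5, 0.1, 0.95, 0.15, 0.7, 0.4],
})

# Poisson regression: which factors are statistically associated with the counts?
model = smf.poisson("reports ~ building_age + apartments + digital_share", data=df).fit()
print(model.summary())  # the p-values flag the statistically significant coefficients
```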
15

Using Knowledge Anchors to Facilitate User Exploration of Data Graphs

Al-Tawil, M., Dimitrova, V., Thakker, Dhaval 28 November 2018
This paper investigates how to facilitate users' exploration through data graphs for knowledge expansion. Our work focuses on knowledge utility – increasing users' domain knowledge while exploring a data graph. We introduce a novel exploration support mechanism underpinned by the subsumption theory of meaningful learning, which postulates that new knowledge is grasped by starting from familiar concepts in the graph, which serve as knowledge anchors from which links to new knowledge are made. A core algorithmic component in operationalising this theory to generate exploration paths for knowledge expansion is the automatic identification of knowledge anchors in a data graph (KADG). We present several metrics for identifying KADG, which are evaluated against familiar concepts in human cognitive structures. A subsumption algorithm that utilises KADG to generate exploration paths for knowledge expansion is presented and applied in the context of a semantic data browser in a music domain. The resultant exploration paths are evaluated in a task-driven experimental user study against free data graph exploration. The findings show that exploration paths based on subsumption and using knowledge anchors lead to a significantly higher increase in users' conceptual knowledge and better usability than free exploration of data graphs. The work opens a new avenue in semantic data exploration which investigates the link between learning and knowledge exploration. This extends the value of exploration and enables broader applications of data graphs in systems where the end users are not experts in the specific domain.
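The paper defines its own KADG metrics, which are not reproduced in this abstract; purely as an illustrative stand-in, the sketch below scores nodes of a toy music-domain graph by how many subordinate concepts they dominate, on the assumption that broad, hub-like concepts are the more familiar entry points.

```python
import networkx as nx

# A toy music-domain data graph: edges point from broader to narrower concepts.
G = nx.DiGraph([
    ("music", "genre"), ("genre", "jazz"), ("genre", "baroque"),
    ("music", "instrument"), ("instrument", "piano"), ("instrument", "lute"),
    ("jazz", "bebop"), ("baroque", "fugue"),
])

def anchor_scores(graph: nx.DiGraph) -> dict:
    """Score each node as a candidate knowledge anchor.

    Heuristic stand-in for the paper's KADG metrics: concepts dominating
    many subordinate concepts are assumed to be the more familiar entry
    points from which exploration paths can start.
    """
    return {n: graph.out_degree(n) + len(nx.descendants(graph, n))
            for n in graph.nodes}

scores = anchor_scores(G)
print(max(scores, key=scores.get))  # a broad, familiar concept such as "music"
```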
16

Approche co-évolutive humain-système pour l'exploration de bases de données / Human-system co-evolutive approach for database exploration

Rajaonarivo, Hiary Landy 29 June 2018
This thesis focuses on helping humans explore databases. The particularity of the proposed approach lies in a principle of co-evolution between the user and an intelligent interface, which supports understanding of the domain represented by the data. A metaphor of a living virtual museum is adopted: the museum evolves incrementally according to the user's interactions, and it embodies both the data and semantic information expressed by a knowledge model specific to the domain of the data. Through its topological organization and incremental evolution, the museum personalizes the user's exploration online. The approach rests on three main mechanisms: evaluation of the user profile, modelled as a dynamic weighting of semantic information; the use of this dynamic profile to make recommendations; and the embodiment of the data in the living museum. The approach has been applied to the cultural-heritage domain as part of the ANTIMOINE project, funded by the French National Research Agency (ANR). Its genericity has been demonstrated through application to a database of publications and through the use of various types of interfaces (website, virtual reality). Experiments have validated the hypothesis that the system adapts itself to the user's behaviour and that it is able, in turn, to influence the user. They also compared a 2D interface with a 3D interface in terms of quality of perception, guidance, preference, and efficiency.
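Of the three mechanisms listed, the first two lend themselves to a compact illustration. The toy sketch below (my construction, not the ANTIMOINE code) keeps a dynamically weighted profile of semantic tags that grows with each interaction and decays over time, and recommends the exhibit whose tags best match the current profile.

```python
from collections import Counter

class VisitorProfile:
    """Toy dynamic user profile: semantic tags gain weight whenever the
    visitor inspects an exhibit carrying them, while all weights decay
    slightly so the profile can drift as the visitor's interests change."""

    def __init__(self, decay: float = 0.95):
        self.weights = Counter()
        self.decay = decay

    def observe(self, exhibit_tags: set) -> None:
        for tag in self.weights:
            self.weights[tag] *= self.decay    # older interests fade
        for tag in exhibit_tags:
            self.weights[tag] += 1.0           # current interests grow

    def recommend(self, exhibits: dict) -> str:
        # Next exhibit = the one whose tags best match the current profile.
        return max(exhibits, key=lambda e: sum(self.weights[t] for t in exhibits[e]))

profile = VisitorProfile()
profile.observe({"romanesque", "chapel"})
print(profile.recommend({
    "abbey":   {"romanesque", "cloister"},
    "factory": {"industrial", "19th-century"},
}))  # -> "abbey"
```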
17

Engineering of a Knowledge Management System for Relational Medical Diagnosis

Herrera-Hernandez, Maria Carolina 01 January 2012
The increasingly high costs of health care in the U.S. have led the general public to search for different medical approaches. Since the 1990s, the use of Complementary and Alternative Medicine (CAM) has radically increased in the U.S. due to its approach of treating physical, mental, and emotional causes of illness. In 2009, the National Health Statistics reports documented the impact of CAM on the U.S. health care economy, with population expenditures of $14.8 billion out-of-pocket on natural medicine and $12.4 billion out-of-pocket on visits to CAM providers as a complement to Western Medicine care. CAM interconnects human functions to reach a balanced state, whereas Western Medicine focuses on specialties and body systems. Both Western Medicine and CAM are unlimited sources of knowledge that follow different approaches but share the common goal of improving patients' well-being. Identifying relationships between Alternative and Western Medicine can open a completely new approach to health care that increases understanding of human medical conditions and facilitates the development of new and more cost-effective treatments. However, the abundance and dissimilarity of CAM and Western Medicine data make knowledge correlation and management an extremely challenging task. The objective of this research is to design the framework for a knowledge management system to organize, store, and manage the abundant data available for Western Medicine and CAM, and to establish key relationships between the two practices for an effective exploration of ideas and possible solutions for medical diagnosis. Three main challenges in the design of the proposed framework are addressed: data acquisition and modeling; data organization, storage, and transfer; and information distribution for further generation and sharing of medical knowledge. A framework to relate the diagnosis process in Western Medicine and Traditional Chinese Medicine, as one of the various forms of CAM, is presented based on process-oriented analysis, hierarchical knowledge representation, a relational database, and an interactive interface for system utilization. The research is demonstrated using a case study on chronic prostatitis and is scalable to other medical conditions. The presented system for knowledge management is not intended to provide a definitive solution for medical diagnosis, but to enable the exploration and discovery of knowledge for relational medical diagnosis. The results of this research will positively impact information distribution and knowledge generation via interactive medical knowledge systems, the development of new skills for diagnosis and treatment, and a broader understanding of medical diseases and treatments.
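The dissertation's actual schema is not shown in this abstract; purely as an illustration of the relational-diagnosis idea, the sketch below links a Western diagnosis to a Traditional Chinese Medicine pattern through a join table. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema; names are illustrative, not the
    -- dissertation's actual design.
    CREATE TABLE western_diagnosis (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tcm_pattern       (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE diagnosis_link (
        western_id INTEGER REFERENCES western_diagnosis(id),
        tcm_id     INTEGER REFERENCES tcm_pattern(id),
        evidence   TEXT
    );
""")
conn.execute("INSERT INTO western_diagnosis VALUES (1, 'chronic prostatitis')")
conn.execute("INSERT INTO tcm_pattern VALUES (1, 'damp-heat pattern')")
conn.execute("INSERT INTO diagnosis_link VALUES (1, 1, 'case-study mapping')")

# Relational exploration: candidate TCM patterns for a Western diagnosis.
rows = conn.execute("""
    SELECT t.name FROM tcm_pattern t
    JOIN diagnosis_link l ON l.tcm_id = t.id
    WHERE l.western_id = 1
""").fetchall()
print(rows)  # [('damp-heat pattern',)]
```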
18

Exploiting Human Factors and UI Characteristics for Interactive Data Exploration

Khan, Meraj Ahmed January 2019 (has links)
No description available.
19

Food Industry Sales Prediction : A Big Data Analysis & Sales Forecast of Bake-off Products

Lindström, Maja January 2021
In this thesis, the sales of bread and coffee bread at Coop Värmland AB have been studied. The aim was to find which factors are important for sales and then to predict future sales, in order to reduce waste and increase profits. Big-data analysis and data exploration were used to get to know the data and find the factors that affect sales the most, and time-series forecasting and supervised machine-learning models were used to predict future sales. The main focus was on five models that were compared and analysed: decision tree regression, random forest regression, artificial neural networks, recurrent neural networks, and a time-series model called Prophet. Comparing the observed values with the models' predictions indicated that a model based on the time series is preferable, that is, Prophet or the recurrent neural network. These two models gave the lowest errors and thus the most accurate results. Prophet yielded mean absolute percentage errors of 8.295% for bread and 9.156% for coffee bread; the recurrent neural network gave 7.938% for bread and 13.12% for coffee bread. That is about twice as accurate as the models used at Coop today, which are based on the mean of previous sales.
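A minimal sketch of the Prophet workflow and the MAPE metric quoted above, using synthetic daily sales in place of the Coop data (the ds/y column names are Prophet's required input convention; everything else here is illustrative):

```python
import numpy as np
import pandas as pd
from prophet import Prophet  # pip install prophet

# Synthetic daily sales with weekly seasonality, standing in for the bread data.
dates = pd.date_range("2019-01-01", periods=730, freq="D")
sales = 100 + 20 * np.sin(2 * np.pi * dates.dayofweek / 7) + np.random.normal(0, 5, 730)
df = pd.DataFrame({"ds": dates, "y": sales})  # Prophet requires ds/y columns

train, test = df.iloc[:-30], df.iloc[-30:]    # hold out the last 30 days
model = Prophet()
model.fit(train)
forecast = model.predict(test[["ds"]])

# Mean absolute percentage error, the accuracy metric quoted in the thesis.
mape = np.mean(np.abs((test["y"].values - forecast["yhat"].values) / test["y"].values)) * 100
print(f"MAPE over the hold-out month: {mape:.3f}%")
```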
20

CURARE : curating and managing big data collections on the cloud / CURARE : curation et gestion de collections de données volumineuses sur le cloud

Kemp, Gavin 26 September 2018
The emergence of new platforms for decentralized data creation, such as sensor and mobile platforms, and the increasing availability of open data on the Web are adding to the number of data sources inside organizations and bring unprecedented volumes of Big Data to be explored. The notion of data curation has emerged to refer to the maintenance of data collections and the preparation and integration of datasets, combining them to perform analytics. Curation tasks include extracting explicit and implicit metadata, and semantic metadata matching and enrichment to add quality to the data. Next-generation data management engines should promote techniques with a new philosophy to cope with the deluge of data: they should aid the user in understanding the content of data collections and provide guidance for exploring the data. A scientist can explore data collections stepwise and stop when the content and quality reach a satisfactory level. Our work adopts this philosophy, and the main contribution is a data collection curation approach and exploration environment named CURARE. CURARE is a service-based system for curating and exploring Big Data. CURARE implements a data collection model that we propose for representing the content of collections in terms of structural and statistical metadata, organised under the concept of a view. A view is a data structure that provides an aggregated perspective of the content of a data collection and its several associated releases. CURARE provides tools for computing and extracting views using data analytics methods, as well as functions for exploring (querying) metadata. Exploiting Big Data requires a data analyst to make a substantial number of decisions to determine the best way to store, share, and process data collections so as to get the maximum benefit and knowledge from them. Instead of leaving analysts to explore data collections manually, CURARE provides tools, integrated in one environment, for assisting data analysts in determining which collections are best suited to achieving a given analytics objective. We implemented CURARE and explain how to deploy it on the cloud using data science services on top of which the CURARE services are plugged. We have conducted experiments measuring the cost of computing views on datasets from Grand Lyon and Twitter, to provide insight into the interest of our data curation approach and environment.
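The abstract describes a view as an aggregated structural and statistical summary of a data collection; the sketch below is a much-simplified pandas approximation of that idea, not CURARE's actual view model.

```python
import pandas as pd

def extract_view(df: pd.DataFrame) -> dict:
    """Compute a crude 'view' of a data collection: structural meta-data
    (columns, dtypes) plus per-column statistics, as one summary object."""
    view = {"n_rows": len(df), "columns": {}}
    for col in df.columns:
        s = df[col]
        stats = {"dtype": str(s.dtype),
                 "nulls": int(s.isna().sum()),
                 "distinct": int(s.nunique())}
        if pd.api.types.is_numeric_dtype(s):
            stats.update(min=float(s.min()), max=float(s.max()), mean=float(s.mean()))
        view["columns"][col] = stats
    return view

# An analyst can compare the views of candidate collections before picking one.
df = pd.DataFrame({"station": ["A", "B", "A"], "pm10": [21.0, 35.5, None]})
print(extract_view(df))
```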
