31. Uncertainty visualization of ensemble simulations (Sanyal, Jibonananda, 09 December 2011)
Ensemble simulation is a commonly used technique in operational forecasting of weather and floods. Multi-member ensemble output is usually large, multivariate, and challenging to interpret interactively. Forecast meteorologists and hydrologists are interested in understanding the uncertainties associated with the simulation, specifically the variability between ensemble members. The visualization of ensemble members is currently accomplished through spaghetti plots or hydrographs. To improve visualization techniques and tools for forecasters, we conducted a user study to evaluate the effectiveness of existing uncertainty visualization techniques on 1D and 2D synthetic datasets. We designed an uncertainty evaluation framework to enable easier design of such studies for scientific visualization. The techniques evaluated were error bars, scaled glyph size, color mapping on glyphs, and color mapping of uncertainty on the data surface. Although we did not find a consistent ordering of the four techniques across all tasks, we found that the effectiveness of a technique depended strongly on the task being performed. Error bars consistently underperformed throughout the experiment; scaling glyph size and color mapping of the surface performed reasonably well. Building on the user study results, we iteratively developed a tool named ‘Noodles’ to interactively explore ensemble uncertainty in weather simulations. Uncertainty was quantified using the standard deviation, the inter-quartile range, the width of the 95% confidence interval, and bootstrapping of the data. A coordinated view of ribbon- and glyph-based uncertainty visualization, spaghetti plots, and data transect plots was provided to two meteorologists for expert evaluation. They found it useful for assessing uncertainty in the data, especially for finding outliers and avoiding the parameterizations leading to those outliers. Additionally, they could identify spatial regions with high uncertainty, thereby pinpointing poorly simulated storm environments and deriving physical interpretations of these model issues. We also describe uncertainty visualization capabilities developed for a tool named ‘FloodViz’ for visualization and analysis of flood simulation ensembles. Simple member and trend plots and composited inundation maps with uncertainty are described, along with different types of glyph-based uncertainty representations. We also report feedback from a hydrologist who used various features of the tool from an operational perspective.
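As an aside, the uncertainty measures listed above are straightforward to reproduce; the sketch below applies them to a synthetic ensemble with NumPy. The array shape, the 1,000 bootstrap replicates, and the normal-approximation confidence interval are assumptions for illustration, not the configuration used in Noodles.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic ensemble: 20 members on a 50 x 50 grid (illustrative only).
ensemble = rng.normal(loc=280.0, scale=2.0, size=(20, 50, 50))

# Per-grid-point spread statistics across members (axis 0).
std_dev = ensemble.std(axis=0, ddof=1)
q25, q75 = np.percentile(ensemble, [25, 75], axis=0)
iqr = q75 - q25

# Width of the 95% confidence interval of the ensemble mean,
# assuming approximate normality of the member values.
n = ensemble.shape[0]
ci95_width = 2 * 1.96 * std_dev / np.sqrt(n)

# Bootstrap the members to get a nonparametric 95% CI of the mean.
boot_means = np.empty((1000,) + ensemble.shape[1:])
for b in range(1000):
    idx = rng.integers(0, n, size=n)          # resample members with replacement
    boot_means[b] = ensemble[idx].mean(axis=0)
boot_lo, boot_hi = np.percentile(boot_means, [2.5, 97.5], axis=0)
boot_ci_width = boot_hi - boot_lo

print(std_dev.mean(), iqr.mean(), ci95_width.mean(), boot_ci_width.mean())
```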
32. Score-Based Approaches to Heterogeneity in Psychological Models (Arnold, Manuel, 30 May 2022)
Statistische Modelle menschlicher Kognition und Verhaltens stützen sich häufig auf aggregierte Daten und vernachlässigen dadurch oft Heterogenität in Form von Unterschieden zwischen Personen oder Gruppen. Die Nichtberücksichtigung vorliegender Heterogenität kann zu verzerrten Parameterschätzungen und zu falsch positiven oder falsch negativen Tests führen. Häufig kann Heterogenität mithilfe von Kovariaten erkannt und vorhergesagt werden. Allerdings erweist sich die Identifizierung von Prädiktoren von Heterogenität oft als schwierige Aufgabe. Zur Lösung dieses Problems schlage ich zwei neue Ansätze vor, um individuelle und gruppenspezifische Unterschiede mithilfe von Kovariaten vorherzusagen.
Die vorliegende kumulative Dissertation setzt sich aus drei Projekten zusammen. Projekt 1 widmet sich dem Verfahren IPC-Regression (Individual Parameter Contribution), welches die Exploration von Parameterheterogenität in Strukturgleichungsmodellen (SEM) mittels Kovariaten erlaubt. Unter anderem evaluiere ich IPC-Regression für dynamische Panel-Modelle, schlage eine alternative Schätzmethode vor und leite IPCs für allgemeine Maximum-Likelihood-Schätzer her. Projekt 2 veranschaulicht, wie IPC-Regression in der Praxis eingesetzt werden kann. Dazu führe ich schrittweise in die Implementierung von IPC-Regression im ipcr-Paket für die statistische Programmiersprache R ein. Schließlich werden in Projekt 3 SEM-Trees weiterentwickelt. SEM-Trees sind eine modellbasierte rekursive Partitionierungsmethode zur Identifizierung von Kovariaten, die Gruppenunterschiede in SEM-Parametern vorhersagen. Die bisher verwendeten SEM-Trees sind sehr rechenaufwendig. In Projekt 3 kombiniere ich SEM-Trees mit unterschiedlichen Score-basierten Tests. Die daraus resultierenden Score-Guided-SEM-Trees lassen sich deutlich schneller als herkömmliche SEM-Trees berechnen und zeigen bessere statistische Eigenschaften. / Statistical models of human cognition and behavior often rely on aggregated data and may fail to consider heterogeneity, that is, differences across individuals or groups. If overlooked, heterogeneity can bias parameter estimates and may lead to false-positive or false-negative findings. Often, heterogeneity can be detected and predicted with the help of covariates. However, identifying predictors of heterogeneity can be a challenging task. To solve this issue, I propose two novel approaches for detecting and predicting individual and group differences with covariates.
This cumulative dissertation is composed of three projects. Project 1 advances the individual parameter contribution (IPC) regression framework, which allows studying heterogeneity in structural equation model (SEM) parameters by means of covariates. I evaluate the use of IPC regression for dynamic panel models, propose an alternative estimation technique, and derive IPCs for general maximum likelihood estimators. Project 2 illustrates how IPC regression can be used in practice. To this end, I provide a step-by-step introduction to the IPC regression implementation in the ipcr package for the R system for statistical computing. Finally, Project 3 progresses the SEM tree framework. SEM trees are a model-based recursive partitioning method for finding covariates that predict group differences in SEM parameters. Unfortunately, the original SEM tree implementation is computationally demanding. As a solution to this problem, I combine SEM trees with a family of score-based tests. The resulting score-guided SEM trees compute quickly, solving the runtime issues of the original SEM trees, and show favorable statistical properties.
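As a toy illustration of the score-based idea behind IPC regression and score-guided SEM trees (not the ipcr or semtree implementations), the sketch below fits a single-mean model, computes case-wise score contributions, and checks whether their cumulative fluctuation, ordered by a covariate, hints at parameter heterogeneity. The data, the covariate, and the use of ~1.36 as a rough 5% reference value for the supremum of a Brownian bridge are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: the mean of y shifts with the covariate (built-in heterogeneity).
n = 200
covariate = rng.uniform(18, 70, size=n)                     # e.g., age
y = rng.normal(loc=np.where(covariate > 45, 1.0, 0.0), scale=1.0, size=n)

# Fit the homogeneous model (a single mean) by maximum likelihood.
mu_hat = y.mean()
sigma2_hat = y.var()

# Case-wise score contributions for the mean parameter.
scores = (y - mu_hat) / sigma2_hat

# Order the scores by the covariate and form the scaled cumulative sum.
order = np.argsort(covariate)
cumulative = np.cumsum(scores[order]) / (np.sqrt(n) * scores.std(ddof=1))

# Under parameter stability this process behaves roughly like a Brownian bridge;
# a large supremum suggests the covariate predicts parameter differences.
stat = np.abs(cumulative).max()
print(f"max |fluctuation| = {stat:.2f} (values above ~1.36 are suspicious at the 5% level)")
```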
33. Development of Visual Tools for Analyzing Ensemble Error and Uncertainty (Anreddy, Sujan Ranjan Reddy, 04 May 2018)
Climate analysts use Coupled Model Intercomparison Project Phase 5 (CMIP5) simulations to make sense of model performance in predicting extreme events such as heavy precipitation. Similarly, weather analysts use numerical weather prediction (NWP) models to simulate weather conditions, either by perturbing initial conditions or by changing multiple input parameterization schemes, e.g., cumulus and microphysics schemes. These simulations are used in operational weather forecasting and for studying the role of parameterization schemes in synoptic weather events such as storms. This work addresses the need for visualizing the differences in both CMIP5 and NWP model output. It proposes three glyph designs for communicating CMIP5 model error and describes the Ensemble Visual eXplorer tool, which provides multiple ways of visualizing NWP model output and the related input parameter space. The proposed interactive dendrogram provides an effective way to relate multiple input parameterization schemes to spatial characteristics of model uncertainty features. The glyphs designed to communicate CMIP5 model error are extended to encode both parameterization schemes and graduated uncertainty, providing related insights at specific locations such as the storm center and the areas surrounding it. The work analyzes different ways of using glyphs to represent parametric uncertainty with visual variables such as color and size, in conjunction with Gestalt visual properties, and demonstrates the use of visual analytics in resolving issues such as visual scalability. As part of this dissertation, we evaluated the three glyph designs using the average precipitation rate predicted by CMIP5 simulations, and the Ensemble Visual eXplorer tool using the WRF March 4, 1999 North American storm track dataset.
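A minimal sketch of the kind of glyph encoding discussed here, with glyph size mapped to uncertainty magnitude and color mapped to the parameterization scheme; the points, spread values, and scheme names are synthetic stand-ins, not output of the Ensemble Visual eXplorer.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
schemes = ["Kain-Fritsch", "Betts-Miller", "Grell"]        # example cumulus schemes
colors = {"Kain-Fritsch": "tab:blue", "Betts-Miller": "tab:orange", "Grell": "tab:green"}

fig, ax = plt.subplots(figsize=(6, 4))
for scheme in schemes:
    lon = rng.uniform(-100, -80, 40)
    lat = rng.uniform(30, 45, 40)
    uncertainty = rng.gamma(2.0, 1.0, 40)                  # e.g., ensemble spread
    ax.scatter(lon, lat, s=20 + 30 * uncertainty,          # size encodes uncertainty
               c=colors[scheme], alpha=0.6, label=scheme)  # color encodes scheme
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.legend(title="Parameterization")
ax.set_title("Glyph size = uncertainty, color = scheme (synthetic)")
plt.show()
```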
34. Development of a geovisual analytics environment using parallel coordinates with applications to tropical cyclone trend analysis (Steed, Chad A, 13 December 2008)
A global transformation is being fueled by unprecedented growth in the quality, quantity, and number of different parameters in environmental data through the convergence of several technological advances in data collection and modeling. Although these data hold great potential for helping us understand many complex and, in some cases, life-threatening environmental processes, our ability to generate such data is far outpacing our ability to analyze it. In particular, conventional environmental data analysis tools are inadequate for coping with the size and complexity of these data. As a result, users are forced to reduce the problem in order to adapt to the capabilities of the tools. To overcome these limitations, we must complement the power of computational methods with human knowledge, flexible thinking, imagination, and our capacity for insight by developing visual analysis tools that distill information into the actionable criteria needed for enhanced decision support. In light of said challenges, we have integrated automated statistical analysis capabilities with a highly interactive, multivariate visualization interface to produce a promising approach for visual environmental data analysis. By combining advanced interaction techniques such as dynamic axis scaling, conjunctive parallel coordinates, statistical indicators, and aerial perspective shading, we provide an enhanced variant of the classical parallel coordinates plot. Furthermore, the system facilitates statistical processes such as stepwise linear regression and correlation analysis to assist in the identification and quantification of the most significant predictors for a particular dependent variable. These capabilities are combined into a unique geovisual analytics system that is demonstrated via a pedagogical case study and three North Atlantic tropical cyclone climate studies using a systematic workflow. In addition to revealing several significant associations between environmental observations and tropical cyclone activity, this research corroborates the notion that enhanced parallel coordinates coupled with statistical analysis can be used for more effective knowledge discovery and confirmation in complex, real-world data sets.
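A small sketch of the two ingredients this system combines, a parallel coordinates view plus a simple correlation screen for candidate predictors; the columns below are synthetic stand-ins, not the tropical cyclone data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

rng = np.random.default_rng(3)
n = 120
df = pd.DataFrame({
    "sst": rng.normal(28, 1, n),        # sea-surface temperature (synthetic)
    "shear": rng.normal(10, 3, n),      # vertical wind shear (synthetic)
    "humidity": rng.uniform(40, 90, n),
})
df["storm_count"] = 2 * df["sst"] - 0.5 * df["shear"] + rng.normal(0, 2, n)

# Correlation screen: rank candidate predictors of the dependent variable.
print(df.corr()["storm_count"].drop("storm_count").sort_values(key=abs, ascending=False))

# Parallel coordinates view, with observations classed by storm activity.
df["activity"] = pd.cut(df["storm_count"], bins=3, labels=["low", "mid", "high"])
normalized = df.drop(columns="activity").apply(lambda c: (c - c.min()) / (c.max() - c.min()))
normalized["activity"] = df["activity"]
parallel_coordinates(normalized, "activity", alpha=0.4, colormap="viridis")
plt.show()
```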
35. A mixed method approach to exploring and characterizing ionic chemistry in the surface waters of the glacierized upper Santa River watershed, Ancash, Peru (Eddy, Alex Michelle, 17 July 2012)
No description available.
36. 商業智慧系統之實作於區域治理創新的應用─以宜蘭縣政府為例 / The Development of Business Intelligence System for Regional Governance – A Case Study of Yi-Lan County (許乃嘉, Hsu, Nai Chia, Unknown Date)
在有限的資源之下,各區域地方政府相當渴望跳脫僵化與官僚的決策模式,尋求創新有效率的治理機制,另一方面開放政府資料已為國際化的趨勢,台灣於開放資料領域耕耘成果亦相當豐碩。本研究希望建置商業智慧平台,將開放資料轉換為無形的「智慧資本」,持續驅動創新有效率的「治理機制」,進而改善在地人民的生活品質。
本論文研究實作一網頁為基礎的商業智慧分析平台,工具包括資料包絡分析法、競爭者分析,透過探索式資料分析,使用者彈性操作指標與決策參數,反覆進行資料探索分析,進而了解(一)地方之競爭縣市與區域特色(二)各縣市相對治理績效(三)單一縣市之優勢產業。並藉由宜蘭縣的文創、觀光、環境此三個產業面向的資料為例說明。
本論文聚焦於使用前端框架技術—AngularJS之系統實作,藉由資料視覺化設計、提升使用者經驗,建置高擴充性的資料探勘分析的平台,更可滿足使用者一次購足的統計資料查詢環境。 / Facing the challenges of limited resources and budget constraints, regional governments have been actively pursuing strategies to transform the conventional bureaucratic decision-making model into an innovative and efficient governance mechanism. At the same time, “open government data” is becoming a political commitment for many countries, and the Taiwanese government has made significant advances in this respect recently. To leverage the trend for open public data, this thesis aims to develop a web-based business intelligence system to support efficient governance through in-depth analysis of intellectual capital.
The tools provided in this system include data envelopment analysis (DEA), competitor identification, and exploratory data analysis. The system is designed to allow average users to experiment with different parameter settings and view the results interactively. Insights into competing counties and regional characteristics, relative governance efficiency, and leading industries can be gained with ease. We illustrate the functionalities of the system using data from Yi-Lan County and investigate its competitiveness in three areas, namely the cultural and creative industry, tourism, and the environmental industry.
AngularJS, a front-end framework, is utilized to implement the proposed business intelligence system. The objective is to provide a one-stop shopping service for interactive data analysis and visualization with a user-friendly design and good extensibility.
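As an aside, the data envelopment analysis step mentioned above can be illustrated with a generic input-oriented CCR model solved as a linear program; the county inputs and outputs below are made up, and the exact DEA variant used in the thesis may differ.

```python
import numpy as np
from scipy.optimize import linprog

# Toy data: 5 counties (DMUs), 2 inputs (budget, staff), 2 outputs (tourists, permits).
inputs  = np.array([[100, 20], [120, 25], [ 80, 15], [150, 30], [ 90, 22]], dtype=float)
outputs = np.array([[300, 40], [280, 50], [250, 30], [400, 45], [260, 38]], dtype=float)
n_dmu, m = inputs.shape
_, s = outputs.shape

def ccr_efficiency(o: int) -> float:
    """Input-oriented CCR efficiency of DMU o (multiplier form), solved as an LP."""
    # Decision variables: output weights u (s of them), then input weights v (m of them).
    c = np.concatenate([-outputs[o], np.zeros(m)])           # maximize u . y_o
    A_eq = np.concatenate([np.zeros(s), inputs[o]]).reshape(1, -1)
    b_eq = [1.0]                                              # v . x_o = 1
    A_ub = np.hstack([outputs, -inputs])                      # u . y_j - v . x_j <= 0
    b_ub = np.zeros(n_dmu)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return -res.fun

for o in range(n_dmu):
    print(f"County {o}: efficiency = {ccr_efficiency(o):.3f}")
```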
37. A comparative study between algorithms for time series forecasting on customer prediction: An investigation into the performance of ARIMA, RNN, LSTM, TCN and HMM (Almqvist, Olof, January 2019)
Time series prediction is one of the main areas of statistics and machine learning. In 2018, two new algorithms, the higher-order hidden Markov model and the temporal convolutional network, were proposed and emerged as challengers to the more traditional recurrent neural network and long short-term memory network, as well as to the autoregressive integrated moving average (ARIMA) model. In this study, most major algorithms, together with these recent innovations for time series forecasting, are trained and evaluated on two datasets from the theme park industry with the aim of predicting the future number of visitors. The models were developed with the Python libraries Keras and Statsmodels. Results from this thesis show that the neural network models are slightly better than ARIMA and the hidden Markov model, and that the temporal convolutional network does not perform significantly better than the recurrent or long short-term memory networks, although it has the lowest prediction error on one of the datasets. Interestingly, the Markov model performed worse than all neural network models even when no independent variables were used.
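A condensed sketch of this kind of comparison, fitting an ARIMA model with Statsmodels and an LSTM with Keras on a synthetic visitor series; the ARIMA order, window length, and network size are arbitrary choices, not the configurations tuned in the thesis.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from tensorflow import keras

rng = np.random.default_rng(4)

# Synthetic daily visitor counts with weekly seasonality and noise.
t = np.arange(500)
series = 1000 + 200 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 50, t.size)
train, test = series[:450], series[450:]

# ARIMA baseline.
arima = ARIMA(train, order=(2, 1, 2)).fit()
arima_pred = arima.forecast(steps=len(test))

# LSTM on sliding windows of the series.
window = 14
X = np.array([train[i:i + window] for i in range(len(train) - window)])[..., None]
y = np.array([train[i + window] for i in range(len(train) - window)])
model = keras.Sequential([
    keras.layers.Input(shape=(window, 1)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, batch_size=32, verbose=0)

# One-step-ahead forecasts over the test horizon, feeding predictions back in.
history = list(train[-window:])
lstm_pred = []
for _ in range(len(test)):
    x = np.array(history[-window:], dtype=float).reshape(1, window, 1)
    nxt = float(model.predict(x, verbose=0)[0, 0])
    lstm_pred.append(nxt)
    history.append(nxt)

mae = lambda p: np.mean(np.abs(np.asarray(p) - test))
print(f"ARIMA MAE: {mae(arima_pred):.1f}   LSTM MAE: {mae(lstm_pred):.1f}")
```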
38. Ordenação evolutiva de anúncios em publicidade computacional / Evolutionary ad ranking for computational advertising (Broinizi, Marcos Eduardo Bolelli, 15 June 2015)
Otimizar simultaneamente os interesses dos usuários, anunciantes e publicadores é um grande desafio na área de publicidade computacional. Mais precisamente, a ordenação de anúncios, ou ad ranking, desempenha um papel central nesse desafio. Por outro lado, nem mesmo as melhores fórmulas ou algoritmos de ordenação são capazes de manter seu status por um longo tempo em um ambiente que está em constante mudança. Neste trabalho, apresentamos uma análise orientada a dados que mostra a importância de combinar diferentes dimensões de publicidade computacional por meio de uma abordagem evolutiva para ordenação de anúncios a fim de responder a mudanças de forma mais eficaz. Nós avaliamos as dimensões de valor comercial, desempenho histórico de cliques, interesses dos usuários e a similaridade textual entre o anúncio e a página. Nessa avaliação, nós averiguamos o desempenho e a correlação das diferentes dimensões. Como consequência, nós desenvolvemos uma abordagem evolucionária para combinar essas dimensões. Essa abordagem é composta por três partes: um repositório de configurações para facilitar a implantação e avaliação de experimentos de ordenação; um componente evolucionário de avaliação orientado a dados; e um motor de programação genética para evoluir fórmulas de ordenação de anúncios. Nossa abordagem foi implementada com sucesso em um sistema real de publicidade computacional responsável por processar mais de quatorze bilhões de requisições de anúncio por mês. De acordo com nossos resultados, essas dimensões se complementam e nenhuma delas deve ser negligenciada. Além disso, nós mostramos que a combinação evolucionária dessas dimensões não só é capaz de superar cada uma individualmente, como também conseguiu alcançar melhores resultados do que métodos estáticos de ordenação de anúncios. / Simultaneous optimization of users, advertisers and publishers' interests has been a formidable challenge in online advertising. More concretely, ranking of advertising, or more simply ad ranking, plays a central role in this challenge. However, even the best ranking formula or algorithm cannot withstand the ever-changing environment of online advertising for a long time. In this work, we present a data-driven analysis that shows the importance of combining different aspects of online advertising through an evolutionary approach for ad ranking in order to effectively respond to changes. We evaluated aspects ranging from bid values and previous click performance to user behavior and interests, including the textual similarity between ad and page. In this evaluation, we assessed commercial performance along with the correlation between different aspects. We therefore proposed an evolutionary approach for combining these aspects. This approach was composed of three parts: a configuration repository to facilitate deployment and evaluation of ranking experiments; an evolutionary data-based evaluation component; and a genetic programming engine to evolve ad ranking formulae. Our approach was successfully implemented in a real online advertising system that processes more than fourteen billion ad requests per month. According to our results, these aspects complement each other and none of them should be neglected. Moreover, we showed that the evolutionary combination of these aspects not only outperformed each of them individually, but was also able to achieve better overall results than static ad ranking methods.
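The thesis evolves full ranking formulae with genetic programming inside a production system; the sketch below only illustrates the underlying idea on synthetic data, using a plain genetic algorithm to evolve the weights of a linear combination of the four dimensions mentioned (bid, historical clicks, user interest, text similarity) so that clicked ads score higher. The data, fitness function, and settings are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic ad log: four signals per ad (bid, historical CTR, user interest,
# text similarity) plus an observed click (0/1), all invented for the example.
n_ads = 2000
signals = rng.random((n_ads, 4))
hidden_w = np.array([0.2, 0.5, 0.2, 0.1])     # preference used only to simulate clicks
base = signals @ hidden_w
clicks = (rng.random(n_ads) < 0.1 + 0.8 * base / base.max()).astype(float)

def fitness(w):
    """Mean score margin of clicked ads over non-clicked ads under ranking weights w."""
    scores = signals @ w
    return scores[clicks == 1].mean() - scores[clicks == 0].mean()

# Tiny (mu + lambda) evolutionary loop over non-negative weight vectors.
population = rng.random((30, 4))
for generation in range(50):
    children = np.clip(population + rng.normal(0, 0.05, population.shape), 0, None)
    pool = np.vstack([population, children])
    pool = pool[np.argsort([-fitness(w) for w in pool])]
    population = pool[:30]                    # keep the fittest

best = population[0] / population[0].sum()
print("evolved weights (bid, CTR, interest, similarity):", np.round(best, 3))
```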
39. Zpracování asociačních pravidel metodou vícekriteriálního shlukování / Post-processing of association rules by multicriterial clustering method (Kejkula, Martin, January 2002)
Association rules mining is one of several ways of knowledge discovery in databases. Paradoxically, data mining itself can produce such great numbers of association rules that a new knowledge management problem arises: there can easily be thousands or even more association rules holding in a data set. The goal of this work is to design a new method for association rules post-processing. The method should be software and domain independent. The output of the new method should be a structured description of the whole set of discovered association rules that helps the user to work with the discovered rules. The path taken to reach this goal is to split the association rules into clusters: each cluster should contain rules that are more similar to each other than to rules from other clusters. The output of the method is such a cluster definition and description. The main contribution of this Ph.D. thesis is the new multicriterial clustering method for association rules described here. A secondary contribution is the discussion of previously published association rule post-processing methods. The output of the introduced method consists of clusters of rules that cannot be reached by any of the former post-processing methods. According to user expectations, the clusters are more relevant and more effective than any former association rule clustering results. The method is based on two orthogonal clusterings of the same set of association rules. One clustering is based on interestingness measures (confidence, support, interest, etc.); the second clustering is inspired by document clustering in information retrieval. The representation of rules as vectors, in the manner of documents, is central to this thesis. The thesis is organized as follows. Chapter 2 identifies the role of association rules in the KDD (knowledge discovery in databases) process, using KDD methodologies (CRISP-DM, SEMMA, GUHA, RAMSYS). Chapter 3 defines association rules and introduces their characteristics (including interestingness measures). Chapter 4 introduces current association rule post-processing methods. Chapter 5 is an introduction to cluster analysis. Chapter 6 describes the new multicriterial clustering method for association rules. Chapter 7 consists of several experiments. Chapter 8 discusses possibilities for the usage and development of the new method.
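A compact sketch of the two orthogonal clusterings described above, run on a handful of hard-coded rules: one k-means clustering on interestingness measures and one on a bag-of-items (document-like) representation of the rules. The rules, measures, and number of clusters are invented for illustration; this is not the thesis's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy association rules: (antecedent items, consequent items, support, confidence, lift).
rules = [
    ({"bread", "butter"}, {"milk"},   0.08, 0.72, 1.9),
    ({"bread"},           {"milk"},   0.12, 0.55, 1.4),
    ({"beer"},            {"chips"},  0.05, 0.65, 2.3),
    ({"beer", "chips"},   {"salsa"},  0.03, 0.40, 2.1),
    ({"milk"},            {"butter"}, 0.10, 0.35, 1.2),
]
items = sorted(set().union(*[a | c for a, c, *_ in rules]))

# Clustering 1: rules as points in interestingness-measure space.
measures = np.array([[s, c, l] for *_, s, c, l in rules])
cluster_by_measures = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(measures)

# Clustering 2: rules as "documents" over the item vocabulary (bag-of-items vectors).
bag = np.array([[1.0 if i in (a | c) else 0.0 for i in items] for a, c, *_ in rules])
cluster_by_items = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(bag)

for r, (a, c, *_) in enumerate(rules):
    print(f"{sorted(a)} -> {sorted(c)}: measures-cluster {cluster_by_measures[r]}, "
          f"items-cluster {cluster_by_items[r]}")
```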
40. Análise exploratória de dados: uma abordagem com alunos do ensino médio [Exploratory data analysis: an approach with high school students] (Vieira, Márcia, 10 November 2008)
This study investigates the interactions between students and a dynamic statistics environment, here the software Fathom, following the approach of Exploratory Data Analysis. We discuss which concepts and procedures are needed to build a critical analysis of a data set, supported by the dynamism of the computational environment, which serves as a tool to facilitate the mobilization of different types of registers of semiotic representation of that set. As theoretical references, we draw on the levels proposed by Curcio (1989, 2001) for analyzing the graph comprehension mobilized by students when solving problems posed in a statistical context, and on Duval's (1994) theory of Registers of Semiotic Representation. We thereby seek to apply this theory, widely used in Mathematics Education research on geometric and algebraic concepts, to the representation of statistical concepts, focusing in particular on the kinds of apprehension of a figure, in this case of statistical graphs and tables. To that end, we designed a didactic sequence of activities carried out with the software, grounded in Didactic Engineering (ARTIGUE, 1988). Before working on the activities of the didactic sequence, the students took a diagnostic test we prepared, which allowed us to identify their main difficulties with statistical concepts. The development of the didactic sequence showed that the interactions with the computer environment and within the groups, in articulating different types of representation, contributed to the comprehension of concepts such as the arithmetic mean and the median, as well as to the analysis and interpretation of column and dot plots (Dot-Plot). However, these variables were still insufficient for the comprehension of measures such as quartiles and of the Box-Plot graph.
/ O presente trabalho tem como objetivo estudar as interações entre aluno e um ambiente de estatística dinâmica, que neste trabalho será o software Fathom, segundo a abordagem da Análise Exploratória de Dados. Discutimos quais os conceitos e quais os procedimentos necessários, visando à construção de uma análise crítica de um conjunto de dados, favorecida pelo dinamismo do ambiente computacional, que será uma ferramenta para facilitar a mobilização de diferentes tipos de registros de representações semióticas deste conjunto. Como referenciais teóricos, consideramos os níveis propostos por Curcio (1989, 2001) para analisar a compreensão gráfica mobilizada pelos alunos em situação de resolução de problemas propostos em contexto estatístico, e na teoria dos Registros de Representação Semiótica, de Duval (1994). Buscamos assim estabelecer uma leitura desta teoria, amplamente utilizada em pesquisas na área da Educação Matemática relativamente a conceitos geométricos e algébricos, dessa vez para a representação dos conceitos estatísticos. Buscamos especialmente estudar os tipos de apreensões de uma figura, no caso, os tipos de apreensões de um gráfico ou tabela estatística. Para tanto, elaboramos uma seqüência didática de atividades desenvolvidas com o uso do software, com base nos pressupostos da Engenharia Didática (ARTIGUE, 1988). Antes de iniciar o trabalho com as atividades da seqüência didática, os alunos realizaram um teste diagnóstico preparado por nós, em que pudemos identificar suas principais dificuldades em relação aos conceitos estatísticos. O desenvolvimento da seqüência didática mostrou que as interações com o ambiente informatizado e com os grupos, nas articulações dos diferentes tipos de representação, contribuíram com a compreensão de conceitos como a média aritmética e a mediana, e também com a análise e interpretação de gráficos de colunas e de pontos (Dot-Plot). No entanto, estas variáveis ainda foram insuficientes na compreensão de medidas como os quartis, e do gráfico Box-Plot.
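For reference, the measures this study focuses on (mean, median, quartiles, and the box plot built from them) can be reproduced in a few lines; the sample grades below are made up.

```python
import numpy as np
import matplotlib.pyplot as plt

grades = np.array([4.0, 5.5, 6.0, 6.5, 7.0, 7.0, 7.5, 8.0, 9.0, 10.0])

mean = grades.mean()
median = np.median(grades)
q1, q3 = np.percentile(grades, [25, 75])
print(f"mean={mean:.2f}  median={median:.2f}  Q1={q1:.2f}  Q3={q3:.2f}  IQR={q3 - q1:.2f}")

# The box plot encodes exactly these measures: box from Q1 to Q3, line at the median,
# whiskers out to the most extreme points within 1.5 * IQR.
plt.boxplot(grades, vert=False)
plt.title("Box-Plot of the sample grades")
plt.show()
```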