Global ETD Search

31	Development of a geovisual analytics environment using parallel coordinates with applications to tropical cyclone trend analysis Steed, Chad A 13 December 2008 (has links) A global transformation is being fueled by unprecedented growth in the quality, quantity, and number of different parameters in environmental data through the convergence of several technological advances in data collection and modeling. Although these data hold great potential for helping us understand many complex and, in some cases, life-threatening environmental processes, our ability to generate such data is far outpacing our ability to analyze it. In particular, conventional environmental data analysis tools are inadequate for coping with the size and complexity of these data. As a result, users are forced to reduce the problem in order to adapt to the capabilities of the tools. To overcome these limitations, we must complement the power of computational methods with human knowledge, flexible thinking, imagination, and our capacity for insight by developing visual analysis tools that distill information into the actionable criteria needed for enhanced decision support. In light of said challenges, we have integrated automated statistical analysis capabilities with a highly interactive, multivariate visualization interface to produce a promising approach for visual environmental data analysis. By combining advanced interaction techniques such as dynamic axis scaling, conjunctive parallel coordinates, statistical indicators, and aerial perspective shading, we provide an enhanced variant of the classical parallel coordinates plot. Furthermore, the system facilitates statistical processes such as stepwise linear regression and correlation analysis to assist in the identification and quantification of the most significant predictors for a particular dependent variable. 
These capabilities are combined into a unique geovisual analytics system that is demonstrated via a pedagogical case study and three North Atlantic tropical cyclone climate studies using a systematic workflow. In addition to revealing several significant associations between environmental observations and tropical cyclone activity, this research corroborates the notion that enhanced parallel coordinates coupled with statistical analysis can be used for more effective knowledge discovery and confirmation in complex, real-world data sets. climate study tropical cyclone hurricane parallel coordinates geovisual analytics stepwise regression correlation analysis statistical analysis exploratory data analysis geovisualization visual interaction techniques
32	A mixed method approach to exploring and characterizing ionic chemistry in the surface waters of the glacierized upper Santa River watershed, Ancash, Peru Eddy, Alex Michelle 17 July 2012 (has links) No description available. Climate Change Geochemistry Geographic Information Science Geography Geological Geology Inorganic Chemistry Latin American Studies glacier loss climate change peru water hydrologic chemistry hydrochemistry Andes weathering exploratory data analysis
33	商業智慧系統之實作於區域治理創新的應用─以宜蘭縣政府為例 / The Development of Business Intelligence System for Regional Governance – A Case Study of Yi-Lan County 許乃嘉, Hsu, Nai Chia Unknown Date (has links) 在有限的資源之下，各區域地方政府相當渴望跳脫僵化與官僚的決策模式，尋求創新有效率的治理機制，另一方面開放政府資料已為國際化的趨勢，台灣於開放資料領域耕耘成果亦相當豐碩。本研究希望建置商業智慧平台，將開放資料轉換為無形的「智慧資本」，持續驅動創新有效率的「治理機制」，進而改善在地人民的生活品質。本論文研究實作一網頁為基礎的商業智慧分析平台，工具包括資料包絡分析法、競爭者分析，透過探索式資料分析，使用者彈性操作指標與決策參數，反覆進行資料探索分析，進而了解（一）地方之競爭縣市與區域特色（二）各縣市相對治理績效（三）單一縣市之優勢產業。並藉由宜蘭縣的文創、觀光、環境此三個產業面向的資料為例說明。本論文聚焦於使用前端框架技術—AngularJS之系統實作，藉由資料視覺化設計、提升使用者經驗，建置高擴充性的資料探勘分析的平台，更可滿足使用者一次購足的統計資料查詢環境。 / Facing the challenges of limited resources and budget constraints, regional governments have been actively pursuing strategies to transform conventional bureaucratic decision-making model into innovative and efficient governance mechanism. At the same time, “open government data” is becoming a political commitment for many countries and Taiwanese government has made significant advances in this respect recently. To leverage the trend for open public data, this thesis aims to develop a web-based business intelligence system to support efficient governance through in-depth analysis of intellectual capital. The tools provided in this system include data envelopment analysis (DEA), competitor identification, and exploratory data analysis. The system is designed to allow average users to experiment with different parameter settings and view the results interactively. Insights on competing counties and regional characteristics, relative governance efficiency and leading industry can be gained with ease. We illustrate the functionalities of the system using data from Yi-Lan County and investigate its competitiveness in three areas, namely, culture and creative industry, tourism, and environmental industry. AngularJS, a front-end framework, is utilized to implement the proposed business intelligence system. 
The objective is to provide a one-stop shopping service for interactive data analysis and visualization with a user-friendly design and good extensibility. 探索式資料分析商業智慧 AngularJS 區域智慧資本區域競爭力分析 exploratory data analysis business intelligence AngularJS regional intellectual capital regional competitiveness analysis
34	A comparative study between algorithms for time series forecasting on customer prediction : An investigation into the performance of ARIMA, RNN, LSTM, TCN and HMM Almqvist, Olof January 2019 (has links) Time series prediction is one of the main areas of statistics and machine learning. In 2018, two new algorithms, the higher-order hidden Markov model and the temporal convolutional network, were proposed and emerged as challengers to the more traditional recurrent neural network and long short-term memory network, as well as the autoregressive integrated moving average (ARIMA) model. In this study, most major algorithms, together with recent innovations for time series forecasting, are trained and evaluated on two datasets from the theme park industry with the aim of predicting the future number of visitors. To develop the models, the Python libraries Keras and Statsmodels were used. Results from this thesis show that the neural network models are slightly better than ARIMA and the hidden Markov model, and that the temporal convolutional network does not perform significantly better than the recurrent or long short-term memory networks, despite having the lowest prediction error on one of the datasets. Interestingly, the Markov model performed worse than all neural network models even when using no independent variables. machine learning deep learning time series forecasting time series regression data science prediction crisp-dm keras markov model neural network exploratory data analysis maskininlärning djupinlärning tidsserieprediktion tidsserieprognos neurala nätverk markovmodell explorativ dataanalys dataanalys Engineering and Technology Teknik och teknologier
35	Ordenação evolutiva de anúncios em publicidade computacional / Evolutionary ad ranking for computational advertising Broinizi, Marcos Eduardo Bolelli 15 June 2015 (has links) Otimizar simultaneamente os interesses dos usuários, anunciantes e publicadores é um grande desafio na área de publicidade computacional. Mais precisamente, a ordenação de anúncios, ou ad ranking, desempenha um papel central nesse desafio. Por outro lado, nem mesmo as melhores fórmulas ou algoritmos de ordenação são capazes de manter seu status por um longo tempo em um ambiente que está em constante mudança. Neste trabalho, apresentamos uma análise orientada a dados que mostra a importância de combinar diferentes dimensões de publicidade computacional por meio de uma abordagem evolutiva para ordenação de anúncios afim de responder a mudanças de forma mais eficaz. Nós avaliamos as dimensões de valor comercial, desempenho histórico de cliques, interesses dos usuários e a similaridade textual entre o anúncio e a página. Nessa avaliação, nós averiguamos o desempenho e a correlação das diferentes dimensões. Como consequência, nós desenvolvemos uma abordagem evolucionária para combinar essas dimensões. Essa abordagem é composta por três partes: um repositório de configurações para facilitar a implantação e avaliação de experimentos de ordenação; um componente evolucionário de avaliação orientado a dados; e um motor de programação genética para evoluir fórmulas de ordenação de anúncios. Nossa abordagem foi implementada com sucesso em um sistema real de publicidade computacional responsável por processar mais de quatorze bilhões de requisições de anúncio por mês. De acordo com nossos resultados, essas dimensões se complementam e nenhuma delas deve ser neglicenciada. Além disso, nós mostramos que a combinação evolucionária dessas dimensões não só é capaz de superar cada uma individualmente, como também conseguiu alcançar melhores resultados do que métodos estáticos de ordenação de anúncios. 
/ Simultaneous optimization of users, advertisers and publishers' interests has been a formidable challenge in online advertising. More concretely, ranking of advertising, or more simply ad ranking, has a central role in this challenge. However, even the best ranking formula or algorithm cannot withstand the ever-changing environment of online advertising for a long time. In this work, we present a data-driven analysis that shows the importance of combining different aspects of online advertising through an evolutionary approach for ad ranking in order to effectively respond to changes. We evaluated aspects ranging from bid values and previous click performance to user behavior and interests, including the textual similarity between ad and page. In this evaluation, we assessed commercial performance along with the correlation between different aspects. Therefore, we proposed an evolutionary approach for combining these aspects. This approach was composed of three parts: a configuration repository to facilitate deployment and evaluation of ranking experiments; an evolutionary data-based evaluation component; and a genetic programming engine to evolve ad ranking formulae. Our approach was successfully implemented in a real online advertising system that processes more than fourteen billion ad requests per month. According to our results, these aspects complement each other and none of them should be neglected. Moreover, we showed that the evolutionary combination of these aspects not only outperformed each of them individually, but was also able to achieve better overall results than static ad ranking methods. Análise de componentes principais Análise exploratória de dados Computational advertising Contextual advertising Exploratory data analysis Genetic programming Learning to advertising Online advertising Principal component analysis Programação genética Publicidade computacional Publicidade contextualizada Publicidade digital Publicidade online
36	Zpracování asociačních pravidel metodou vícekriteriálního shlukování / Post-processing of association rules by multicriterial clustering method Kejkula, Martin January 2002 (has links) Association rule mining is one of several approaches to knowledge discovery in databases. Paradoxically, data mining itself can produce such large numbers of association rules that a new knowledge management problem arises: thousands or even more association rules can easily hold in a data set. The goal of this work is to design a new method for association rules post-processing. The method should be software and domain independent. The output of the new method should be a structured description of the whole set of discovered association rules, and it should help the user work with the discovered rules. The path I took to reach this goal is to split the association rules into clusters. Each cluster should contain rules that are more similar to each other than to rules from another cluster. The output of the method is the definition and description of these clusters. The main contribution of this Ph.D. thesis is the new Multicriterial clustering association rules method it describes. A secondary contribution is the discussion of already published association rules post-processing methods. The output of the new method is a set of rule clusters that cannot be obtained by any of the former post-processing methods. According to user expectations, the clusters are more relevant and more effective than any former association rules clustering results. The method is based on two orthogonal clusterings of the same set of association rules. One clustering is based on interestingness measures (confidence, support, interest, etc.). The second clustering is inspired by document clustering in information retrieval. The representation of rules as vectors, like documents, is fundamental in this thesis. The thesis is organized as follows. 
Chapter 2 identifies the role of association rules in the KDD (knowledge discovery in databases) process, using KDD methodologies (CRISP-DM, SEMMA, GUHA, RAMSYS). Chapter 3 defines association rules and introduces their characteristics (including interestingness measures). Chapter 4 introduces current association rules post-processing methods. Chapter 5 is an introduction to cluster analysis. Chapter 6 describes the new Multicriterial clustering association rules method. Chapter 7 consists of several experiments. Chapter 8 discusses possibilities for the use and further development of the new method.
37	Análise exploratória de dados: uma abordagem com alunos do ensino médio Vieira, Márcia 10 November 2008 (has links) Secretaria da Educação do Estado de São Paulo / This work studies the interactions between students and a dynamic statistics environment, here the software Fathom, following the Exploratory Data Analysis approach. We discuss which concepts and procedures are needed to build a critical analysis of a data set, supported by the dynamism of the computational environment, which serves as a tool to facilitate the mobilization of different types of registers of semiotic representation of that set. As theoretical references, we considered the levels proposed by Curcio (1989, 2001) to analyze the graph comprehension mobilized by students solving problems in a statistical context, and the theory of Registers of Semiotic Representation by Duval (1994). We thus sought to establish a reading of this theory, widely used in Mathematics Education research on geometric and algebraic concepts, this time for the representation of statistical concepts. In particular, we studied the types of apprehension of a figure, in this case a statistical graph or table. To this end, we designed a didactic sequence of activities carried out with the software, based on the assumptions of Didactic Engineering (ARTIGUE, 1988). Before starting the activities of the didactic sequence, the students took a diagnostic test that we prepared, which allowed us to identify their main difficulties with statistical concepts. 
The development of the didactic sequence showed that interaction with the computerized environment and within the groups, in articulating the different types of representation, contributed to the understanding of concepts such as the arithmetic mean and the median, and to the analysis and interpretation of column charts and dot plots (Dot-Plot). However, these variables were still insufficient for the understanding of measures such as quartiles and of the Box-Plot graph / O presente trabalho tem como objetivo estudar as interações entre aluno e um ambiente de estatística dinâmica, que neste trabalho será o software Fathom, segundo a abordagem da Análise Exploratória de Dados. Discutimos quais os conceitos e quais os procedimentos necessários, visando à construção de uma análise crítica de um conjunto de dados, favorecida pelo dinamismo do ambiente computacional, que será uma ferramenta para facilitar a mobilização de diferentes tipos de registros de representações semióticas deste conjunto. Como referenciais teóricos, consideramos os níveis propostos por Curcio (1989, 2001) para analisar a compreensão gráfica mobilizada pelos alunos em situação de resolução de problemas propostos em contexto estatístico, e na teoria dos Registros de Representação Semiótica, de Duval (1994). Buscamos assim estabelecer uma leitura desta teoria, amplamente utilizada em pesquisas na área da Educação Matemática relativamente a conceitos geométricos e algébricos, dessa vez para a representação dos conceitos estatísticos. Buscamos especialmente estudar os tipos de apreensões de uma figura, no caso, os tipos de apreensões de um gráfico ou tabela estatística. Para tanto, elaboramos uma seqüência didática de atividades desenvolvidas com o uso do software, com base nos pressupostos da Engenharia Didática (ARTIGUE, 1988). 
Antes de iniciar o trabalho com as atividades da seqüência didática, os alunos realizaram um teste diagnóstico preparado por nós, em que pudemos identificar suas principais dificuldades em relação aos conceitos estatísticos. O desenvolvimento da seqüência didática mostrou que as interações com o ambiente informatizado e com os grupos, nas articulações dos diferentes tipos de representação, contribuíram com a compreensão de conceitos como a média aritmética e a mediana, e também com a análise e interpretação de gráficos de colunas e de pontos (Dot-Plot). No entanto, estas variáveis ainda foram insuficientes na compreensão de medidas como os quartis, e do gráfico Box-Plot Registros de representação semiótica Estatística Dados -- Analise Estatistica -- Estudo e ensino Matematica (Ensino medio) Educacao matematica Matematica -- Estudo e ensino Fathom Exploratory data analysis Register of semiotic representation Statistics
38	O professor de matemática e o trabalho com medidas separatrizes / The mathematics teacher and the work with position measures Canossa, Roberto 02 March 2009 (has links) Secretaria da Educação do Estado de São Paulo / In 1997, Statistics became part of the basic school curriculum, which made it necessary to prepare mathematics teachers to approach this subject, since many of them either did not have this content in their initial training or had it only in a superficial, technical way. Surveys carried out by our group point to the same results. Our motivation for this work comes from these results and from the great difficulty that some public school mathematics teachers in the State of São Paulo, specifically in the Diadema region, have in developing Statistics content and its interpretation with their students. We therefore set out to answer the following research question: What are the didactic characteristics of continuing education for high school teachers aimed at working with the concepts of median and quartiles, so that students can make decisions based on an analysis of the perceived variation, with the help of the Dot-Plot and the Box-Plot? To this end, we designed and applied a diagnostic questionnaire (appendix 1), held continuing education workshops based on the results of that questionnaire and, finally, observed a class taught by the volunteer teacher. We found that most teachers do not work with the concepts of median and quartiles: they limit themselves to the mean, variance and standard deviation, introduced only as mathematical formulas without giving meaning to these concepts; moreover, they are not familiar with the Dot-Plot and Box-Plot graphs. 
The workshop allowed the volunteer teacher to advance in statistical reasoning and literacy. However, the two workshop sessions held were not enough to reach level 5 (integrated process reasoning) of the statistical reasoning levels proposed by Garfield (2002) / O estudo da Estatística passou, em 1997, a fazer parte do currículo da Escola Básica, sendo necessária então a preparação dos professores de matemática para a abordagem desse tema, uma vez que muitos deles não tiveram esse conteúdo em sua formação inicial ou tiveram de forma superficial e tecnicista. As pesquisas realizadas em nosso grupo apontam também para esses resultados. Nossa motivação para a realização deste trabalho deve-se a esses resultados e à grande dificuldade que alguns professores de matemática da rede pública do Estado de São Paulo, especificamente na região de Diadema, têm em desenvolver com seus alunos os conteúdos relativos a Estatística e suas interpretações. Para isso, pretendemos responder a seguinte questão de pesquisa: Quais as características didáticas de uma formação continuada para professores do Ensino Médio, visando o trabalho com conceitos de mediana e quartis, para que os alunos possam tomar decisões a partir da análise da variação percebida, com o auxílio do Dot-Plot e do Box-Plot? Para tal verificação, elaboramos e aplicamos um questionário diagnóstico (apêndice 1), realizamos oficinas de formação continuada a partir dos resultados desse questionário e, por fim, observamos uma aula com a professora colaboradora. O que pudemos notar é que a maioria dos professores não trabalha os conceitos de mediana e quartis: limitam-se aos conceitos de média, variância e desvio-padrão, inseridos apenas com fórmulas matemáticas, sem dar sentido para tais conceitos; além disso, não têm conhecimento dos gráficos Dot-Plot e Box-Plot. 
A oficina permitiu um avanço no nível de raciocínio e alfabetização estatística da professora colaboradora, mas podemos perceber também que as duas sessões de oficinas realizadas não foram suficientes para chegar ao nível 5 (processos de raciocínio integrados) de raciocínio estatístico proposto por Garfield (2002) Análise exploratória de dados Variabilidade Mediana e quartis Raciocínio estatístico Exploratory data analysis Variability Median and quartiles Statistical reasoning
39	Поређење скупова података помоћу графова / Poređenje skupova podataka pomoću grafova / Comparing Data Sets Using Graphs Ivančević Vladimir 02 March 2017 (has links) За потребе поређења скупова података осмишљен је приступ поређењу који се заснива на употреби графова. У овом приступу развијене су две врсте графовских представа: представе вредности које описују скуп података и представе разлика које описују разлике између две представе вредности. У испитивањима приступа над синтетичким и реалним скуповима података, показано је да је кроз визуално истраживање представа разлика и примену помоћних поступака обраде могуће уочити корисне обрасце који приказују разлике између представа вредности, а посредно и између скупова података описаних путем ових представа вредности. / Za potrebe poređenja skupova podataka osmišljen je pristup poređenju koji se zasniva na upotrebi grafova. U ovom pristupu razvijene su dve vrste grafovskih predstava: predstave vrednosti koje opisuju skup podataka i predstave razlika koje opisuju razlike između dve predstave vrednosti. U ispitivanjima pristupa nad sintetičkim i realnim skupovima podataka, pokazano je da je kroz vizualno istraživanje predstava razlika i primenu pomoćnih postupaka obrade moguće uočiti korisne obrasce koji prikazuju razlike između predstava vrednosti, a posredno i između skupova podataka opisanih putem ovih predstava vrednosti. / In order to support data set comparison, a graph-based approach to comparison was devised. In this approach, two types of graph-based representations were introduced: value representations that represent a data set and difference representations that represent differences between two value representations. The results of approach evaluations on synthetic and real data sets indicate that, by visually exploring difference representations and applying auxiliary procedures, it is possible to discover useful patterns which describe differences between two value representations and, consequently, differences between the data sets corresponding to the value representations.
40	Long Term Forecasting of Industrial Electricity Consumption Data With GRU, LSTM and Multiple Linear Regression Buzatoiu, Roxana January 2020 (has links) Accurate long-term energy consumption forecasting for industrial entities is of interest to distribution companies, as it can potentially help reduce their churn and support decision making when hedging. This thesis presents different methods to forecast the energy consumption of industrial entities over a long prediction horizon of 1 year. Notably, it includes experiments with two variants of recurrent neural networks, namely the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM). Their performance is compared against traditional approaches, namely Multiple Linear Regression (MLR) and Seasonal Autoregressive Integrated Moving Average (SARIMA). The investigation then focuses on tailoring the recurrent neural network (RNN) model to improve its performance. First, the experiments examine the impact of different model architectures. Second, they test the effect of time-related feature selection as an additional input to the RNNs. Specifically, the thesis explores how traditional methods such as Exploratory Data Analysis and Autocorrelation and Partial Autocorrelation Function plots can contribute to the performance of the RNN model. The work shows, through an empirical study on three industrial datasets, that the GRU architecture is a powerful method for the long-term forecasting task and outperforms LSTM in certain scenarios. Compared with the MLR model, the RNN achieved a reduction in RMSE of between 5% and 10%. The most important findings include: (i) the GRU architecture outperforms LSTM on industrial energy consumption datasets when compared against a lower number of hidden units. 
Also, GRU outperforms LSTM on certain datasets, regardless of the number of units chosen; (ii) RNN variants yield better accuracy than statistical or regression models; (iii) using ACF and PACF as discovery tools in the feature selection process is inconclusive and inefficient when aiming for a general model; (iv) using deterministic features (such as day of the year, day of the month) has limited effect on improving the deep learning model’s performance. / Noggranna långsiktiga energiprognosprognoser för industriella enheter är av intresse för distributionsföretag eftersom det potentiellt kan bidra till att minska deras churn och erbjuda stöd i beslutsfattandet vid säkring. Detta avhandlingsarbete presenterar olika metoder för att prognostisera energiförbrukningen för industriella enheter under en lång tids förutsägelsehorisont på 1 år. I synnerhet inkluderar det experiment med två varianter av de återkommande neurala nätverken, nämligen GRU och LSTM. Deras prestanda jämförs med traditionella metoder, nämligen MLR och SARIMA. Vidare fokuserar undersökningen på att skräddarsy modellen för återkommande neurala nätverk för att förbättra prestanda. Experimenten fokuserar på effekterna av olika modellarkitekturer. För det andra fokuserar den på att testa effekten av tidsrelaterat funktionsval som en extra ingång till RNN-nätverk. Specifikt undersökte den hur traditionella metoder som Exploratory Data Analysis, Autocorrelation och Partial Autocorrelation Function Plots kan bidra till prestanda för RNN-modellen. Det aktuella arbetet visar genom en empirisk studie av tre industriella datamängder att GRU-arkitektur är en kraftfull metod för den långsiktiga prognosuppgiften som överträffar LSTM på vissa scenarier. Jämfört med MLR-modellen uppnådde RNN en minskning av RMSE mellan 5 % upp till 10 %. De viktigaste resultaten inkluderar: (i) GRU-arkitekturen överträffar LSTM på datauppsättningar för industriell energiförbrukning jämfört med ett lägre antal dolda enheter. 
GRU överträffar också LSTM på vissa datauppsättningar, oavsett antalet valenheter; (ii) RNN-varianter ger bättre noggrannhet än statistiska modeller eller regressionsmodeller; (iii) att använda ACF och PACF som verktyg för upptäckt i funktionsvalsprocessen är otydligt och ineffektivt när man siktar på en allmän modell; (iv) att använda deterministiska funktioner (t.ex. årets dag, månadsdagen) har begränsade effekter på att förbättra djupinlärningsmodellens prestanda. Time Series Analysis Recurrent Neural Networks long-term Forecasting Exploratory Data Analysis Multiple Linear Regression ACF PACF Energy Sector Tidsserieanalys återkommande neurala nätverk långtidsprognoser undersökande dataanalys multipel linjär regression ACF PACF energisektor Computer and Information Sciences Data- och informationsvetenskap
