  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Uma nova metáfora visual escalável para dados tabulares e sua aplicação na análise de agrupamentos / A scalable visual metaphor for tabular data and its application on clustering analysis

Mosquera, Evinton Antonio Cordoba 19 September 2017 (has links)
The rapid evolution of computing resources has enabled large datasets to be stored and retrieved. However, exploring, understanding, and extracting useful information from them remains a challenge. Among the computational tools that address this problem, information visualization enables data analysis through graphical representations that leverage human visual abilities, while data mining provides automatic processes for discovering and interpreting patterns. Despite the recent popularity of information visualization methods, a recurring problem is low visual scalability when analyzing large datasets, resulting in loss of context and visual clutter. To represent large datasets while reducing the loss of relevant information, visual data aggregation has been employed. Aggregation decreases the amount of data to be represented while preserving the distribution and trends of the original dataset. Regarding data mining, information visualization has become an essential tool for interpreting computational models and their results, especially for unsupervised techniques such as clustering, because in these techniques the only way the user can interact with the mining process is through parameterization, which limits the insertion of domain knowledge into the analysis. In this dissertation, we propose and develop a visual metaphor based on TableLens that employs aggregation to create more scalable representations of tabular data. As an application, we use the developed metaphor to analyze the results of clustering techniques.
The resulting framework not only supports the analysis of large databases with reduced loss of context, but also provides insights into how data attributes contribute to cluster formation in terms of the cohesion and separation of the resulting groups.
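The aggregation idea described in this abstract — collapsing many rows into a few representative ones so that column trends survive at display scale — can be sketched as follows. This is an illustrative from-scratch example, not the dissertation's actual implementation; the binning scheme and bin count are arbitrary:

```python
from statistics import mean

def aggregate_rows(rows, n_bins):
    """Collapse a list of numeric rows into n_bins representative rows.

    Each output row is the column-wise mean of one contiguous bin,
    so overall column trends survive even when n_bins << len(rows).
    """
    if not rows:
        return []
    n_bins = min(n_bins, len(rows))
    bin_size = len(rows) / n_bins
    aggregated = []
    for b in range(n_bins):
        start = round(b * bin_size)
        end = round((b + 1) * bin_size)
        chunk = rows[start:end]
        aggregated.append([mean(col) for col in zip(*chunk)])
    return aggregated

# 1000 rows shrink to 10 representative rows for display.
data = [[i, i * 2.0] for i in range(1000)]
summary = aggregate_rows(data, 10)
```

A TableLens-style view would then render `summary` instead of `data`, expanding a bin back to its raw rows only on demand.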
32

A construção do pensamento estatístico: organização, representação e interpretação de dados por alunos da 5ª série do Ensino Fundamental / The construction of statistical thinking: organization, representation and interpretation of data by 5th-grade students of Ensino Fundamental

Medici, Michele 17 May 2007 (has links)
In everyday life, as well as in studies and scientific research, we face the need to explore tabular and graphical representations. The report of the 4th INAF highlights, among other aspects, that only 23% of the Brazilian population shows some familiarity with these representations. Believing that the sooner we begin to explore them the better, the aim of this dissertation was to design a didactic sequence, with an experimental focus, to introduce statistics to 5th-grade students (today the 6th year) of Ensino Fundamental. We sought not only the didactic conditions that favor students' autonomous progress in solving problems of organization, representation and interpretation of a dataset, but also a didactic sequence the teacher can use to promote the construction of statistical thinking. We therefore investigated how students interact with the situations proposed by the teacher, the prior knowledge students bring, the hypotheses they formulate, and the way they mobilize the knowledge they construct. For these analyses we relied on the principles of Didactic Engineering, and we concluded that classes must be permeated by collective debates and work in small groups, and that every step must be built by the students, who are responsible for their own research. We observed that the elements for the construction of statistical thinking could be gradually composed by the students, and that their representations were often poorly organized and/or contained inaccurate or missing information. The debates led to a homogenization of the milieux, which supported the students' learning.
We raised a series of questions to be explored with these students during the following school year.
33

Analysis of Taiwan Stock Exchange high frequency transaction data

Hsu, Chia-Hao 06 July 2012 (has links)
The Taiwan securities market is a typical order-driven market. The electronic trading system of the Taiwan Stock Exchange, launched in 1998, significantly reduced trade-matching time (the current matching interval is around 20 seconds) and promptly provides updated online trading information to traders. In this study, we establish an online transaction simulation system that can be applied to predict trade prices and study market efficiency. Models are established for the times and volumes of newly added bid/ask orders on the match list, and the exponentially weighted moving average (EWMA) method is adopted to update the model parameters. Match prices are predicted dynamically based on the EWMA-updated models. Further, high-frequency bid/ask order data are used to derive the supply and demand curves as well as the equilibrium prices; differences between the transaction prices and the equilibrium prices are used to investigate the market's efficiency. Finally, EWMA and cusum control charts are used to monitor market efficiency. In the empirical study, we analyze the intra-daily (April 2005) high-frequency match data of Uni-President Enterprises Corporation and Formosa Plastics Corporation.
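The EWMA parameter updating mentioned in this abstract can be illustrated with a minimal sketch. The smoothing factor and the tracked quantity are invented for the example; they are not the thesis's actual model for order times and volumes:

```python
def ewma_update(current_estimate, new_observation, alpha=0.2):
    """Exponentially weighted moving average update: recent
    observations dominate, older ones decay geometrically."""
    return alpha * new_observation + (1 - alpha) * current_estimate

# Dynamically track a parameter (here, a mean match price) tick by tick.
estimate = 100.0
for price in [101.0, 99.5, 102.0]:
    estimate = ewma_update(estimate, price)
```

The same one-line update, applied after every match, is what makes this kind of online prediction cheap enough for 20-second matching intervals.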
34

L’outillage sur plaquette en quartzite du site ElFs-010. Étude d’une technologie distinctive en Jamésie, Québec (1900-400 A.A.) / Tabular quartzite tools from site ElFs-010: a study of a distinctive technology in the James Bay region, Québec (1900-400 BP)

Henriet, Jean-Pierre 04 1900 (has links)
This research project seeks to better understand the phenomenon of tabular quartzite tool blanks from the archaeological site ElFs-010, located in the James Bay region (Jamésie). No work of this scope had yet been carried out on this type of tool, leaving a gap to fill. To address this problem as adequately as possible, we organized our work around three objectives. First, determine whether the tabular pieces of quartzite are the product of a lithic technology or of a natural geological process. Second, evaluate whether we are dealing with a phenomenon specific to site ElFs-010. Third, define whether a specific time period corresponds to this industry. The results of our research show that the tabular quartzite blanks from site ElFs-010 occur naturally on the talus slope of the Colline Blanche. Their low average thickness and their steep edges were probably the factors that most influenced their selection. Based on these two characteristics, we suggest they could have been used as interchangeable blades or as burins. We identified 33 James Bay sites containing at least one fragment of tabular quartzite. Despite some archaeological evidence, it is still too early to say that this industry is diagnostic of a James Bay cultural group. The chronological data suggest that this industry flourished around 1300 years BP. In addition, it appears that the geographic areas we attributed to the sites correspond to well-defined cultural sequences. Finally, our hypotheses concern future research on a set of events which, like the tabular quartzite blanks, are indicative of changes in the lifestyle of the prehistoric groups of the James Bay region.
Keywords: Archaeology, James Bay, ElFs-010, Colline Blanche, tabular quartzite, lithic technology. / Completed in collaboration with Arkéos Inc.
35

Plateforme visuelle pour l'intégration de données faiblement structurées et incertaines / A visual platform to integrate poorly structured and uncertain data

Da Silva Carvalho, Paulo 19 December 2017 (has links)
We hear a great deal about Big Data, Open Data, Social Data, Scientific Data, and so on. The importance currently given to data is, in general, very high: we are living in the era of massive data. Analyzing these data matters if the objective is to extract value from them so that they can be used. The work presented in this thesis concerns the understanding, assessment, correction/modification, management and, finally, the integration of data, in order to allow their exploitation and reuse. Our research focuses exclusively on Open Data and, more precisely, on Open Data organized in tabular form (CSV being one of the most widely used formats in the Open Data domain).
The term Open Data first appeared in 1995, when the GCDIS group (Global Change Data and Information System, United States) used the expression to encourage entities with the same interests and concerns to share their data [Data et System, 1995]. The Open Data movement has, however, only recently undergone sharp growth, becoming a popular phenomenon all over the world. Being recent, it is a field that is still expanding, and its importance is now considerable. The encouragement given by governments and public institutions to publish their data openly undoubtedly plays an important role here.
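A first step in the assessment stage this abstract describes can be sketched with Python's standard `csv` module. The quality checks, column names, and sample data below are invented for the sketch, not taken from the thesis:

```python
import csv
import io

def assess_csv(text):
    """Report simple quality signals for a tabular CSV file:
    ragged rows, empty cells, and a naive per-column type guess."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    report = {
        "ragged_rows": sum(1 for r in body if len(r) != len(header)),
        "empty_cells": sum(1 for r in body for c in r if c.strip() == ""),
        "column_types": {},
    }
    for i, name in enumerate(header):
        values = [r[i] for r in body if i < len(r) and r[i].strip()]
        numeric = all(v.replace(".", "", 1).lstrip("-").isdigit() for v in values)
        report["column_types"][name] = "numeric" if numeric and values else "text"
    return report

sample = "city,population\nEsch,36000\nLuxembourg,\nDiekirch,7000\n"
quality = assess_csv(sample)
```

A report like this is the kind of signal a visual platform can surface to the user before any correction or integration step is attempted.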
37

Syntetisering av tabulär data: En systematisk litteraturstudie om verktyg för att skapa syntetiska dataset / Synthesizing tabular data: a systematic literature review of tools for creating synthetic datasets

Allergren, Erik, Hildebrand, Clara January 2023 (has links)
In recent years, the demand for large amounts of data to train machine learning algorithms has increased. These algorithms can be used to address societal questions and challenges, large and small. One way to meet the demand is to generate synthetic data that preserves the statistical values and characteristics of real data. Synthetic data makes it possible to obtain large amounts of data, and it also minimizes the risk of disclosing personal information, so data can be made available for research without revealing identities. The overall aim of this study was to examine and compile which tools for synthesizing tabular data are described in scientific publications in English. The study was conducted by following the eight steps of a systematic literature review, with clearly defined criteria for which articles to include or exclude. The primary requirements were that the described tools exist as accessible code or programs, not merely in theory, and that they are general and applicable to different tabular datasets, rather than working only for one specific dataset or situation.
The tools described in the articles remaining after the search, and consequently included in the results, were (a) Synthpop, a tool developed within the UK Longitudinal Studies project to handle sensitive data containing personal information; (b) Gretel, a commercial and open-source tool created to meet the growing demand for training data; (c) UniformGAN, a new variant of GAN (Generative Adversarial Network) that generates synthetic tabular datasets while preserving privacy; and (d) Synthia, an open-source Python package made to generate synthetic univariate and multivariate data. The results showed that the tools use different methods and models to produce synthetic data and differ in their degree of accessibility. Gretel stood out most among the tools, being more commercial, offering more services, and allowing synthetic data to be generated without strong programming skills.
38

Cell Classification for Layout Recognition in Spreadsheets

Koci, Elvis, Thiele, Maik, Romero, Oscar, Lehner, Wolfgang 28 July 2021 (has links)
Spreadsheets constitute a notably large and valuable collection of documents within enterprise settings and on the Web. Although spreadsheets are intuitive to use and equipped with powerful functionality, extracting and reusing data from them remains a cumbersome and mostly manual task. Their greatest strength, the large degree of freedom they give the user, is at the same time their greatest weakness, since data can be arbitrarily structured. Therefore, in this paper we propose a supervised learning approach for layout recognition in spreadsheets. We work at the cell level, aiming to predict each cell's correct layout role out of five predefined alternatives. For this task we consider a large number of features not covered before by related work. Moreover, we gather a considerably large dataset of annotated cells from spreadsheets exhibiting variability in format and content. Our experiments, with five different classification algorithms, show that we can predict cell layout roles with high accuracy. Subsequently, we focus on revising the classification results with the aim of repairing misclassifications, and propose a sophisticated three-step approach that effectively corrects a reasonable number of inaccurate predictions.
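Cell-level features of the kind such a classifier consumes can be sketched as follows. The feature names, role labels, and the toy rule-based baseline are invented for illustration; the paper's actual feature set, five role labels, and learned classifiers differ:

```python
def cell_features(value):
    """Content-based features for a single spreadsheet cell."""
    text = value.strip()
    return {
        "is_empty": text == "",
        "is_numeric": text.replace(".", "", 1).lstrip("-").isdigit(),
        "is_upper": text.isupper() and text != "",
        "length": len(text),
        "word_count": len(text.split()),
    }

def baseline_role(value):
    """Toy baseline: guess a layout role from content features alone.
    A real classifier would also use formatting and neighborhood features."""
    f = cell_features(value)
    if f["is_empty"]:
        return "empty"
    if f["is_numeric"]:
        return "data"
    if f["word_count"] > 4:
        return "metadata"
    return "header"

roles = [baseline_role(v) for v in
         ["Revenue", "1024.5", "", "Figures reported in thousands of euros"]]
```

In a supervised setting, the dictionary returned by `cell_features` (extended with formatting and spatial features) would feed a trained model instead of the hand-written rules.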
39

A Comparison of AutoML Hyperparameter Optimization Tools for Tabular Data

Pokhrel, Prativa 02 May 2023 (has links)
No description available.
40

[en] ALGORITHMS FOR TABLE STRUCTURE RECOGNITION / [pt] ALGORITMOS PARA RECONHECIMENTO DE ESTRUTURAS DE TABELAS

YOSVENI ESCALONA ESCALONA 26 June 2020 (has links)
[en] Tables are widely adopted to organize and publish data. For example, the Web has an enormous number of tables, published in HTML, embedded in PDF documents, or simply available for download from Web pages. However, tables are not always easy to interpret because of the variety of features and formats used, and indeed a large number of methods and tools have been developed to interpret them. This dissertation presents the implementation of an algorithm, based on Conditional Random Fields (CRFs), to classify the rows of a table as header rows, data rows or metadata rows. The implementation is complemented by two algorithms for table recognition in spreadsheet documents, respectively based on rules and on region detection.
Finally, the dissertation describes the results and the benefits obtained by applying the implemented algorithms to HTML tables obtained from the Web, and to spreadsheet tables downloaded from the Web site of the Brazilian National Petroleum Agency.
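Row classification of this kind can be framed as sequence labeling over the rows of a table. The sketch below decodes the most likely label sequence with a tiny hand-weighted Viterbi pass; all scores are invented for illustration, whereas the dissertation's CRF learns its weights from annotated tables:

```python
LABELS = ["header", "data", "metadata"]

# Hand-set log-scores standing in for learned CRF weights (illustrative only).
TRANSITION = {
    ("header", "header"): 0.0, ("header", "data"): 1.0, ("header", "metadata"): -2.0,
    ("data", "data"): 1.0, ("data", "metadata"): -0.5, ("data", "header"): -2.0,
    ("metadata", "metadata"): 0.0, ("metadata", "data"): -1.0, ("metadata", "header"): -2.0,
}

def emission(row, label):
    """Score a row/label pair from a single content feature."""
    numeric_ratio = sum(c.replace(".", "", 1).isdigit() for c in row) / len(row)
    if label == "data":
        return 2.0 * numeric_ratio
    if label == "header":
        return 2.0 * (1 - numeric_ratio)
    return 0.2  # metadata: weak flat score

def classify_rows(table):
    """Viterbi decoding of row labels over the whole table."""
    scores = [{l: emission(table[0], l) for l in LABELS}]
    back = []
    for row in table[1:]:
        prev = scores[-1]
        layer, ptr = {}, {}
        for l in LABELS:
            best = max(LABELS, key=lambda p: prev[p] + TRANSITION[(p, l)])
            layer[l] = prev[best] + TRANSITION[(best, l)] + emission(row, l)
            ptr[l] = best
        scores.append(layer)
        back.append(ptr)
    # Trace the best path backwards through the stored pointers.
    label = max(LABELS, key=lambda l: scores[-1][l])
    path = [label]
    for ptr in reversed(back):
        label = ptr[label]
        path.append(label)
    return list(reversed(path))

table = [["Name", "Price"], ["Oil", "81.3"], ["Gas", "2.97"]]
labels = classify_rows(table)
```

Decoding jointly over the whole row sequence, rather than labeling each row in isolation, is precisely what the CRF formulation buys: an unusual row is pulled toward a plausible label by its neighbors.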
