11

Enriching integrated statistical open city data by combining equational knowledge and missing value imputation

Bischof, Stefan, Harth, Andreas, Kämpgen, Benedikt, Polleres, Axel, Schneider, Patrik 19 October 2017 (has links) (PDF)
Several institutions collect statistical data about cities, regions, and countries for various purposes. Yet, while access to high-quality and recent data of this kind is crucial both for decision makers and as a means of achieving transparency to the public, all too often such collections remain isolated and not re-usable, let alone comparable or properly integrated. In this paper we present the Open City Data Pipeline, a focused attempt to collect, integrate, and enrich statistical data collected at city level worldwide, and to re-publish the resulting dataset in a re-usable manner as Linked Data. The main features of the Open City Data Pipeline are: (i) we integrate and cleanse data from several sources in a modular, extensible, always up-to-date fashion; (ii) we use both Machine Learning techniques and reasoning over equational background knowledge to enrich the data by imputing missing values; (iii) we assess the estimated accuracy of such imputations per indicator. Additionally, (iv) we make the integrated and enriched data, including links to external data sources such as DBpedia, available both in a web browser interface and as machine-readable Linked Data, using standard vocabularies such as QB and PROV. Apart from contributing to the growing collection of data available as Linked Data, our enrichment process for missing values also contributes a novel methodology for combining rule-based inference over equational knowledge with inferences obtained from statistical Machine Learning approaches. While most existing work on inference in Linked Data has focused on ontological reasoning in RDFS and OWL, we believe that these complementary methods, and particularly their combination, could also be fruitfully applied in many other domains for integrating Statistical Linked Data, independent of our concrete use case of integrating city data.
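As an illustration of the imputation idea described in this abstract — not the authors' actual pipeline, vocabularies, or models — a minimal Python sketch with hypothetical indicator names might look like this: equational knowledge (density = population / area) fills whatever it can derive exactly, and a simple statistical fallback (here a trivial mean imputation standing in for the Machine Learning models) covers the rest.

```python
import math

# Hypothetical city records with missing indicators (None = missing).
cities = [
    {"population": 1_800_000, "area_km2": 415.0, "population_density": None},
    {"population": None, "area_km2": 248.0, "population_density": 4800.0},
    {"population": 540_000, "area_km2": None, "population_density": 3200.0},
]

# Equational background knowledge: density = population / area.
# Solve for whichever variable is missing, if the other two are present.
def apply_equations(c):
    p, a, d = c["population"], c["area_km2"], c["population_density"]
    if d is None and p is not None and a is not None:
        c["population_density"] = p / a
    elif p is None and d is not None and a is not None:
        c["population"] = d * a
    elif a is None and p is not None and d is not None:
        c["area_km2"] = p / d

for c in cities:
    apply_equations(c)

# For values no equation can derive, fall back to a statistical estimate
# (a placeholder mean imputation instead of the learned models).
def mean_impute(records, key):
    known = [r[key] for r in records if r[key] is not None]
    fallback = sum(known) / len(known) if known else math.nan
    for r in records:
        if r[key] is None:
            r[key] = fallback

for key in ("population", "area_km2", "population_density"):
    mean_impute(cities, key)

print(cities)
```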
12

The iLog methodology for fostering valid and reliable Big Thick Data

Busso, Matteo 29 April 2024 (has links)
Nowadays, the apparent promise of Big Data is to understand people's behavior in their daily lives in real time. However, as big as these data are, many useful variables describing a person's context (e.g., where she is, with whom, what she is doing, and her feelings and emotions) are still unavailable. Therefore, people are, at best, thinly described. One solution is to collect Big Thick Data via blending techniques, combining sensor data sources with high-quality ethnographic data, to generate a dense representation of the person's context. As attractive as this proposal is, the approach is difficult to integrate into research paradigms dealing with Big Data, given the high cost of data collection and integration and the expertise needed to manage them. Starting from a quantified approach to Big Thick Data based on the notion of situational context, this thesis proposes a methodology to design, collect, and prepare reliable and valid quantified Big Thick Data for the purposes of reuse. Furthermore, the methodology is supported by a set of services that foster its replicability. The methodology has been applied in four case studies involving many domain experts and 10,000+ participants from 10 countries. The diverse applications of the methodology and the reuse of the data for multiple purposes demonstrate its validity and reliability.
13

Hloubka funkcionálních dat / The Depth of Functional Data.

Nagy, Stanislav January 2011 (has links)
The depth function (functional) is a modern nonparametric tool of statistical analysis for (finite-dimensional) data, with many practical applications. In the present work we focus on the possibilities of extending the depth concept to functional data. In the case of finite-dimensional functional data, the isomorphism between the functional space and the finite-dimensional Euclidean space is utilized to introduce induced functional data depths. A theorem about the properties of induced depths is proven, and several examples show the possibilities and limitations of their practical application. Moreover, we describe and demonstrate the advantages and disadvantages of the established depth functionals used in the literature (Fraiman-Muniz depths and band depths). In order to overcome the drawbacks of the known depths, we propose a new K-band depth, based on extending the inference from continuous to smooth functions. Several important properties of the K-band depth are derived. A final simulation study on supervised classification demonstrates the practical usefulness of the new approach. In conclusion, the computational complexity of all presented depth functionals is compared.
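For readers unfamiliar with band depths, a minimal Python sketch of the classical band depth (J = 2) of López-Pintado and Romo — one of the established functional depths mentioned above, not the thesis's K-band depth — is shown below; the toy curves are invented for illustration.

```python
import numpy as np
from itertools import combinations

def band_depth(curves):
    """Classical band depth (J=2): for each curve, the fraction of curve
    pairs whose pointwise band fully contains it on the common grid."""
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    depths = np.zeros(n)
    pairs = list(combinations(range(n), 2))
    for i, j in pairs:
        lower = np.minimum(curves[i], curves[j])
        upper = np.maximum(curves[i], curves[j])
        inside = np.all((curves >= lower) & (curves <= upper), axis=1)
        depths += inside
    return depths / len(pairs)

# Toy example: noisy sine curves plus one shifted outlier (lowest depth).
grid = np.linspace(0.0, 1.0, 50)
rng = np.random.default_rng(0)
sample = np.sin(2 * np.pi * grid) + 0.1 * rng.standard_normal((10, grid.size))
sample[-1] += 2.0
print(band_depth(sample).round(3))
```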
14

Ochrana osobních údajů v EU - Biometrické údaje / Data protection in the EU - Biometric data

Jansa, Tomáš January 2019 (has links)
Data protection in the EU - Biometric data The main aim of this thesis is to address data protection in connection with biometric data. In the first chapter, the author deals with the historical context. The right to privacy still represents the solid ground of data protection; therefore, its delimitation and its connection with data privacy are of the utmost importance for a proper understanding of this area. The author also deals with data protection not only in the European context, but also with the fragmented legislation in the US, where comprehensive legislation comparable to the General Data Protection Regulation is absent. The second chapter mainly deals with the general legal principles and their relevance to the legal order, as well as with the special principles laid down in the regulation, which must be upheld. The third chapter deals with the term personal data. Moreover, it was also important to define the other terms which go hand in hand with personal data: anonymous data as personal data that have gone through the anonymisation process, the special categories of personal data, which form the foundation of the issue of biometric data, and lastly the term data...
15

Discipline and research data in geography

Tam, Wan Ting (Winnie) January 2016 (has links)
Research data is essential to scholarship. The value of research data and its management has been increasingly recognized by policy makers and higher education institutions. A deep understanding of disciplinary practices is vital to develop culturally sensitive policy, tools and services for successful data management. Previous research has shown that data practices vary across sub-fields and disciplines. However, much less is known about how disciplinary cultures shape data practices. There is a need to theorise research data practices based on empirical evidence in order to inform policy, tools and services. The aim of the thesis is to examine the interrelation between data practices and disciplinary cultures within geography. Geography is well-established and multidisciplinary, consisting of elements from the sciences, social sciences and humanities. By examining a single discipline, this thesis develops a theoretical understanding of research data practices at a finer level of granularity than would be achieved by looking at broad disciplinary groupings such as the physical and social sciences. Data collection and analysis consisted of two phases. Phase one was exploratory, including an analysis of geography department websites and researcher web profiles and a bibliometric study of collaboration patterns based on co-authorship. Phase one aimed to understand the disciplinary characteristics of geography in preparation for phase two. The second phase consisted of a series of 23 semi-structured interviews with researchers in geography, which aimed to understand researchers' data practices and their attitudes toward data sharing within the context of the sub-discipline(s) they inhabited. The findings of the thesis show that there are contrasting intellectual, social and data differences between physical and human geography. For example, intellectually, these two branches of geography differ in terms of their research objects and methods; socially, they differ in terms of the scale of their collaborative activities and their motivations to collaborate; furthermore, the nature of data, how data is collected and data sharing practices also differ between physical and human geography. The thesis concludes that differences in the notion of data and in data sharing practices are grounded in disciplinary characteristics. The thesis develops a new three-dimensional framework to better understand the notion of data from a disciplinary perspective. The three dimensions are (1) physical form, (2) intellectual content and (3) social construction. Furthermore, Becher and Trowler's (2001) disciplinary taxonomy (hard-soft/pure-applied), together with the concepts of "urban-rural" ways of life and "convergent-divergent" communities, is shown to be useful in explaining the diverse data sharing practices of geographers. The thesis demonstrates the usefulness of applying disciplinary theories to the sphere of research data management.
16

Vážené poloprostorové hloubky a jejich vlastnosti / Weighted Halfspace Depths and Their Properties

Kotík, Lukáš January 2015 (has links)
Statistical depth functions have become a well-known nonparametric tool of multivariate data analysis. The best-known depth functions include the halfspace depth. Although the halfspace depth has many desirable properties, some of its properties may lead to biased and misleading results, especially when data are not elliptically symmetric. The thesis introduces two new classes of depth functions. Both classes generalize the halfspace depth. They keep some of its properties, and since they better respect the geometric structure of the data, they usually lead to better results when dealing with non-elliptically symmetric, multimodal or mixed distributions. The idea presented in the thesis is based on replacing the indicator of a halfspace by a more general weight function. This provides a continuum, especially if conic-section weight functions are used, between a local view of the data (e.g. a kernel density estimate) and a global view of the data as provided, e.g., by the halfspace depth. The rate of localization is determined by the choice of the weight functions and their parameters. Properties including the uniform strong consistency of the proposed depth functions are proved in the thesis. The limit distribution is also discussed, together with some other data depth related topics (regression depth, functional data depth)...
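The core idea — replace the indicator of a halfspace by a smoother weight — can be illustrated with a minimal Python sketch that approximates the depth over random directions. The sigmoid weight below is an illustrative choice only; the conic-section weight functions studied in the thesis are not reproduced here.

```python
import numpy as np

def halfspace_depth(x, data, n_dirs=500, weight=None, rng=None):
    """Approximate (weighted) halfspace depth of point x w.r.t. data.
    weight=None gives the classical Tukey depth (indicator of a halfspace);
    a smooth weight of the signed projections localizes the depth."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, float)
    data = np.asarray(data, float)
    dirs = rng.standard_normal((n_dirs, data.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = (data - x) @ dirs.T          # signed projections, (n_points, n_dirs)
    if weight is None:
        mass = (proj >= 0).mean(axis=0)  # fraction of points in each halfspace
    else:
        mass = weight(proj).mean(axis=0)  # smooth weight instead of indicator
    return mass.min()

rng = np.random.default_rng(1)
sample = rng.standard_normal((200, 2))
center, fringe = np.array([0.0, 0.0]), np.array([3.0, 3.0])
print(halfspace_depth(center, sample), halfspace_depth(fringe, sample))

# An illustrative smooth, localizing weight function:
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-3.0 * t))
print(halfspace_depth(center, sample, weight=sigmoid))
```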
17

A knowledge based approach of toxicity prediction for drug formulation : modelling drug vehicle relationships using soft computing techniques

Mistry, Pritesh January 2015 (has links)
This multidisciplinary thesis is concerned with the prediction of drug formulations for the reduction of drug toxicity. Both scientific and computational approaches are utilised to make original contributions to the field of predictive toxicology. The first part of this thesis provides a detailed scientific discussion of all aspects of drug formulation and toxicity. The discussion focuses on the principal mechanisms of drug toxicity and how drug toxicity is studied and reported in the literature. Furthermore, a review of the current technologies available for formulating drugs for toxicity reduction is provided. Examples of studies reported in the literature that have used these technologies to reduce drug toxicity are also given. The thesis also provides an overview of the computational approaches currently employed in the field of in silico predictive toxicology. This overview focuses on the machine learning approaches used to build predictive QSAR classification models, with examples from the literature. Two methodologies have been developed as part of the main work of this thesis. The first focuses on the use of directed bipartite graphs and Venn diagrams for the visualisation and extraction, from large un-curated datasets, of drug-vehicle relationships that show changes in the patterns of toxicity. These relationships can be rapidly extracted and visualised using the methodology proposed in chapter 4. The second methodology involves mining large datasets for the extraction of drug-vehicle toxicity data. It uses an area-under-the-curve principle to make pairwise comparisons of vehicles, which are classified according to the toxicity protection they offer, and from these comparisons predictive classification models based on random forests and decision trees are built. The results of this methodology are reported in chapter 6.
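Random forests are one of the model families named in this abstract; a minimal, hedged Python sketch of such a classifier is given below. The descriptor vectors and the "vehicle offers toxicity protection" label are synthetic stand-ins, since the thesis's curated drug-vehicle data and features are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for drug/vehicle descriptor vectors.
rng = np.random.default_rng(42)
X = rng.standard_normal((400, 12))
# Hypothetical label: 1 = vehicle offers toxicity protection, 0 = it does not.
y = (X[:, 0] + 0.5 * X[:, 3] - X[:, 7]
     + 0.3 * rng.standard_normal(400) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```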
18

Integrace Big Data a datového skladu / Integration of Big Data and data warehouse

Kiška, Vladislav January 2017 (has links)
This master's thesis deals with the problem of data integration between a Big Data platform and an enterprise data warehouse. The main goal of the thesis is to create a complex transfer system to move data from a data warehouse to this platform using a suitable tool, and to store and manage all metadata about previous transfers. The theoretical part focuses on describing the concepts of Big Data, gives a brief introduction to their history and presents the factors which led to the need for this new approach. The next chapters describe the main principles and attributes of these technologies and discuss the benefits of implementing them within an enterprise. The thesis also describes the technologies known as Business Intelligence, their typical use cases and their relation to Big Data. A shorter chapter presents the main components of the Hadoop system and the most popular related applications. The practical part of this work consists of the implementation of a system to execute and manage transfers from a traditional relational database, in this case representing a data warehouse, to a cluster of a few computers running a Hadoop system. This part also includes a summary of the applications most commonly used to move data into Hadoop and the design of a database metadata schema, which is used to manage these transfers and to store transfer metadata.
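The metadata-tracking idea mentioned in the abstract can be sketched very simply; the table and column names below are invented for illustration and do not reflect the schema designed in the thesis, and SQLite merely stands in for whatever store the transfer system uses.

```python
import sqlite3
from datetime import datetime, timezone

# Illustrative metadata store for transfer runs (names are invented).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transfer_log (
        id INTEGER PRIMARY KEY,
        source_table TEXT NOT NULL,
        target_path  TEXT NOT NULL,
        rows_moved   INTEGER,
        started_at   TEXT,
        finished_at  TEXT,
        status       TEXT
    )
""")

def record_transfer(source_table, target_path, rows_moved, started_at, status="OK"):
    # Record one completed transfer run so later runs can be audited/resumed.
    conn.execute(
        "INSERT INTO transfer_log (source_table, target_path, rows_moved,"
        " started_at, finished_at, status) VALUES (?, ?, ?, ?, ?, ?)",
        (source_table, target_path, rows_moved, started_at,
         datetime.now(timezone.utc).isoformat(), status),
    )
    conn.commit()

record_transfer("sales.orders", "/warehouse/orders", 125_000,
                datetime.now(timezone.utc).isoformat())
for row in conn.execute("SELECT source_table, rows_moved, status FROM transfer_log"):
    print(row)
```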
19

Analýza reálných dat z restauračního prostředí / Analysis of the real data from the restaurant sector

Šimeček, Petr January 2014 (has links)
The aim of this thesis is to analyze real data from the restaurant sector in the center of Prague, verify assumptions based on existing knowledge and explore hidden relations. The MySQL database management system was used for the initial transformation of the original data structure. After the transformation, the data were converted into a form that could be handled by the LMDataSource procedure of the LISp-Miner system. Association relations were analyzed with the 4ft-Miner procedure of the LISp-Miner system. The MySQL database system was used to obtain the results of the frequency analysis, and Microsoft Word and Excel were used to interpret the results. Some of the assumptions made in the research were confirmed. Furthermore, an interesting combination of relations was discovered. The output of this work allows the owner of the data to use some of the analysis results for the optimization of internal processes. In addition, this study points out other possible ways to analyze these data.
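The kind of association analysis performed with 4ft-Miner can be illustrated, in a very reduced form, by enumerating simple support/confidence rules in plain Python. The transactions and items below are invented, and this sketch is not a substitute for LISp-Miner's richer 4ft rules.

```python
from itertools import combinations

# Toy transactions standing in for the restaurant data (invented items).
transactions = [
    {"soup", "steak", "beer"},
    {"soup", "steak", "wine"},
    {"salad", "steak", "beer"},
    {"soup", "pasta", "wine"},
    {"soup", "steak", "beer"},
]

def support(itemset):
    # Fraction of transactions containing every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

# Enumerate one-antecedent/one-consequent rules A -> B above the thresholds.
items = sorted(set().union(*transactions))
min_support, min_confidence = 0.4, 0.7
for a, b in combinations(items, 2):
    for ante, cons in ((a, b), (b, a)):
        supp = support({ante, cons})
        if supp >= min_support:
            conf = supp / support({ante})
            if conf >= min_confidence:
                print(f"{ante} -> {cons}: support={supp:.2f}, confidence={conf:.2f}")
```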
20

Multidimenzionální analýza dat a zpracování analytického zobrazení / Multidimensional Data Analysis and Analytic View Processing

Foltýnová, Veronika January 2018 (has links)
This thesis deals with the analysis and display of multidimensional data. The theoretical part presents the field of data mining, its tasks and techniques, and briefly explains the terms Business Intelligence and data warehouse. Databases are also described, followed by the options for displaying multidimensional data. The end of the theoretical part briefly explains optical networks, in particular the Gigabit Passive Optical Network and its frame, because the application displays data taken from frames of this network. The practical part covers the creation of a source database and of an application that builds an OLAP cube and displays multidimensional data. This application is based on the theoretical knowledge of multidimensional databases and OLAP technology.
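As a minimal illustration of the OLAP-style aggregation the application performs — not the thesis's actual cube, schema, or GPON data — a pandas pivot table over invented frame-level records shows the general idea: dimensions on the axes, a measure aggregated in the cells.

```python
import pandas as pd

# Invented frame-level records; the thesis works with real GPON frame data.
records = pd.DataFrame({
    "onu_id":    [1, 1, 2, 2, 3, 3, 1, 2],
    "direction": ["down", "up", "down", "up", "down", "up", "down", "up"],
    "hour":      [10, 10, 10, 11, 11, 11, 11, 10],
    "bytes":     [1500, 300, 900, 450, 1200, 250, 800, 600],
})

# One slice of an OLAP cube: sum of bytes by ONU, direction and hour.
cube_slice = records.pivot_table(
    values="bytes", index="onu_id", columns=["direction", "hour"],
    aggfunc="sum", fill_value=0,
)
print(cube_slice)
```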
