About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

LHCb data management on the computing grid

Smith, Andrew Cameron January 2009 (has links)
The LHCb detector is one of the four experiments being built to harness the proton-proton collisions provided by the Large Hadron Collider (LHC) at the European Organisation for Nuclear Research (CERN). The expected data rate, once the LHC experiments are fully operational, eclipses that of any previous scientific experiment and has motivated the adoption of a grid computing paradigm to store and process the data. Managing petabytes of data in a distributed environment presents a rich set of challenges related to scalability, reliability and performance. This thesis presents the data management requirements for executing the workload of the LHCb collaboration. We present the systems designed to support all aspects of grid data management for LHCb, from data transfer to data integrity and efficient data access. The distributed computing environment is inherently unstable, and much focus has been placed on providing systems that are robust and resilient to observed failures.
2

Open City Data Pipeline

Bischof, Stefan, Kämpgen, Benedikt, Harth, Andreas, Polleres, Axel, Schneider, Patrik 02 1900 (has links) (PDF)
Statistical data about cities, regions, and countries is collected for various purposes and by various institutions. Yet, while access to high-quality and recent such data is crucial both for decision makers and for the public, all too often such collections of data remain isolated and not re-usable, let alone properly integrated. In this paper we present the Open City Data Pipeline, a focused attempt to collect, integrate, and enrich statistical data collected at city level worldwide, and republish this data in a reusable manner as Linked Data. The main features of the Open City Data Pipeline are: (i) we integrate and cleanse data from several sources in a modular, extensible, always up-to-date fashion; (ii) we use both machine learning techniques and ontological reasoning over equational background knowledge to enrich the data by imputing missing values; (iii) we assess the estimated accuracy of such imputations per indicator. Additionally, (iv) we make the integrated and enriched data available both in a web browser interface and as machine-readable Linked Data, using standard vocabularies such as QB and PROV, and linking to, e.g., DBpedia. Lastly, in an exhaustive evaluation of our approach, we compare our enrichment and cleansing techniques to a preliminary version of the Open City Data Pipeline presented at ISWC2015: firstly, we demonstrate that the combination of equational knowledge and standard machine learning techniques significantly helps to improve the quality of our missing value imputations; secondly, we arguably show that the more data we integrate, the more reliable our predictions become. Hence, over time, the Open City Data Pipeline shall provide a sustainable effort to serve Linked Data about cities in increasing quality. / Series: Working Papers on Information Systems, Information Business and Operations
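The per-indicator imputation and accuracy assessment described above could, in a much simplified form, look something like the following sketch. It assumes a pandas DataFrame of city-level indicators and uses a generic regression model with a hold-out estimate of imputation error; the column names and toy figures are invented, and this is not the pipeline's actual code.

```python
# Hypothetical sketch of per-indicator missing-value imputation: train a
# regressor for one indicator from the remaining indicators, impute the
# missing cells, and report a hold-out RMSE as an accuracy estimate.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def impute_indicator(df: pd.DataFrame, target: str) -> tuple[pd.Series, float]:
    """Impute missing values of one indicator; return the filled column and an RMSE estimate."""
    features = df.drop(columns=[target])
    known = df[target].notna()
    X_known = features[known].fillna(features.median())
    y_known = df.loc[known, target]
    # Hold out part of the known cells to estimate imputation accuracy.
    X_tr, X_te, y_tr, y_te = train_test_split(X_known, y_known, test_size=0.2, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    rmse = float(np.sqrt(np.mean((model.predict(X_te) - y_te) ** 2)))
    filled = df[target].copy()
    missing = ~known
    if missing.any():
        filled[missing] = model.predict(features[missing].fillna(features.median()))
    return filled, rmse

# Toy example: three city-level indicators with gaps (values are invented).
cities = pd.DataFrame({
    "population": [1_800_000, 420_000, 95_000, 2_900_000, 640_000],
    "area_km2": [415, 188, 52, 891, np.nan],
    "gdp_per_capita": [48_000, np.nan, 31_000, 52_000, 39_000],
})
filled, rmse = impute_indicator(cities, "gdp_per_capita")
print(filled.round(0).tolist(), "estimated RMSE:", round(rmse, 1))
```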
3

On techniques for pay-as-you-go data integration of linked data

Christodoulou, Klitos January 2015 (has links)
It is recognised that nowadays users interact with large amounts of data that exist in disparate forms and are stored under different settings. Moreover, the amount of structured and unstructured data outside a single well-organised data management system is expanding rapidly. To address the recent challenges of managing large amounts of potentially distributed data, the vision of a dataspace was introduced. This data management paradigm aims at reducing the complexity behind the challenges of integrating heterogeneous data sources. Recently, efforts by the Linked Data (LD) community gave rise to a Web of Data (WoD) that interweaves with the current Web of documents in a way that is useful for data consumption by both humans and computational agents. On the WoD, datasets are structured under a common data model and published as Web resources following a simple set of guidelines that enables them to be linked with other pieces of data, as well as annotated with useful metadata that helps determine their semantics. The WoD is an evolving open ecosystem including specialist publishers as well as community efforts aiming at re-publishing isolated databases as LD on the WoD and annotating them with metadata. The WoD raises new opportunities and challenges; however, it currently relies mostly on manual effort for integrating its large number of heterogeneous data sources. This dissertation makes the case that several techniques from the dataspaces research area (aiming at on-demand integration of data sources in a pay-as-you-go fashion) can support the integration of heterogeneous WoD sources. In so doing, this dissertation explores the opportunities and identifies the challenges of adapting existing pay-as-you-go data integration techniques to the context of LD. More specifically, this dissertation makes the following contributions: (1) a case study identifying the challenges that arise when existing pay-as-you-go data integration techniques are applied in a setting where data sources are LD; (2) a methodology that deals with the 'schema-less' nature of LD sources by automatically inferring a conceptual structure from a given RDF graph, thus enabling downstream tasks, such as the identification of matches and the derivation of mappings, which are both essential for the automatic bootstrapping of a dataspace; and (3) a well-defined, principled methodology that builds on a Bayesian inference technique for reasoning under uncertainty to improve pay-as-you-go integration. Although the developed methodology is generic in being able to reason with different hypotheses, its effectiveness has only been explored on reducing the uncertain decisions made by string-based matchers during the matching stage of a dataspace system.
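To illustrate the 'schema-less' problem and the kind of conceptual structure that can be inferred from an RDF graph, the following rdflib sketch groups the properties observed on instances of each rdf:type. It is a simplified illustration only, not the dissertation's actual inference methodology, and the example graph is invented.

```python
# Illustrative sketch: infer a rough "conceptual structure" from a schema-less
# RDF graph by grouping the predicates observed on instances of each rdf:type.
from collections import defaultdict
from rdflib import Graph, RDF

data = """
@prefix ex: <http://example.org/> .
ex:alice a ex:Person ; ex:name "Alice" ; ex:worksFor ex:acme .
ex:bob   a ex:Person ; ex:name "Bob" .
ex:acme  a ex:Company ; ex:label "ACME" .
"""

g = Graph()
g.parse(data=data, format="turtle")

# Map each class to the set of predicates used on its instances.
structure = defaultdict(set)
for subject, klass in g.subject_objects(RDF.type):
    for predicate in g.predicates(subject=subject):
        if predicate != RDF.type:
            structure[klass].add(predicate)

for klass, predicates in structure.items():
    print(klass, "->", sorted(str(p) for p in predicates))
```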
4

Možnosti využití konceptu Big Data v pojišťovnictví / Possibilities of using the Big Data concept in the insurance industry

Stodolová, Jana January 2019 (has links)
This diploma thesis deals with the phenomenon of recent years called Big Data. Big Data are unstructured data of large volume which cannot be managed and processed by commonly used software tools. The analytical part deals with the concept of Big Data and analyses the possibilities of using this concept in the insurance sector. The practical part presents specific methods and approaches for the use of big data analysis, specifically in increasing the competitiveness of the insurance company and in detecting insurance fraud. Most space is devoted to data mining methods in modelling the task of detecting insurance fraud. This diploma thesis builds on and extends the author's bachelor thesis titled "Modern technology of data analysis and its use in detection of insurance frauds".
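As an illustration of the data mining approach to fraud detection mentioned in the abstract, the sketch below trains a standard classifier on synthetic, invented claim features; it is not the model used in the thesis.

```python
# Minimal sketch of a data-mining model for insurance fraud detection:
# a classifier trained on labelled claims. Features and labels are synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
claims = np.column_stack([
    rng.normal(2_000, 800, n),          # claim amount
    rng.integers(0, 20, n),             # prior claims by policyholder
    rng.integers(0, 2, n),              # claim filed shortly after policy start
])
# Synthetic label: fraud more likely for large, early claims with many priors.
fraud = (0.0004 * claims[:, 0] + 0.05 * claims[:, 1] + 0.6 * claims[:, 2]
         + rng.normal(0, 0.3, n)) > 2.0

model = GradientBoostingClassifier(random_state=0)
scores = cross_val_score(model, claims, fraud, cv=5, scoring="roc_auc")
print("cross-validated AUC:", scores.mean().round(3))
```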
5

Defining data as an art material

Freeman, Julie January 2018 (has links)
Digital technology, and specifically digital data, forms the backbone of nearly all our communications, including machine to machine, human to machine, and, increasingly, human to human. It is unsurprising that one of the most prevalent materials of our time is used by artists to create work. This thesis defines data as an art material. It investigates the variety of manifestations of data when used in art, through the review of existing artwork and the development of new artworks and visualisations that use a dataset collected for this research. Through the lens of conceptualising data as an art material, a definition and manifesto of data art is put forward (Chapter 2). In addition, a taxonomy for describing data as an art material is proposed and its usage explored by applying it to a number of data art descriptions and by analysing a database of data artworks tagged with relevant terms (Chapter 3). Temporal, biological, and real-time (terms from the taxonomy) are particularly relevant to the way in which digital technology mediates our connection to nature. To explore these forms of data within artwork, a collaboration with Dr Chris Faulkes, Reader in Evolutionary Ecology, facilitated the design and implementation of an electronic system to collect data from a colony of animals. Chapter 4 describes the tracking system, which resulted in a real-time stream of biological temporal data. Translations of this data are explored in more detail through the practical application of various computational techniques, including scientific analysis (Chapter 5), animation, sonification, data visualisation (Chapter 6) and soft robotic objects (Chapter 7). The thesis demonstrates that an inanimate object, animated through the translation of data, can have a body language through which to effectively convey characteristics of living things (Chapter 8). Finally, public engagement events are presented in Chapter 9, with reflections, contributions and future work concluded in Chapter 10.
6

Is operational research in UK universities fit-for-purpose for the growing field of analytics?

Mortenson, Michael J. January 2018 (has links)
Over the last decade, considerable interest has been generated in the use of analytical methods in organisations. Along with this, many have reported a significant gap between organisational demand for analytically trained staff and the number of potential recruits qualified for such roles. This interest is of high relevance to the operational research discipline, both in terms of raising the profile of the field and in the teaching and training of graduates to fill these roles. However, what is less clear is the extent to which operational research teaching in universities, or indeed teaching on the various courses labelled as 'analytics', offers a curriculum that can prepare graduates for these roles. It is within this space that this research is positioned, specifically seeking to analyse the suitability of current provision, limited to master's education in UK universities, and to make recommendations on how curricula may be developed. To do so, a mixed-methods research design, in the pragmatic tradition, is presented. This includes a variety of research instruments. Firstly, a computational literature review of analytics is presented, assessing (amongst other things) the amount of research into analytics from a range of disciplines. Secondly, a historical analysis is performed of the literature regarding elements that can be seen as precursors of analytics, such as management information systems, decision support systems and business intelligence. Thirdly, an analysis of job adverts is included, utilising an online topic model and correlation analyses. Fourthly, online materials from UK universities concerning relevant degrees are analysed using a bagged support vector classifier and a bespoke module analysis algorithm. Finally, interviews with both potential employers of graduates and academics involved in analytics courses are presented. The results of these separate analyses are synthesised and contrasted. The outcome of this is an assessment of the current state of the market, some reflections on the role operational research may have, and a framework for the development of analytics curricula. The principal contribution of this work is practical: providing tangible recommendations on curricula design and development, as well as to the operational research community in general in respect of how it may react to the growth of analytics. Additional contributions are made in respect of methodology, with a novel mixed-method approach employed, and of theory, with insights as to how trends develop in both the jobs market and in academia. It is hoped that the insights here may be of value to course designers seeking to react to similar trends in a wide range of disciplines and fields.
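The abstract mentions analysing university degree materials with a bagged support vector classifier. A minimal sketch of such a classifier, using invented course descriptions and labels rather than the thesis's data, might look as follows.

```python
# Hedged sketch of a bagged support vector classifier applied to short course
# descriptions. Texts and labels are invented for illustration only.
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "linear programming and simulation for operational research",
    "queueing theory, optimisation and decision analysis",
    "stochastic modelling and scheduling heuristics",
    "statistical machine learning and predictive analytics",
    "deep learning for large scale data analytics",
    "data mining, forecasting and prescriptive analytics",
    "database systems and business intelligence reporting",
    "data warehousing and management information systems",
    "enterprise decision support systems and dashboards",
]
labels = ["OR"] * 3 + ["analytics"] * 3 + ["IS"] * 3

model = make_pipeline(
    TfidfVectorizer(),
    BaggingClassifier(estimator=LinearSVC(), n_estimators=10, random_state=0),
)
model.fit(texts, labels)
print(model.predict(["optimisation methods and stochastic modelling"]))
```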
7

A scalable data store and analytic platform for real-time monitoring of data-intensive scientific infrastructure

Suthakar, Uthayanath January 2017 (has links)
Real-time monitoring of data-intensive scientific infrastructures, covering jobs, data transfers, and hardware failures, is vital for efficient operation. Due to the high volume and velocity of the events produced, traditional methods are no longer optimal. Several techniques, as well as enabling architectures, are available to address the Big Data problem. In this respect, this thesis complements existing survey work by contributing an extensive literature review of both traditional and emerging Big Data architectures. Scalability, low latency, fault tolerance, and intelligence are key challenges for the traditional architecture, whereas Big Data technologies and approaches have become increasingly popular for use cases that demand scalable, data-intensive (parallel) processing, fault tolerance (data replication), and support for low-latency computation. In the context of a scalable data store and analytics platform for monitoring data-intensive scientific infrastructure, the Lambda Architecture was adapted and evaluated on the Worldwide LHC Computing Grid, where it proved effective, especially for computationally and data-intensive use cases. In this thesis, an efficient strategy for the collection and storage of large volumes of data for computation is presented. Moving the transformation logic out of the data pipeline and into the analytics layers simplifies the architecture and the overall process: the time used is reduced, untampered raw data are kept at the storage level for fault tolerance, and the required transformation can be done when needed. An optimised Lambda Architecture (OLA) is presented, which models an efficient way of joining the batch layer and the streaming layer with minimal code duplication in order to support scalability, low latency, and fault tolerance. A few models were evaluated: a pure streaming layer, a pure batch layer, and the combination of both batch and streaming layers. Experimental results demonstrate that the OLA performed better than the traditional architecture as well as the standard Lambda Architecture. The OLA was also enhanced by adding an intelligence layer for predicting data access patterns. The intelligence layer actively adapts and updates the model built by the batch layer, which eliminates the re-training time while providing a high level of accuracy using deep learning techniques. The fundamental contribution to knowledge is a scalable, low-latency, fault-tolerant, intelligent, and heterogeneous architecture for monitoring a data-intensive scientific infrastructure that can benefit from Big Data technologies and approaches.
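For readers unfamiliar with the Lambda Architecture that the OLA builds on, the following toy sketch shows the underlying pattern: a query is answered by merging a precomputed batch view with a speed-layer view covering only events that arrived after the last batch run. All names and events are illustrative; this is not the thesis's implementation.

```python
# Toy sketch of the Lambda Architecture pattern: batch view + speed view,
# merged at query time. Names and events are illustrative only.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class MonitoringStore:
    batch_view: Counter = field(default_factory=Counter)   # rebuilt periodically from raw events
    speed_view: Counter = field(default_factory=Counter)   # incrementally updated in real time

    def ingest_stream(self, event: dict) -> None:
        """Speed layer: update the real-time view as events arrive."""
        self.speed_view[event["site"]] += 1

    def rebuild_batch(self, raw_events: list) -> None:
        """Batch layer: recompute the view from immutable raw data, then reset the speed layer."""
        self.batch_view = Counter(e["site"] for e in raw_events)
        self.speed_view.clear()

    def query(self, site: str) -> int:
        """Serving layer: merge batch and speed views at query time."""
        return self.batch_view[site] + self.speed_view[site]

store = MonitoringStore()
raw = [{"site": "CERN"}, {"site": "RAL"}, {"site": "CERN"}]
store.rebuild_batch(raw)
store.ingest_stream({"site": "CERN"})   # event arriving after the batch run
print(store.query("CERN"))              # -> 3
```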
8

Investigating machine learning algorithms with imbalanced big data

Unknown Date (has links)
Recent technological developments have engendered an expeditious production of big data and have also enabled machine learning algorithms to produce high-performance models from such data. Nonetheless, class imbalance (in binary classification) between the majority and minority classes in big data can skew the predictive performance of classification algorithms toward the majority (negative) class, whereas the minority (positive) class usually holds greater value for decision makers. Such bias may lead to adverse consequences, some of them even life-threatening, when false negatives are generally costlier than false positives. The size of the minority class can vary from fair to extraordinarily small, which can lead to different performance scores for machine learning algorithms. Class imbalance is a well-studied area for traditional data, i.e., not big data. However, there is limited research focusing on both rarity and severe class imbalance in big data. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2019. / FAU Electronic Theses and Dissertations Collection
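Two standard responses to the class imbalance problem described above are cost-sensitive learning (class weights) and random undersampling of the majority class. The sketch below demonstrates both on synthetic data; it is illustrative only and not drawn from the dissertation.

```python
# Hedged sketch of two common responses to severe class imbalance:
# class weighting and random undersampling of the majority class.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# 1% positive (minority) class, mimicking severe imbalance.
X, y = make_classification(n_samples=20_000, weights=[0.99], flip_y=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Option 1: cost-sensitive learning via class weights.
weighted = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)

# Option 2: random undersampling of the majority (negative) class.
rng = np.random.default_rng(0)
neg, pos = np.where(y_tr == 0)[0], np.where(y_tr == 1)[0]
keep = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])
undersampled = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])

for name, model in [("class weights", weighted), ("undersampling", undersampled)]:
    print(name, "minority-class F1:", round(f1_score(y_te, model.predict(X_te)), 3))
```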
9

Impact of data quality on photovoltaic (PV) performance assessment

Koubli, Eleni January 2017 (has links)
In this work, data quality control and mitigation tools have been developed for improving the accuracy of photovoltaic (PV) system performance assessment. These tools make it possible to demonstrate the impact of ignoring erroneous or lost data on performance evaluation and fault detection. The work mainly focuses on residential PV systems, where monitoring is limited to recording total generation and the lack of meteorological data makes quality control truly challenging. The main quality issues addressed in this work concern wrong system descriptions and missing electrical and/or meteorological data in monitoring. An automatic detection of wrong input information, such as system nominal capacity and azimuth, is developed based on statistical distributions of annual figures of PV system performance ratio (PR) and final yield. This approach is specifically useful in carrying out PV fleet analyses where only monthly or annual energy outputs are available. The evaluation is carried out based on synthetic weather data obtained by interpolating from a network of about 80 meteorological monitoring stations operated by the UK Meteorological Office. The procedures are used on a large domestic PV dataset, obtained from a social housing organisation, where a significant number of cases with wrong input information are found. Data interruption is identified as another challenge in PV monitoring data, although its effect is particularly under-researched in the area of PV. Disregarding missing energy generation data leads to falsely estimated performance figures, which consequently may lead to false alarms on performance and/or failure to meet the requirements for the financial revenue of a domestic system through the feed-in-tariff scheme. In this work, the effect of missing data is mitigated by applying novel data inference methods based on empirical and artificial neural network approaches, training algorithms and remotely inferred weather data. Various cases of data loss are considered, and case studies from the CREST monitoring system and the domestic dataset are used as test cases. When using back-filled energy output, monthly PR estimation yields more accurate results than when prolonged data gaps are included in the analysis. Finally, to further discriminate more obscure data issues from system faults when higher temporal resolution data is available, a remote modelling and failure detection framework is developed based on a physical electrical model, remote input weather data and a system description extracted from PV module and inverter manufacturer datasheets. The failure detection is based on the analysis of daily profiles and long-term PR comparison of neighbouring PV systems. By employing this tool on various case studies, it is seen that undetected wrong data may severely obscure fault detection, affecting a PV system's lifetime. Based on the results and conclusions of this work on the employed residential dataset, essential data requirements for domestic PV monitoring are introduced as a potential contribution to existing lessons learnt in PV monitoring.
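The performance ratio (PR) mentioned in the abstract is conventionally defined as the final yield divided by the reference yield. The short sketch below shows this calculation and how naively ignoring a generation data gap biases the monthly PR downwards; the figures are invented for illustration.

```python
# Illustrative performance ratio (PR) calculation for a PV system:
# PR = (E_ac / P_stc) / (H_poa / G_stc), with G_stc = 1 kW/m^2.
# Monthly figures below are invented for illustration.

P_STC_KW = 3.0   # nominal (STC) capacity of the array, kWp
G_STC = 1.0      # reference irradiance, kW/m^2

def performance_ratio(energy_kwh: float, irradiation_kwh_m2: float) -> float:
    reference_yield = irradiation_kwh_m2 / G_STC   # hours of equivalent full sun
    final_yield = energy_kwh / P_STC_KW            # kWh per kWp
    return final_yield / reference_yield

# A month with complete data:
print(round(performance_ratio(energy_kwh=310.0, irradiation_kwh_m2=130.0), 3))   # ~0.795

# Same month, but five days of generation data lost and naively treated as zero
# output while irradiation still covers the full month: PR is underestimated.
print(round(performance_ratio(energy_kwh=260.0, irradiation_kwh_m2=130.0), 3))   # ~0.667
```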
10

Transformace HTML dat o produktech do Linked Data formátu / Converting HTML product data to Linked Data

Kadleček, Rastislav January 2018 (has links)
In order to make a step towards the idea of the Semantic Web, it is necessary to research ways to retrieve semantic information from documents published on the current Web 2.0. As an answer to the growing amount of data published in the form of relational tables, the Odalic system, based on the extended TableMiner+ Semantic Table Interpretation algorithm, was introduced to provide a convenient way to semantize tabular data using a knowledge base disambiguation process. The goal of this thesis is to propose an extended algorithm for the Odalic system which would allow the system to gather semantic information for tabular data describing products from e-shops, which have very limited presence in the knowledge bases. This should be achieved by using a machine learning technique called classification. This thesis consists of several parts: obtaining and preprocessing the product data from e-shops; evaluation of several classification algorithms in order to select the best-performing one; description of the design and implementation of the extended Odalic algorithm; description of its integration into the Odalic system; evaluation of the improved algorithm using the obtained product data; and semantization of the product data using the new Odalic algorithm. In the end, the results are concluded and possible...
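As an illustration of the classification step described above (assigning classes to e-shop product records from their text), the sketch below uses TF-IDF features and a generic classifier on invented product descriptions; it is not the algorithm evaluated in the thesis.

```python
# Hedged sketch: classify e-shop product records by textual description so
# that rows absent from a knowledge base can still be semantically typed.
# Products and class labels are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

products = [
    "Samsung 55 inch 4K UHD Smart TV",
    "Canon EOS 250D DSLR camera with 18-55mm lens",
    "Bosch cordless drill 18V with two batteries",
    "LG OLED 65 inch television",
    "Nikon mirrorless camera body, 24 MP",
    "Makita impact driver brushless 18V",
]
classes = ["TV", "Camera", "PowerTool", "TV", "Camera", "PowerTool"]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(products, classes)
print(classifier.predict(["Sony Bravia 50 inch LED TV", "DeWalt hammer drill 20V"]))
```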
