Global ETD Search

71	Density based clustering in large databases using projections and visualizations Hinneburg, Alexander. January 2002 (has links) (PDF) Halle, Wittenberg, University, Diss., 2003. Data Mining
72	Suchraumbeschränkung für relationales Data Mining Weber, Irene. January 2004 (has links) (PDF) Stuttgart, Universität, Diss., 2004. Data Mining
73	Near real-time extract, transform and load Soon, Wilson Wei-Chwen. January 2007 (has links) (PDF) Thesis (M.S.C.I.T.)--Regis University, Denver, Colo., 2007. / Title from PDF title page (viewed on May 25, 2007). Includes bibliographical references. Data warehousing.
74	Distributed Ensemble Learning With Apache Spark Lind, Simon January 2016 (has links) No description available. Big Data
75	The complexities and possibilities of health data utilization in the West Coast District Zimri, Irma Selina January 2018 (has links) Magister Commercii - MCom (IM) (Information Management) / In an ideal public health arena, scientific evidence should be incorporated in the health information practices of making management decisions, developing policies, and implementing programs. However, much effort has been spent in developing health information practices focusing mainly on data collection, data quality and processing, with relatively little development on the utilization side of the information spectrum. Although the South Africa Health National Indicator Dataset of 2013 routinely collects and reports on more than two hundred elements, the degree to which this information is being used is not empirically known. The overall aim of the study was to explore the dynamics of routine primary healthcare information utilization in the West Coast district while identifying specific interventions that could ultimately lead to the improved use of data to better inform decision making. The ultimate goal is to enable managers to better utilize their routine health information for effective decision making. Data Health information Data quality Data use Health data
76	Enriching integrated statistical open city data by combining equational knowledge and missing value imputation Bischof, Stefan, Harth, Andreas, Kämpgen, Benedikt, Polleres, Axel, Schneider, Patrik 19 October 2017 (has links) (PDF) Several institutions collect statistical data about cities, regions, and countries for various purposes. Yet, while access to high quality and recent such data is both crucial for decision makers and a means for achieving transparency to the public, all too often such collections of data remain isolated and not re-useable, let alone comparable or properly integrated. In this paper we present the Open City Data Pipeline, a focused attempt to collect, integrate, and enrich statistical data collected at city level worldwide, and re-publish the resulting dataset in a re-useable manner as Linked Data. The main features of the Open City Data Pipeline are: (i) we integrate and cleanse data from several sources in a modular and extensible, always up-to-date fashion; (ii) we use both Machine Learning techniques and reasoning over equational background knowledge to enrich the data by imputing missing values, (iii) we assess the estimated accuracy of such imputations per indicator. Additionally, (iv) we make the integrated and enriched data, including links to external data sources, such as DBpedia, available both in a web browser interface and as machine-readable Linked Data, using standard vocabularies such as QB and PROV. Apart from providing a contribution to the growing collection of data available as Linked Data, our enrichment process for missing values also contributes a novel methodology for combining rule-based inference about equational knowledge with inferences obtained from statistical Machine Learning approaches. 
While most existing works about inference in Linked Data have focused on ontological reasoning in RDFS and OWL, we believe that these complementary methods and particularly their combination could be fruitfully applied also in many other domains for integrating Statistical Linked Data, independent from our concrete use case of integrating city data.
77	SIDVI: a model for secure distributed data integration Routly, Wayne A January 2004 (has links) The new millennium has brought about an increase in the use of business intelligence and knowledge management systems. The very foundations of these systems are the multitude of source databases that store the data. The ability to derive information from these databases is brought about by means of data integration. With the current emphasis on security in all walks of information and communication technology, a renewed interest must be placed in the systems that provide us with information: data integration systems. This dissertation investigates security issues at specific stages in the data integration cycle, with special reference to problems when performing data integration in a peer-to-peer environment, as in distributed data integration. In the database environment we are concerned with the database itself and the media used to connect to and from the database. In distributed data integration, the concept of the database is redefined to the source database, from which we extract data, and the storage database, in which the integrated data is stored. This postulates three distinct areas in which to apply security: the data source, the network medium and the data store. All of these areas encompass data integration and must be considered holistically when implementing security. Data integration is never only one server or one database; it is various geographically dispersed components working together towards a common goal. It is important then that we consider all aspects involved when attempting to provide security for data integration. This dissertation will focus on the areas of security threats and investigates a model to ensure the integrity and security of data during the entire integration process.
To ensure effective security in a data integration environment, security should be present at all stages and should provide end-to-end protection. Data protection
78	A statistical approach to automated detection of multi-component radio sources Smith, Jeremy Stewart 24 February 2021 (has links) Advances in radio astronomy are allowing for deeper and wider areas of the sky to be observed than ever before. Source counts of future radio surveys are expected to number in the tens of millions. Source finding techniques are used to identify sources in a radio image; however, these techniques identify single distinct sources and are challenged to identify multi-component sources, that is to say, where two or more distinct sources belong to the same underlying physical phenomenon, such as a radio galaxy. Identification of such phenomena is an important step in generating catalogues from surveys on which much of the radio astronomy science is based. Historically, identifying multi-component sources was conducted by visual inspection; however, the size of future surveys makes manual identification prohibitive. An algorithm to automate this process using statistical techniques is proposed. The algorithm is demonstrated on two radio images. The output of the algorithm is a catalogue where nearest neighbour source pairs are assigned a probability score of being a component of the same physical object. By applying several selection criteria, pairs of sources which are likely to be multi-component sources can be determined. Radio image cutouts are then generated from this selection and may be used as input into radio source classification techniques. Successful identification of multi-component sources using this method is demonstrated. data science
79	A temporal prognostic model based on dynamic Bayesian networks: mining medical insurance data Mbaka, Sarah Kerubo 10 September 2021 (has links) A prognostic model is a formal combination of multiple predictors from which risk probability of a specific diagnosis can be modelled for patients. Prognostic models have become essential instruments in medicine. The models are used for prediction purposes of guiding doctors to make a smart diagnosis, patient-specific decisions or help in planning the utilization of resources for patient groups who have similar prognostic paths. Dynamic Bayesian networks theoretically provide a very expressive and flexible model to solve temporal problems in medicine. However, this involves various challenges due both to the nature of the clinical domain, and the nature of the DBN modelling and inference process itself. The challenges from the clinical domain include insufficient knowledge of temporal interactions of processes in the medical literature, the sparse nature and variability of medical data collection, and the difficulty in preparing and abstracting clinical data in a suitable format without losing valuable information in the process. Challenges about the DBN methodology and implementation include the lack of tools that allow easy modelling of temporal processes. Overcoming this challenge will help to solve various clinical temporal reasoning problems. In this thesis, we addressed these challenges while building a temporal network with explanations of the effects of predisposing factors, such as age and gender, and the progression information of all diagnoses using claims data from an insurance company in Kenya. We showed that our network could differentiate the possible probability exposure to a diagnosis given the age and gender and possible paths given a patient's history. We also presented evidence that the more patient history is provided, the better the prediction of future diagnosis. Data Science
80	Designing an event display for the Transition Radiation Detector in ALICE Perumal, Sameshan 15 September 2021 (has links) We document here a successful design study for an event display focused on the Transition Radiation Detector (TRD) within A Large Ion Collider Experiment (ALICE) at the European Organisation for Nuclear Research (CERN). Reviews of the fields of particle physics and visualisation are presented to motivate formally designing this display for two different audiences. We formulate a methodology, based on successful design studies in similar fields, that involves experimental physicists in the design process as domain experts. An iterative approach incorporating in-person interviews is used to define a series of visual components applying best practices from literature. Interactive event display prototypes are evaluated with potential users, and refined using elicited feedback. The primary artefact is a portable, functional, effective, validated event display – a series of case studies evaluate its use by both scientists and the general public. We further document use cases for, and hindrances preventing, the adoption of event displays, and propose novel data visualisations of experimental particle physics data. We also define a flexible intermediate JSON data format suitable for web-based displays, and a generic task to convert historical data to this format. This collection of artefacts can guide the design of future event displays. Our work makes the case for a greater use of high quality data visualisation in particle physics, across a broad spectrum of possible users, and provides a framework for the ongoing development of web-based event displays of TRD data. Data Science
