About: The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Weather-sensitive, spatially-disaggregated electricity demand model for Nigeria

Oluwole, Oluwadamilola January 2018
The historical underinvestment in power infrastructure and the poor performance of power delivery have resulted in extensive and regular power shortages in Nigeria. As Nigeria aims to bridge its power supply gap, the recent deregulation of its electricity market has seen the privatisation of its generation and distribution companies. Ambitious plans have also been put in place to expand the transmission network and the total power generation capacity. However, these plans have been developed with essentially arbitrary estimates of prevailing demand. Actual demand cannot be measured directly because network and generation limits impose a programme of almost constant load shedding, that is, the managed and intermittent distribution of an inadequate energy allocation from the system operator. Network expansion planning and system reliability analysis require time series demand data to assess generation adequacy and to evaluate the impact of daily and seasonal influences on the energy supply-demand balance. To facilitate such analysis, this thesis describes efforts to develop a credible time series electricity demand model for Nigeria. The approach centres on a fundamental bottom-up model of individual households that accounts for a range of dwelling characteristics, socioeconomic factors, appliance use and household activities. A householder survey was conducted to provide the essential inputs for a portfolio of household demand models that can account for weather dependence and other factors. A range of national and regional socioeconomic and weather datasets have been employed to create a regionally disaggregated time series demand model. The generated demand estimates are validated against metered data obtained from Nigeria. The value of the approach is highlighted by using the model to investigate the potential for future load growth and to analyse the impact of renewable energy generation on the Nigerian grid.
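The bottom-up aggregation described above can be sketched in a few lines. This is a minimal illustration, not the thesis model. The household archetypes, the flat 24-hour profiles, the linear cooling term and the 24 °C base temperature are all hypothetical assumptions. They are chosen only to show how household-level, weather-dependent loads sum to a regional time series and then to a national one.

```python
from dataclasses import dataclass

@dataclass
class HouseholdType:
    """Hypothetical household archetype; every field is an illustrative assumption."""
    count: int                  # number of such households in the region
    base_profile_kw: list[float]  # 24 hourly loads (kW per household) ignoring weather
    cooling_kw_per_degc: float  # assumed extra load per degree C above the base temperature

def regional_demand(types, hourly_temp_c, base_temp_c=24.0):
    """Sum household-level demand for one region, hour by hour (kW)."""
    demand = []
    for hour, temp in enumerate(hourly_temp_c):
        # Cooling degrees: how far the outdoor temperature exceeds the base temperature.
        cooling_degrees = max(0.0, temp - base_temp_c)
        total = 0.0
        for h in types:
            per_household = h.base_profile_kw[hour] + h.cooling_kw_per_degc * cooling_degrees
            total += h.count * per_household
        demand.append(total)
    return demand

def national_demand(regions):
    """Add regional hourly series element-wise; regions maps name -> (types, temps)."""
    series = [regional_demand(types, temps) for types, temps in regions.values()]
    return [sum(values) for values in zip(*series)]
```

Each region has its own weather series, so the same household archetypes can produce different regional load shapes. This is the kind of spatial disaggregation the abstract describes.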
12

Advancement of Computing on Large Datasets via Parallel Computing and Cyberinfrastructure

Yildirim, Ahmet Artu 01 May 2015
Large datasets require efficient processing, storage and management if useful information for innovation and decision-making is to be extracted from them. This dissertation demonstrates novel approaches and algorithms that use a virtual memory approach, parallel computing and cyberinfrastructure. First, we introduce a tailored user-level virtual memory system for parallel algorithms that can process large raster data files on a desktop computer with limited memory. The application area for this part of the study is the development of parallel terrain analysis algorithms that use multi-threading to exploit common multi-core processors for greater efficiency. Second, we present two novel parallel WaveCluster algorithms that perform cluster analysis by using the discrete wavelet transform to reduce large data to coarser representations. These representations are smaller and more easily managed than the original data in both size and complexity. Finally, this dissertation demonstrates an HPC gateway service that abstracts away many of the details and complexities involved in using HPC systems, including authentication, authorization, and data and job management.
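The core idea behind WaveCluster can be illustrated with a small serial sketch. The data are quantized into a grid, a wavelet transform produces a coarser approximation, and connected dense regions become clusters. The sketch assumes 2-D points scaled to [0, 1), a single level of Haar filtering and a fixed density threshold. The function names are hypothetical, and the dissertation's two parallel algorithms are not reproduced here.

```python
from collections import deque

def quantize(points, n):
    """Count points (x, y) in [0, 1) x [0, 1) into an n x n grid of cells."""
    grid = [[0.0] * n for _ in range(n)]
    for x, y in points:
        grid[min(int(x * n), n - 1)][min(int(y * n), n - 1)] += 1
    return grid

def haar_approx(grid):
    """One level of 2-D Haar low-pass filtering: average each 2 x 2 block.
    The orthonormal Haar transform scales by 1/2 rather than 1/4; only the constant differs."""
    m = len(grid) // 2
    return [[(grid[2 * i][2 * j] + grid[2 * i + 1][2 * j]
              + grid[2 * i][2 * j + 1] + grid[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(m)] for i in range(m)]

def label_components(coarse, threshold):
    """Label 4-connected groups of cells whose value reaches the threshold (0 = noise)."""
    m = len(coarse)
    labels = [[0] * m for _ in range(m)]
    next_label = 0
    for si in range(m):
        for sj in range(m):
            if coarse[si][sj] < threshold or labels[si][sj]:
                continue
            next_label += 1
            labels[si][sj] = next_label
            queue = deque([(si, sj)])
            while queue:  # breadth-first flood fill over dense neighbouring cells
                i, j = queue.popleft()
                for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= a < m and 0 <= b < m and not labels[a][b]
                            and coarse[a][b] >= threshold):
                        labels[a][b] = next_label
                        queue.append((a, b))
    return labels

def wavecluster(points, n=8, threshold=1.0):
    """Hypothetical helper: cluster label per point (n must be even; 0 marks noise)."""
    coarse = haar_approx(quantize(points, n))
    labels = label_components(coarse, threshold)
    result = []
    for x, y in points:
        fine_i, fine_j = min(int(x * n), n - 1), min(int(y * n), n - 1)
        result.append(labels[fine_i // 2][fine_j // 2])  # coarse cell = fine cell // 2
    return result
```

The clustering runs on an (n/2) x (n/2) grid instead of the raw points. This reduction is what makes the approach attractive for large data, and it is the part a parallel version would distribute.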
13

Ensemble of Feature Selection Techniques for High Dimensional Data

Vege, Sri Harsha 01 May 2012
Data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large amounts of data stored in databases, data warehouses, or other information repositories. Feature selection is an important preprocessing step in data mining that helps increase the predictive performance of a model. Its main aim is to choose a subset of features with high predictive information and to eliminate irrelevant features with little or no predictive information. A single feature selection technique may settle on a locally optimal feature subset. In this thesis we propose an ensemble approach to feature selection, in which multiple feature selection techniques are combined to yield more robust and stable results. The ensemble of feature ranking techniques is built in two steps. The first step creates a set of different feature selectors, each providing its own sorted order of features. The second step aggregates the results of all the feature ranking techniques. The aggregation method used in our study is a frequency count, with the mean used to resolve ties in the frequency count. The experiments use data from the Kent Ridge bio-medical data repository, from which the Lung Cancer and Lymphoma datasets were selected. The Lung Cancer dataset consists of 57 attributes and 32 instances, and the Lymphoma dataset consists of 4027 attributes and 96 instances. Experiments are performed on the reduced datasets obtained from feature ranking, and these datasets are used to build the classification models. Model performance is evaluated with the AUC (area under the receiver operating characteristic curve) metric, and ANOVA tests are also performed on the AUC results. The experimental results suggest that an ensemble of multiple feature selection techniques is more effective than an individual feature selection technique.
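A minimal sketch of the two-step aggregation follows. The abstract does not define "frequency count" precisely or say which mean is used, so two assumptions are made. A feature's frequency is the number of rankers that place it in their top k. Ties are broken by the feature's mean rank position across all rankers, where lower is better. `ensemble_rank` is a hypothetical name, and every ranker is assumed to rank the same set of features.

```python
from collections import defaultdict

def ensemble_rank(rankings, k):
    """Hypothetical helper: combine feature rankings (each best-first) into one ordering.

    Primary key: how many rankers put the feature in their top k (frequency count).
    Tie-break (assumed): mean position across all rankers, lower is better.
    Final fallback: feature name, so the output is deterministic.
    """
    freq = defaultdict(int)
    positions = defaultdict(list)
    for ranking in rankings:
        for pos, feature in enumerate(ranking):
            positions[feature].append(pos)
            if pos < k:
                freq[feature] += 1

    def sort_key(feature):
        mean_pos = sum(positions[feature]) / len(positions[feature])
        return (-freq[feature], mean_pos, feature)

    return sorted(positions, key=sort_key)
```

Keeping the first few features of the combined ordering gives a reduced dataset. A classifier would then be trained on it and evaluated by AUC.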
14

Classification and Sequential Pattern Mining From Uncertain Datasets

Hooshsadat, Metanat Unknown Date
No description available.
15

Discovering Co-Location Patterns and Rules in Uncertain Spatial Datasets

Adilmagambetov, Aibek Unknown Date
No description available.
16

Tasks and visual techniques for the exploration of temporal graph data

Kerracher, Natalie January 2017
This thesis considers the tasks involved in exploratory analysis of temporal graph data, and the visual techniques that are able to support these tasks. There has been an enormous increase in the amount and availability of graph (network) data, and in particular of graph data that change over time. Understanding the mechanisms of temporal change in a graph is of interest to a wide range of disciplines. While the application domain may differ, many of the underlying questions about the properties of the graph and its mechanism of change are the same. The research area of temporal graph visualisation seeks to address the challenges involved in visually representing change in a graph over time. While most graph visualisation tools focus on static networks, recent research has been directed toward the development of temporal visualisation systems. By representing data using computer-generated graphical forms, Information Visualisation techniques harness human perceptual capabilities to recognise patterns, spot anomalies and outliers, and find relationships within the data. Interacting with these graphical representations allows individuals to explore large datasets and gain further insight into the relationships between different aspects of the data. Visual approaches are particularly relevant for Exploratory Data Analysis (EDA), where the person performing the analysis may be unfamiliar with the data set and their goal is to make new discoveries and gain insight through its exploration. However, designing visual systems for EDA can be difficult, as the tasks a person may wish to carry out during their analysis are not always known at the outset. Efforts to identify and understand the tasks involved in such a process have given rise to a number of task taxonomies, which seek to elucidate the tasks and structure them in a useful way. While task taxonomies for static graph analysis exist, no suitable temporal graph taxonomy has yet been developed.
The first part of this thesis focusses on the development of such a taxonomy. Through the extension and instantiation of an existing formal task framework for general EDA, a task taxonomy and a task design space are developed specifically for the exploration of temporal graph data. The resulting task framework is evaluated against extant classifications and is shown to address a number of deficiencies in task coverage in existing works. Its usefulness in both the design and evaluation processes is also demonstrated. Much current research concerns the development of systems and techniques for visual exploration of temporal graphs, but little is known about how the different types of technique relate to one another and which tasks they are able to support. The second part of this thesis focusses on the possibilities in this area. A design space of the possible visual encodings for temporal graph data is developed, and extant techniques are classified into this space, revealing potential combinations of encodings that have not yet been employed. These may prove interesting opportunities for further research and the development of novel techniques. The third part of this work addresses the need to understand the types of analysis the different visual techniques support, and indeed whether new techniques are required. The techniques that are able to support the different task dimensions are considered. This task-technique mapping reveals that visual exploration of temporal graph data requires techniques not only from temporal graph visualisation, but also from static graph visualisation and comparison, and from temporal visualisation. A number of unsupported or less well supported tasks are identified, and these could prove interesting opportunities for future research.
The taxonomies, design spaces, and mappings in this work bring order to the range of potential tasks of interest when exploring temporal graph data and to the assortment of techniques developed to visualise this type of data. They are designed to be of use in both the design and evaluation of temporal graph visualisation systems.
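To make "change in a graph over time" concrete, the sketch below assumes one common data model: a temporal graph stored as a sequence of discrete snapshots with undirected edges. It computes two quantities an analyst might explore, namely the edges added or removed between consecutive snapshots and a node's degree over time. The data model and function names are illustrative assumptions. The thesis itself is concerned with tasks and visual techniques rather than with any particular data structure.

```python
def normalise(edges):
    """Treat edges as undirected: store each as a sorted tuple so (a, b) == (b, a)."""
    return frozenset(tuple(sorted(edge)) for edge in edges)

def snapshot_changes(snapshots):
    """Hypothetical helper: for consecutive snapshots, list edges added and removed."""
    changes = []
    for before, after in zip(snapshots, snapshots[1:]):
        old, new = normalise(before), normalise(after)
        changes.append({"added": sorted(new - old), "removed": sorted(old - new)})
    return changes

def degree_over_time(snapshots, node):
    """Hypothetical helper: degree of one node in each snapshot."""
    return [sum(node in edge for edge in normalise(s)) for s in snapshots]
```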
17

The abundance and diversity of endogenous retroviruses in the chicken genome

Mason, Andrew Stephen January 2018
Long terminal repeat (LTR) retrotransposons are autonomous eukaryotic repetitive elements that may elicit prolonged genomic and immunological stress on their host organism. LTR retrotransposons comprise approximately 10 % of the mammalian genome, but previous work identified only 1.35 % of the chicken genome as LTR retrotransposon sequence. This deficit does not appear consistent across birds, as the Neoaves studied have contents comparable with those of mammals, although all birds contain only one LTR retrotransposon class: endogenous retroviruses (ERVs). One group of chicken-specific ERVs (Avian Leukosis Virus subgroup E; ALVEs) remains active and has been linked to commercially detrimental phenotypes, such as reduced lifetime egg count, but its full diversity and range of phenotypic effects are poorly understood. A novel identification pipeline, LocaTR, was developed to identify LTR retrotransposon sequences in the chicken genome. This enabled the annotation of 3.01 % of the genome, including 1,073 structurally intact elements with replicative potential. Elements were depleted within coding regions, and over 40 % of intact elements were found in clusters in gene-sparse, poorly recombining regions. RNA-seq analysis showed that elements were generally not expressed, but intact transcripts were identified in four cases, supporting the potential for viral recombination and retrotransposition of non-autonomous repeats. LocaTR analysis of seventy-two additional sauropsid genomes revealed highly lineage-specific repeat content and did not support the proposed deficit in Galliformes. A second novel bioinformatic pipeline was constructed to identify ALVE insertions in whole-genome resequencing data and was applied to eight elite layer lines from Hy-Line International. Twenty ALVEs were identified, and diagnostic assays were developed to validate the bioinformatic approach. Each ALVE was sequenced and characterised, and many exhibited high structural intactness.
In addition, a K locus revertant line was identified through the unexpected presence of ALVE21, which was confirmed using BioNano optical maps. The ALVE identification pipeline was then applied to ninety chicken lines, and 322 different ALVEs were identified, 81 % of which were novel. Overall, broilers and non-commercial chickens carried more ALVEs than layers. Taken together, these two analyses have enabled a thorough characterisation of both the abundance and diversity of chicken ERVs.
18

Pingo: A Framework for the Management of Storage of Intermediate Outputs of Computational Workflows

January 2017
Scientific workflows allow scientists to model and express an entire sequence of data processing steps, typically as a directed acyclic graph (DAG). These workflows consist of collections of tasks that usually take a long time to compute and that produce a considerable number of intermediate datasets. Because of the nature of scientific exploration, a workflow may be modified and re-run many times, and new workflows may be created that reuse past intermediate datasets. Storing intermediate datasets therefore has the potential to save computation time. Since storage is limited, a central problem is deciding, at creation time, which intermediate datasets to save in order to minimize the computational time of workflows run in the future. This thesis proposes the design and implementation of Pingo, a system that manages the computation of scientific workflows as well as the storage, provenance and deletion of intermediate datasets. Pingo uses the history of workflows submitted to the system to predict which datasets are most likely to be needed in the future, and bases its decisions on dataset deletion on optimizing the computational time of future workflows. / Dissertation/Thesis / Masters Thesis Computer Science 2017
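The storage decision that Pingo addresses can be illustrated with a simple greedy heuristic. This is not Pingo's algorithm. It assumes that a dataset's reuse probability is the fraction of past workflows that consumed it. It then keeps datasets in order of expected recomputation time saved per gigabyte until a storage budget is exhausted. The `Dataset` fields and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    """Hypothetical record for an intermediate dataset."""
    name: str
    size_gb: float      # assumed positive
    recompute_s: float  # time needed to regenerate it from its inputs

def reuse_probability(name, history):
    """Fraction of past workflows (each a set of dataset names) that used this dataset."""
    if not history:
        return 0.0
    return sum(name in workflow for workflow in history) / len(history)

def choose_to_store(datasets, history, capacity_gb):
    """Greedy knapsack-style choice: highest expected seconds saved per GB first."""
    def value_density(d):
        return reuse_probability(d.name, history) * d.recompute_s / d.size_gb

    kept, used = [], 0.0
    for d in sorted(datasets, key=value_density, reverse=True):
        # Skip datasets never reused in the history, and any that would exceed the budget.
        if reuse_probability(d.name, history) > 0 and used + d.size_gb <= capacity_gb:
            kept.append(d.name)
            used += d.size_gb
    return kept
```

Greedy ordering by value density is a standard approximation for this kind of knapsack problem. An exact solver or a richer reuse model, for example one that weights recent workflows more heavily, could replace it.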
19

Interpolation in stationary spatial and spatial-temporal datasets

Smit, Ansie 27 October 2010
In the early 1950s, the study of how to determine true ore-grade distributions in the mining sector sparked the development of a series of statistical tools that specifically allow for spatial and, subsequently, spatial-temporal dependence. These statistics are commonly referred to as geostatistics and have since been incorporated into several fields of study characterized by such dependence. Basic descriptive statistics and mapping tools for geostatistics are defined and illustrated by means of a simulated dataset. The moments are modelled according to predefined conditions and model structures to describe the spatial and spatial-temporal variance in the data. These variograms and covariance structures are subsequently used in the least-squares interpolation procedure known as kriging. At present, kriging is the most commonly used geostatistical method for the interpolation and simulation of spatial or spatial-temporal data. The univariate and multivariate spatial and spatial-temporal kriging techniques are tested on the simulated dataset to demonstrate how interpolation weights are determined from the lag distances and the underlying variance structure. The strengths, weaknesses and inherent complexities of the methodologies are highlighted. / Dissertation (MSc)--University of Pretoria, 2010. / Statistics / unrestricted
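The kriging step can be sketched for the simplest case above: univariate, purely spatial ordinary kriging. The sketch assumes an exponential variogram model with unit sill and unit range and no nugget. In practice the model would be fitted to the empirical semivariogram, so the classical estimator for that is included too. The weights solve the ordinary kriging system: sum_j w_j gamma(x_i - x_j) + mu = gamma(x_i - x_0) for each data point i, subject to sum_j w_j = 1. They therefore depend only on the lag distances and the variogram. All function names are illustrative, and the spatial-temporal and multivariate variants are not covered.

```python
import math

def exponential_variogram(h, sill=1.0, corr_range=1.0):
    """Assumed model: gamma(h) = sill * (1 - exp(-h / corr_range)), no nugget."""
    return sill * (1.0 - math.exp(-h / corr_range))

def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Classical estimator: gamma(h) = sum of (z_i - z_j)^2 over pairs in a lag bin / (2 N(h))."""
    sums, counts = [0.0] * n_lags, [0] * n_lags
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            b = int(math.dist(coords[i], coords[j]) // lag_width)
            if b < n_lags:
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]  # None = empty bin

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(coords, values, target, variogram=exponential_variogram):
    """Return (estimate, weights) at target; the last unknown is the Lagrange multiplier."""
    n = len(coords)
    a = [[variogram(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])  # unbiasedness constraint: weights sum to one
    b = [variogram(math.dist(c, target)) for c in coords] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * z for w, z in zip(weights, values)), weights
```

Take the four corners of a unit square with a target at the centre. Symmetry forces each weight to be 0.25. At a data location, the estimate reproduces the observed value, because kriging without a nugget is an exact interpolator.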
20

Taxonomy, phylogeny and population biology of Mycosphaerella species occurring on Eucalyptus

Hunter, Gavin Craig 09 July 2008
Much research has been published on Mycosphaerella spp. causing Mycosphaerella Leaf Disease (MLD) on Eucalyptus spp. The first chapter of this thesis reviews the literature on this topic, focusing on the taxonomy, phylogeny and population biology of Mycosphaerella spp. occurring on Eucalyptus. The published literature shows that most research on MLD has focussed on the epidemiology and taxonomy of Mycosphaerella spp. and on the susceptibility of Eucalyptus hosts to species of Mycosphaerella. Advances in DNA-based technologies have, however, led to extensive DNA sequence datasets for Mycosphaerella spp. occurring on Eucalyptus. These datasets have provided substantial insight into species concepts for Mycosphaerella and have led to the realisation that many morphological species are complexes of several cryptic phylogenetic taxa. A more recent line of study of Mycosphaerella spp. on Eucalyptus concerns their population dynamics. Such studies will aid our understanding of the genetic structure of Mycosphaerella populations and their movement between countries. These population-based studies will also help forestry companies establish Eucalyptus breeding programmes to produce tolerant Eucalyptus genotypes that may be deployed in commercial forestry operations. Mycosphaerella spp. are difficult to identify because of their conserved teleomorph morphology and the lack of natural occurrences of anamorph structures. DNA sequence data have therefore become the definitive technique used to identify Mycosphaerella spp. The Internal Transcribed Spacer (ITS) region of the ribosomal RNA operon has traditionally been targeted for DNA sequence comparisons. However, this gene region does not offer sufficient resolution to discriminate cryptic taxa or resolve deeper nodes within Mycosphaerella.
Chapter two of this thesis presents a multi-gene phylogeny for the identification of Mycosphaerella spp. occurring on Eucalyptus, based on DNA sequence data from four nuclear gene regions. These sequence datasets have allowed cryptic taxa and species complexes to be clearly elucidated and have provided greater resolution of deeper nodes within Mycosphaerella. The results have also shown that Mycosphaerella ambiphylla and M. vespa are synonyms of Mycosphaerella molleriana, and that Pseudocercospora epispermogonia is the asexual state of Mycosphaerella marksii. A serious foliar disease of Eucalyptus camaldulensis and its hybrids has been known from Thailand and Vietnam for many years. The disease was known to be caused by a species of Pseudocercospora and had been attributed to the cosmopolitan Pseudocercospora eucalyptorum. Results of a study presented in chapter three of this thesis have, however, clearly shown that P. eucalyptorum is not the causal agent of the disease observed on E. camaldulensis in Thailand. Using classical morphological techniques and DNA sequence data from four nuclear gene regions, I have shown that an undescribed species of Pseudocercospora is responsible for epidemics of this leaf disease. This species is formally described as Pseudocercospora flavomarginata. P. flavomarginata is known only from Thailand and Vietnam. However, E. camaldulensis is planted in other south-east Asian countries and is the most commonly found Eucalyptus sp. in Australia. Further surveys in these areas will therefore most likely lead to the discovery of the pathogen in those countries. Techniques that have been used to identify Mycosphaerella spp. include classical morphological comparisons and analyses of DNA sequence data.
These techniques have, however, allowed only for the study of the evolutionary history within Mycosphaerella and for species identification. Recent advances in the field of population biology have led to the study of many fungal pathogens at a population level. One of the main tools used to study population biology is DNA-based microsatellite markers. Chapter four of this thesis focuses on the development of DNA-based microsatellite markers for the Eucalyptus leaf pathogen Mycosphaerella nubilosa. Using specific enrichment protocols, I was able to develop ten polymorphic microsatellite markers for M. nubilosa. These markers exhibit high specificity for M. nubilosa and did not cross-amplify with other Mycosphaerella spp. that are closely related to M. nubilosa. Mycosphaerella nubilosa has been extensively studied with respect to its taxonomy and epidemiology; however, nothing is known about the population biology of this important Eucalyptus leaf pathogen. The microsatellite markers developed in chapter four were therefore used to study the population biology of M. nubilosa from several different geographic locations. Results presented in chapter five of this thesis show that populations of M. nubilosa from eastern Australia are genetically more diverse than those from western Australia, Africa and Europe. This indicates that eastern Australia is the likely centre of origin of M. nubilosa. Furthermore, based on haplotypes shared between the M. nubilosa populations used in this study, I have proposed a pathway of gene flow for M. nubilosa. It suggests that the pathogen moved from eastern Australia to both western Australia and South Africa, then from South Africa into other countries in Africa, and finally into Europe. An interesting result emerging from the population biology study presented in chapter five is the finding that M. nubilosa appears to employ a homothallic mating strategy.
Thus, in countries where the genetic diversity of M. nubilosa is limited, opportunities exist to breed Eucalyptus for resistance. Given the high number of M. nubilosa haplotypes observed in Australia and South Africa, it is also important that this pathogen be added to quarantine action lists to prevent the movement of contaminated Eucalyptus germplasm. This is necessary to prevent novel M. nubilosa haplotypes from moving into new environments where susceptible Eucalyptus spp. are propagated. Mycosphaerella nubilosa is one of the most pathogenic Mycosphaerella spp. causing MLD on Eucalyptus. Surveys of diseased Eucalyptus plantations in several countries where this pathogen occurs have resulted in an extensive collection of M. nubilosa isolates. Recently, DNA-based studies have led to the hypothesis that M. nubilosa may represent two distinct taxa. Results of studies presented in chapter six of this thesis indicate that two distinct ITS phylogenetic lineages are represented by M. nubilosa sensu lato. These lineages are characterized by distinct geographic distributions and Eucalyptus host associations. M. nubilosa ITS lineage 1 is found exclusively in New Zealand, Tasmania and Victoria (eastern Australia), where it occurs on E. globulus. M. nubilosa ITS lineage 2 has a broader geographic distribution and can be found in Spain, Portugal, Tanzania, Kenya, Ethiopia, South Africa, western Australia, and Victoria and New South Wales (eastern Australia). It occurs there on E. globulus and on several other Eucalyptus spp. used in commercial forestry, including E. nitens. It is envisaged that the results presented in chapter six will lead to more extensive studies of M. nubilosa sensu lato that may result in the description of a new Mycosphaerella sp. represented by M. nubilosa ITS lineage 1. / Thesis (PhD)--University of Pretoria, 2011. / Microbiology and Plant Pathology / Unrestricted
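Chapter five compares the genetic diversity of M. nubilosa populations using microsatellite data. The abstract does not name the statistics used. The sketch below therefore assumes one common choice for haploid organisms: Nei's unbiased gene diversity, H = n/(n - 1) * (1 - sum of p_i^2) per locus, averaged over loci, together with a simple count of distinct multilocus haplotypes. The function names and any allele sizes passed to them are hypothetical.

```python
from collections import Counter

def gene_diversity(alleles):
    """Nei's unbiased gene diversity for one locus: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(alleles)
    if n < 2:
        return 0.0
    freqs = [count / n for count in Counter(alleles).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))

def mean_gene_diversity(isolates):
    """isolates: haploid multilocus genotypes, each a tuple with one allele per locus."""
    n_loci = len(isolates[0])
    per_locus = [gene_diversity([iso[locus] for iso in isolates]) for locus in range(n_loci)]
    return sum(per_locus) / n_loci

def haplotype_count(isolates):
    """Number of distinct multilocus haplotypes in a population sample."""
    return len(set(isolates))
```

A population with higher mean gene diversity and more distinct haplotypes would, under this measure, be the more diverse one. That is the kind of comparison used above to point to eastern Australia as the likely centre of origin.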
