  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Exploratory Data Analysis using Clusters and Stories

Hossain, Mahmud Shahriar 25 July 2012 (has links)
Exploratory data analysis aims to study datasets through the use of iterative, investigative, and visual analytic algorithms. Due to the difficulty of managing and accessing the growing volume of unstructured data, exploratory analysis of datasets has become harder than ever and is a topic of growing interest to data mining researchers. In this dissertation, we study new algorithms for exploratory analysis of data collections using clusters and stories. Clustering brings together similar entities, whereas stories connect dissimilar objects. The former helps organize datasets into regions of interest, and the latter explores latent information by connecting the dots between disjoint instances. This dissertation focuses on five research aspects that demonstrate the applicability and usefulness of clusters and stories as exploratory data analysis tools. In the area of clustering, we investigate whether clustering algorithms can be automatically "alternatized" and how they can be guided to obtain alternative results using flexible constraints as "scatter-gather" operations. We demonstrate the application of these ideas in many domains, including studying the bat biosonar system and designing sustainable products. In the area of storytelling, we develop algorithms that can generate stories using distance, clique, and syntactic constraints. We explore the use of storytelling for studying document collections in the biomedical literature and the intelligence analysis domain. / Ph. D.
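The clusters-versus-stories distinction can be illustrated with a toy sketch (Jaccard distance over word-set documents, the distance threshold, and all document contents are assumptions for illustration, not the dissertation's algorithms): clustering groups similar documents, while a story connects two dissimilar documents through a chain of intermediates, each hop bounded by a distance constraint.

```python
from collections import deque

def jaccard_distance(a, b):
    """Distance between two documents represented as word sets."""
    return 1.0 - len(a & b) / len(a | b)

def find_story(docs, start, end, max_hop=0.8):
    """Breadth-first search for a chain from start to end in which every
    consecutive pair of documents lies within max_hop Jaccard distance."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in docs:
            if nxt not in seen and jaccard_distance(docs[path[-1]], docs[nxt]) <= max_hop:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no story satisfies the distance constraint

# Hypothetical toy documents as word sets.
docs = {
    "d1": {"bat", "sonar", "echo"},
    "d2": {"sonar", "echo", "signal"},
    "d3": {"signal", "design", "product"},
    "d4": {"design", "product", "sustainable"},
}

# d1 and d4 share no words (maximal distance), yet a story connects them
# through intermediate documents, each hop within the threshold.
story = find_story(docs, "d1", "d4")
```

A clustering algorithm would place d1 and d4 in different groups; the story instead "connects the dots" between them, which is the complementary exploration the abstract describes.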
22

Accurate relative location of similar earthquakes

Logan, Alan Leslie Leonard January 1987 (has links)
No description available.
23

Large-scale density and velocity fields in the Universe

Lilje, Per Vidar Barth January 1988 (has links)
No description available.
24

Ion channel activity and signalling in the Fucus rhizoid

Manison, Nicholas Frederick January 1999 (has links)
No description available.
25

Structural and spectroscopic aspects of water clusters

Buffey, Ian Peter January 1988 (has links)
No description available.
26

Mathematical modelling of coagulation and gelation

Davies, Susan C. January 1998 (has links)
No description available.
27

Statistical analysis of large scale structure in the universe

Baugh, Carlton Martin January 1994 (has links)
No description available.
28

Clustering analysis of residential loads

Karimi, Kambiz January 1900 (has links)
Master of Science / Department of Electrical and Computer Engineering / Anil Pahwa / Understanding electricity consumer behavior at different times of the year and throughout the day is very important for utilities. Though electricity consumers pay a fixed, predetermined rate for electric energy, market wholesale prices vary hourly during the day. This analysis examines the overall behavior of consumers in different seasons of the year and compares it with market wholesale prices; specifically, the coincidence of load peaks with peaks in the market wholesale price is analyzed. The analysis used data from 101 homes in Austin, TX, gathered and stored by Pecan Street Inc. These data were first used to determine the average seasonal load profiles of all houses. Secondly, the houses were categorized into three clusters based on similarities in their load profiles using the k-means clustering method. Finally, the average seasonal profiles of each cluster were compared with wholesale market prices taken from the Electric Reliability Council of Texas (ERCOT). The data obtained for the houses were in 15-minute intervals, so they were first converted to average hourly profiles. All the data were then used to determine an average seasonal profile for each house in each season (winter, spring, summer, and fall). We decided to use three clusters, and all houses were then categorized into one of them using k-means clustering. Similarly, electricity prices taken from ERCOT, which were also on a 15-minute basis, were converted to hourly averages and then to seasonal averages. Through clustering analysis we found that a small percentage of consumers did not change their pattern of electricity usage, while the majority changed their usage pattern from one season to another. This change in usage patterns depends mostly on level of income, type of heating and cooling systems, and other electric appliances used. Comparing the ERCOT prices with the average seasonal electricity profiles of each cluster, we found that the winter and spring seasons are critical for utilities: the ERCOT price peaks in the morning while the peak loads occur in the evening. In summer and fall, on the other hand, ERCOT price and load demand peak at almost the same time, within one or two hours of each other. This analysis can help utilities and other authorities make better electricity-usage policies, so that some of the load can be shifted from peak times to other times.
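The preprocessing and clustering pipeline described above (15-minute readings averaged to hourly profiles, then k-means) can be sketched in plain Python. The synthetic load data, the k=2 toy setting, and the deterministic initialisation are assumptions for illustration; the thesis used k=3 on the Pecan Street data.

```python
def to_hourly(readings_15min):
    """Average consecutive 15-minute readings (4 per hour) into hourly values."""
    return [sum(readings_15min[i:i + 4]) / 4 for i in range(0, len(readings_15min), 4)]

def kmeans(profiles, k=3, iters=50):
    """Plain k-means over load-profile vectors, initialised with the first k
    profiles for determinism; returns one cluster label per profile."""
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((p - q) ** 2 for p, q in zip(prof, centroids[c])))
                  for prof in profiles]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [prof for prof, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical day of data: two flat households and two evening-peaking ones,
# each as 96 fifteen-minute readings (kW).
flat = [1.0] * 96
peaky = [1.0] * 72 + [5.0] * 24
profiles = [to_hourly(h) for h in (flat, peaky, flat, peaky)]
labels = kmeans(profiles, k=2)  # the thesis used k=3; 2 suffices for this toy data
```

Houses with the same profile shape land in the same cluster, which is exactly the grouping that is then compared against the ERCOT price curves.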
29

Parallelisation of EST clustering

Ranchod, Pravesh 23 March 2006 (has links)
Master of Science - Science / The field of bioinformatics has been developing steadily, with computational problems related to biology taking on increased importance as further advances are sought. The large data sets involved in computational biology have dictated a search for good, fast approximations to computationally complex problems. This research aims to improve a method used to discover and understand genes, which are small subsequences of DNA. A difficulty arises because genes contain parts we know to be functional and other parts we assume are non-functional, as their functions have not been determined. Isolating the functional parts requires the use of natural biological processes which perform this separation. However, these processes cannot read long sequences, forcing biologists to break a long sequence into a large number of small sequences and then read these. This creates the computational difficulty of categorizing the short fragments according to gene membership. Expressed Sequence Tag (EST) clustering is a technique used to facilitate the identification of expressed genes by grouping together similar fragments, on the assumption that they belong to the same gene. The aim of this research was to investigate the usefulness of distributed memory parallelisation for the EST clustering problem. This was investigated empirically, with a distributed system tested for speed against a sequential one. It was found that distributed memory parallelisation can be very effective in this domain: the results showed a super-linear speedup for up to 100 processors, with higher numbers not tested and likely to produce further speedups. The system was able to cluster 500000 ESTs in 641 minutes using 101 processors.
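The grouping step that EST clustering performs — fragments end up together if linked by a chain of pairwise similarities — can be illustrated with a toy single-machine sketch. The overlap test (a shared exact substring of a minimum length) and all sequences are hypothetical; real EST clusterers use far more careful similarity measures, and the thesis's contribution is the distributed parallelisation, which this sketch does not show.

```python
def shares_overlap(a, b, min_len=8):
    """True if sequences a and b share a common substring of at least min_len
    bases -- a crude stand-in for the similarity tests real EST clusterers use."""
    subs = {a[i:i + min_len] for i in range(len(a) - min_len + 1)}
    return any(b[i:i + min_len] in subs for i in range(len(b) - min_len + 1))

def cluster_ests(seqs, min_len=8):
    """Union-find clustering: fragments are grouped if connected by a chain of
    pairwise overlaps -- the transitive-closure view of EST clustering."""
    parent = list(range(len(seqs)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if shares_overlap(seqs[i], seqs[j], min_len):
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]

# Hypothetical genes, each "sequenced" as two overlapping fragments.
gene_a = "ACGTACGTTGCATGCA"
gene_b = "TTTTCCCCGGGGAAAA"
seqs = [gene_a[:12], gene_b[:12], gene_a[4:], gene_b[4:]]
labels = cluster_ests(seqs)
```

Fragments from the same gene share an overlap and receive the same label; the quadratic pairwise comparison is precisely the cost that motivates the distributed approach studied in the thesis.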
30

Apprentissage non supervisé de flux de données massives : application aux Big Data d'assurance / Unsupervided learning of massive data streams : application to Big Data in insurance

Ghesmoune, Mohammed 25 November 2016 (has links)
Le travail de recherche exposé dans cette thèse concerne le développement d'approches à base de growing neural gas (GNG) pour le clustering de flux de données massives. Nous proposons trois extensions de l'approche GNG : séquentielle, distribuée et parallèle, et une méthode hiérarchique ; ainsi qu'une nouvelle modélisation pour le passage à l'échelle en utilisant le paradigme MapReduce et l'application de ce modèle pour le clustering au fil de l'eau du jeu de données d'assurance. Nous avons d'abord proposé la méthode G-Stream. G-Stream, en tant que méthode "séquentielle" de clustering, permet de découvrir de manière incrémentale des clusters de formes arbitraires et en ne faisant qu'une seule passe sur les données. G-Stream utilise une fonction d'oubli afin de réduire l'impact des anciennes données dont la pertinence diminue au fil du temps. Les liens entre les nœuds (clusters) sont également pondérés par une fonction exponentielle. Un réservoir de données est aussi utilisé afin de maintenir, de façon temporaire, les observations très éloignées des prototypes courants. L'algorithme batchStream traite les données en micro-batch (fenêtre de données) pour le clustering de flux. Nous avons défini une nouvelle fonction de coût qui tient compte des sous-ensembles de données qui arrivent par paquets. La minimisation de la fonction de coût utilise l'algorithme des nuées dynamiques tout en introduisant une pondération qui permet une pénalisation des données anciennes. Une nouvelle modélisation utilisant le paradigme MapReduce est proposée. Cette modélisation a pour objectif de passer à l'échelle. Elle consiste à décomposer le problème de clustering de flux en fonctions élémentaires (Map et Reduce). Chaque sous-ensemble de données est ainsi traité pour produire soit des clusters intermédiaires, soit des clusters finaux. Pour l'implémentation de la modélisation proposée, nous avons utilisé la plateforme Spark. 
Dans le cadre du projet Square Predict, nous avons validé l'algorithme batchStream sur les données d'assurance. Un modèle prédictif combinant le résultat du clustering avec les arbres de décision est aussi présenté. L'algorithme GH-Stream est notre troisième extension de GNG pour la visualisation et le clustering de flux de données massives. L'approche présentée a la particularité d'utiliser une structure hiérarchique et topologique, qui consiste en plusieurs arbres hiérarchiques représentant des clusters, pour les tâches de clustering et de visualisation. / The research outlined in this thesis concerns the development of approaches based on growing neural gas (GNG) for the clustering of data streams. We propose three algorithmic extensions of the GNG approach: sequential, distributed and parallel, and hierarchical; as well as a model for scalability using MapReduce and its application to learning clusters from real insurance Big Data in the form of a data stream. We first propose the G-Stream method. G-Stream, as a "sequential" clustering method, is a one-pass data stream clustering algorithm that allows us to discover clusters of arbitrary shapes without any assumptions on the number of clusters. G-Stream uses an exponential fading function to reduce the impact of old data whose relevance diminishes over time. The links between the nodes are also weighted. A reservoir is used to temporarily hold observations that are far from the current prototypes, in order to reduce the movements of the nodes nearest to the observations. The batchStream algorithm is a micro-batch method for clustering data streams which defines a new cost function taking into account that subsets of observations arrive in discrete batches. The minimization of this function, which leads to a topological clustering, is carried out using dynamic clusters in two steps: an assignment step which assigns each observation to a cluster, followed by an optimization step which computes the prototype for each node. 
A scalable model using MapReduce is then proposed. It consists of decomposing the data stream clustering problem into the elementary functions Map and Reduce. The observations received in each sub-dataset (within a time interval) are processed through deterministic parallel operations (Map and Reduce) to produce the intermediate states or the final clusters. The batchStream algorithm is validated on the insurance Big Data. A predictive analysis system is proposed by combining the clustering results of batchStream with decision trees. The architecture and these different modules form the computational core of our Big Data project, called Square Predict. GH-Stream, our third extension, uses a hierarchical and topological structure for both visualization and clustering tasks.
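The exponential fading idea described for G-Stream — old data's contribution decays so the model tracks the current stream — can be sketched as a toy prototype update. The class name, decay rate, and update rule below are illustrative assumptions, not the actual G-Stream or batchStream implementation.

```python
import math

class FadingPrototype:
    """A cluster prototype whose influence decays exponentially over time --
    a sketch of the fading function the abstract describes, not the thesis's
    actual algorithm (decay rate lam and the update rule are assumptions)."""

    def __init__(self, center, t):
        self.center = list(center)
        self.weight = 1.0
        self.last_update = t

    def fade(self, t, lam=0.1):
        """Decay the accumulated weight by exp(-lam * elapsed_time)."""
        self.weight *= math.exp(-lam * (t - self.last_update))
        self.last_update = t

    def absorb(self, point, t, lam=0.1):
        """Fold a new observation into the prototype as a weighted running
        mean; because older weight has faded, recent data dominates."""
        self.fade(t, lam)
        self.weight += 1.0
        step = 1.0 / self.weight
        self.center = [c + step * (x - c) for c, x in zip(self.center, point)]

# One prototype absorbs a point, then sits idle: its weight fades toward zero,
# which is how a stream clusterer forgets stale structure.
proto = FadingPrototype([0.0, 0.0], t=0)
proto.absorb([2.0, 2.0], t=0)   # equal weights -> center moves to the midpoint
proto.fade(t=100)               # long idle period -> weight collapses
```

In a full stream clusterer each arriving observation would be routed to its nearest prototype before `absorb`, and prototypes whose faded weight falls below a threshold would be pruned.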
