51 |
Galaxy evolution in the William Herschel Deep Field. McCracken, Henry Joy. January 1999 (has links)
In this Thesis we investigate the evolutionary histories of faint field galaxies using extremely deep optical and near-infrared photometry. Our work is centred on a 50 arcmin² region at high galactic latitude which we call "The William Herschel Deep Field" (WHDF). In this work we describe three new near-infrared surveys of this field. In considering both this infrared data and the existing optical data, our broad aims are to improve our understanding of the growth of galaxy clustering in the Universe and to determine the star-formation histories of the field galaxy population. We consider our observations primarily in the context of luminosity evolution models in low-density universes, but alternative scenarios are also considered. Near-infrared galaxy counts derived from our catalogues are consistent with the predictions of our models, without the need for a steep faint-end slope for the galaxy luminosity function. We find that optical-infrared colour distributions of infrared-selected galaxies in the WHDF are deficient in red, early-type galaxies. This is consistent with the predictions of evolutionary models in which these systems have a small amount of on-going star formation. We measure the amplitude of galaxy clustering in the WHDF for galaxies selected in optical and near-infrared bandpasses using the projected two-point correlation function. By comparing our measured clustering amplitudes with the predictions of our models, we find that in all bandpasses the growth of galaxy clustering is approximately fixed in proper co-ordinates, again assuming a low-density Universe. Finally, an analysis of errors on the correlation function measurements suggests that discrepancies between our work and that of other authors may be explained by an underestimation of statistical errors.
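The clustering statistic used above is the projected (angular) two-point correlation function w(θ). The abstract does not specify an estimator, so purely as background, here is a minimal sketch of the widely used Landy-Szalay estimator, w(θ) = (DD − 2DR + RR)/RR, with a flat-sky approximation that is reasonable for a field as small as the WHDF; the field size, bins and catalogue sizes are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist, cdist

def landy_szalay(data, randoms, bins):
    """Angular two-point correlation function w(theta) via the
    Landy-Szalay estimator, w = (DD - 2DR + RR) / RR, with pair
    counts normalised by the number of pairs in each catalogue.
    `data` and `randoms` are (N, 2) arrays of flat-sky positions
    (e.g. arcmin), adequate for a small field like the WHDF."""
    nd, nr = len(data), len(randoms)
    dd, _ = np.histogram(pdist(data), bins=bins)
    rr, _ = np.histogram(pdist(randoms), bins=bins)
    dr, _ = np.histogram(cdist(data, randoms).ravel(), bins=bins)
    # Normalise each count by the total number of distinct pairs.
    dd = dd / (nd * (nd - 1) / 2)
    rr = rr / (nr * (nr - 1) / 2)
    dr = dr / (nd * nr)
    return (dd - 2 * dr + rr) / rr

# Example: a Poisson (unclustered) catalogue should give w(theta) ~ 0.
rng = np.random.default_rng(0)
bins = np.linspace(0.1, 5.0, 15)                # arcmin
galaxies = rng.uniform(0, 7, size=(500, 2))     # ~50 arcmin^2 field
randoms = rng.uniform(0, 7, size=(5000, 2))
print(landy_szalay(galaxies, randoms, bins))
```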
52 |
Power spectrum analysis of redshift surveys. Tadros-Attalla, Helen. January 1996 (has links)
This thesis describes a study of the clustering properties of galaxies and clusters of galaxies as measured by the power spectrum (P(k)) and the counts-in-cells statistic. The samples used are the optical Stromlo-APM galaxy survey, the APM cluster survey and the IRAS 1.2Jy, QDOT and PSCz surveys. Throughout, N-body simulations, for a variety of cosmological models, are used to test methods and to supplement analytic error estimates. For the Stromlo-APM sample the amplitude of the power spectrum is dependent on galaxy morphology. Early-type galaxies show a higher clustering amplitude than late-type galaxies by a factor of ~ 1.8. There is also tentative evidence for some dependence of the clustering amplitude on galaxy luminosity. The parameter Ω^0.6/b is estimated via a comparison with the real-space power spectrum of the two-dimensional APM galaxy survey. For APM clusters the power spectrum is measured to very small wave numbers, with a possible detection of the expected turn-over. The results are inconsistent with the standard cold dark matter model. The shape of P(k) for clusters is approximately the same as that for Stromlo-APM galaxies but amplified by a factor of ~ 3.5. The power spectrum of the QDOT sample depends sensitively on the galaxy weighting scheme, probably due to the manner in which the region of the Hercules supercluster is sampled. A best estimate of the power spectrum of IRAS galaxies is computed by combining the IRAS 1.2Jy and QDOT samples. The PSCz galaxy power spectrum is also computed. The PSCz galaxies have a clustering amplitude twice that of optical galaxies. A similar result is found from a joint counts-in-cells analysis. Redshift-space distortions in the PSCz sample are analysed using a spherical harmonic decomposition of the density field. The value Ω^0.6/b = 1 is ruled out by this analysis at the 2σ significance level.
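The parameter Ω^0.6/b measured above arises from linear redshift-space distortions. As a reminder of why it is observable from a redshift survey, the standard Kaiser (1987) result relates the redshift-space power spectrum P_s(k) to the real-space P_r(k) through the distortion parameter β:

```latex
\beta \equiv \frac{\Omega^{0.6}}{b}, \qquad
P_s(k) = \left(1 + \frac{2}{3}\beta + \frac{1}{5}\beta^{2}\right) P_r(k)
```

so a measured ratio of redshift-space to real-space power constrains β, and hence Ω^0.6/b.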
53 |
Approches modèles pour la structuration du web vu comme un graphe / Model based approaches for uncovering web structures. Zanghi, Hugo. 25 June 2010 (has links)
The statistical analysis of complex networks is a challenging task, given that appropriate statistical models and efficient computational procedures are required in order for the underlying structures to be learned. The principle of these models is to assume that the distribution of the edge values follows a parametric distribution, conditionally on a latent structure which is used to detect connectivity patterns. However, these methods suffer from relatively slow estimation procedures, since the dependencies are complex. In this thesis we adapt online (incremental) estimation strategies, originally developed for the EM algorithm, to the case of graph models. In addition to the network data used in the methods mentioned above, vertex content is sometimes available. We therefore propose clustering algorithms for data sets that can be modelled with a graph structure embedding vertex features. Finally, an online Web application, based on the Exalead search engine, showcases certain aspects of this thesis.
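The incremental estimation idea can be illustrated on a simpler model than the thesis's latent-structure graph models. Below is an assumed, minimal sketch of online EM for a one-dimensional Gaussian mixture in the Cappé-Moulines spirit: after each observation, sufficient statistics are blended into running averages with a decreasing step size instead of re-scanning the whole data set. It is an analogy, not the thesis's algorithm.

```python
import numpy as np

def online_em_gmm(stream, k=2, lr=lambda t: 1.0 / (t + 2)):
    """Minimal online (incremental) EM for a 1-D Gaussian mixture:
    the E-step computes responsibilities for one observation, and the
    M-step re-estimates parameters from running sufficient statistics."""
    rng = np.random.default_rng(1)
    w = np.full(k, 1.0 / k)            # mixing weights
    mu = rng.normal(size=k)            # component means
    var = np.ones(k)                   # component variances
    s0, s1, s2 = w.copy(), w * mu, w * (var + mu**2)
    for t, x in enumerate(stream):
        # E-step: responsibilities of each component for x.
        logp = -0.5 * ((x - mu) ** 2 / var + np.log(2 * np.pi * var))
        r = w * np.exp(logp - logp.max())
        r /= r.sum()
        # Stochastic-approximation update of the statistics.
        g = lr(t)
        s0 = (1 - g) * s0 + g * r
        s1 = (1 - g) * s1 + g * r * x
        s2 = (1 - g) * s2 + g * r * x * x
        # M-step: parameters from the running statistics.
        w = s0 / s0.sum()
        mu = s1 / s0
        var = np.maximum(s2 / s0 - mu**2, 1e-6)
    return w, mu, var

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 5000), rng.normal(3, 1, 5000)])
rng.shuffle(data)
print(online_em_gmm(data))   # recovers weights ~0.5 and means near -2, 3
```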
54 |
Comparison of blocking and hierarchical ways to find cluster. Kumar, Swapnil. January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / William H. Hsu / Clustering in data mining is a process of discovering groups in a set of data such that the similarity within the group is maximized and the similarity among the groups is minimized.
One way of approaching clustering is to treat it as a blocking problem: minimizing the maximum distance between any two units within the same group. This method is known as threshold blocking. It works by casting blocking as a graph partitioning problem.
Chameleon is a hierarchical clustering algorithm that measures the similarity between two clusters using a dynamic model. In the clustering process, two clusters are merged only if their inter-connectivity and closeness are high relative to the internal inter-connectivity of each cluster and the closeness of items within each cluster. Merging clusters with this dynamic model facilitates the discovery of natural and homogeneous clusters.
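For reference, the merge criterion from the original Chameleon paper (Karypis, Han and Kumar, 1999), as best recalled here: clusters C_i and C_j are merged when the product RI(C_i, C_j) · RC(C_i, C_j)^α is large, with relative inter-connectivity and relative closeness defined by

```latex
RI(C_i,C_j) = \frac{\left|EC_{\{C_i,C_j\}}\right|}{\tfrac{1}{2}\left(\left|EC_{C_i}\right| + \left|EC_{C_j}\right|\right)},
\qquad
RC(C_i,C_j) = \frac{\bar{S}_{EC_{\{C_i,C_j\}}}}{\frac{|C_i|}{|C_i|+|C_j|}\,\bar{S}_{EC_{C_i}} + \frac{|C_j|}{|C_i|+|C_j|}\,\bar{S}_{EC_{C_j}}}
```

where EC denotes an edge cut (the bisection cut within a cluster, or the cut between two clusters), S̄ is the mean weight of the corresponding edges, and α > 1 weights closeness more heavily than inter-connectivity.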
The main goal of this project is to produce a local implementation of CHAMELEON and to compare its output against the threshold blocking algorithm suggested by Higgins et al., in both its hybridized and unhybridized forms.
55 |
Ethnicity and residential location in Kampala-Mengo (1890-1968). Sendi, Richard Senteza. January 1987 (has links)
No description available.
56 |
Trajectory Clustering Using a Variation of Fréchet Distance. Vafa, Khoshaein. January 2014 (has links)
Location-aware devices are one example of the variety of systems that can provide trajectory data. Formally, a trajectory is the path of a moving object in space as a function of time. Surveillance systems can now automatically detect moving objects and provide a useful dataset for further analysis. Clustering moving objects in a given scene can provide vital information about trajectory patterns and outliers. The trajectory of an object may carry extended data at each position where the object was detected, such as size, colour, etc. The focus of this work is to find an efficient trajectory clustering solution given the most fundamental trajectory data, namely position and time. The main challenge of clustering trajectory data is handling the length of a single trajectory, which can be extremely long in some cases. Hence it may be problematic to keep trajectories in main memory, or very inefficient to process them. Preprocessing trajectories and simplifying them helps minimize the effects of such issues. We use some algorithms taken from the literature in conjunction with some of our own in order to cluster trajectories efficiently. To accomplish this, we have designed a representation of a trajectory. Furthermore, we have designed and implemented algorithms to simplify these trajectories and to evaluate distances between them. Moreover, we proved that our distance function satisfies the triangle inequality, which is beneficial for clustering algorithms. Our distance function is a variation of the Fréchet distance, proposed in 1906 by Maurice René Fréchet. Additionally, we illustrate how our work can be integrated with an incremental clustering algorithm to cluster trajectories.
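The thesis's distance is a variation of the Fréchet distance; the exact variant is not given in this abstract. As background, here is a minimal sketch of the standard discrete Fréchet distance via the Eiter-Mannila dynamic programme (the trajectory data are illustrative):

```python
import numpy as np
from functools import lru_cache

def discrete_frechet(p, q):
    """Discrete Frechet distance between polylines p and q (sequences
    of points): the coupling that minimises the longest 'leash' needed
    to traverse both curves monotonically from start to end."""
    p, q = np.asarray(p, float), np.asarray(q, float)

    @lru_cache(maxsize=None)
    def c(i, j):
        d = np.linalg.norm(p[i] - q[j])
        if i == 0 and j == 0:
            return d
        if i == 0:
            return max(c(0, j - 1), d)
        if j == 0:
            return max(c(i - 1, 0), d)
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)

    return c(len(p) - 1, len(q) - 1)

# Two nearly parallel trajectories should be close; a detour widens it.
a = [(0, 0), (1, 0), (2, 0), (3, 0)]
b = [(0, 0.1), (1, 0.1), (2, 0.1), (3, 0.1)]
print(discrete_frechet(a, b))   # ~0.1
```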
57 |
Incremental Anomaly Detection Using Two-Layer Cluster-based Structure. Bigdeli, Elnaz. January 2016 (has links)
Anomaly detection algorithms face several challenges, including processing speed and dealing with noise in data. In this thesis, a two-layer cluster-based anomaly detection structure is presented which is fast, noise-resilient and incremental. In this structure, each normal pattern is considered as a cluster, and each cluster is represented using a Gaussian Mixture Model (GMM). Then, new instances are presented to the GMM to be labeled as normal or abnormal.
The proposed structure comprises three main steps. In the first step, the data are clustered. The second step is to represent each cluster in a way that enables the model to classify new instances. The Summarization based on Gaussian Mixture Model (SGMM) approach proposed in this thesis represents each cluster as a GMM.
In the third step, a two-layer structure efficiently updates clusters using the GMM representation while detecting and ignoring redundant instances. A new approach, called Collective Probabilistic Labeling (CPL), is presented to update clusters in batch mode. This approach makes the updating phase noise-resistant and fast. The collective approach also introduces a new concept, called the 'rag bag', used to store new instances. The new instances collected in the rag bag are clustered and summarized by GMMs. This enables online systems to identify nearby clusters among the existing and new clusters and to merge them quickly, despite the presence of noise, in order to update the model.
An important step in the updating is the merging of new clusters with existing ones. To this end, a new distance measure is proposed, which is a modified Kullback-Leibler distance between two GMMs. This modified distance allows accurate identification of nearby clusters. After finding neighboring clusters, they are merged quickly and accurately. One of the reasons that GMM is chosen to represent clusters is to have a clear and valid mathematical representation for clusters, which eases further cluster analysis.
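The thesis's modified Kullback-Leibler distance between GMMs is not specified in this abstract (KL divergence between full mixtures has no closed form). As background for such measures, the sketch below computes the closed-form KL divergence between two single multivariate Gaussian components:

```python
import numpy as np

def kl_gaussian(mu0, cov0, mu1, cov1):
    """Closed-form KL divergence KL(N0 || N1) between two multivariate
    Gaussians: 0.5 * [tr(S1^-1 S0) + (m1-m0)^T S1^-1 (m1-m0) - k
                      + ln(det S1 / det S0)]."""
    k = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = np.asarray(mu1) - np.asarray(mu0)
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff
                  - k + logdet1 - logdet0)

# Identical Gaussians have zero divergence; separation increases it.
print(kl_gaussian([0, 0], np.eye(2), [0, 0], np.eye(2)))   # 0.0
print(kl_gaussian([0, 0], np.eye(2), [3, 0], np.eye(2)))   # 4.5
```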
In most real-time anomaly detection applications, incoming instances are often similar to previous ones. In these cases, there is no need to update clusters based on duplicates, since they have already been modeled in the cluster distribution. The two-layer structure is responsible for identifying redundant instances. In this structure, redundant instances are ignored, and the remaining new instances are used to update clusters. Ignoring redundant instances, which are typically in the majority, makes the detection phase fast.
Each part of the general structure is validated in this thesis. The experiments cover detection rates, clustering goodness, time, memory usage and the complexity of the algorithms. The accuracy of clustering and of summarizing clusters using GMMs is evaluated and compared to that of other methods. Using the Davies-Bouldin (DB) and Dunn indexes, the distance between original and regenerated clusters is almost zero with the SGMM method, while this value for ABACUS is around 0.01. Moreover, the results show that the SGMM algorithm is 3 times faster than ABACUS in running time, using one-third of the memory used by ABACUS.
The CPL method, used to label new instances, is found to collectively remove the effect of noise while increasing the accuracy of labeling new instances. In a noisy environment, the detection rate of the CPL method is 5% higher than that of other algorithms such as the one-class SVM. The false alarm rate is decreased by 10% on average, and memory use is 20 times less than that of the one-class SVM.
The proposed method is found to lower the false alarm rate, which is one of the basic problems for the one-class SVM. Experiments show the false alarm rate is decreased by 5% to 15% across different datasets, while the detection rate is increased by 5% to 10% with the two-layer structure. The memory usage of the two-layer structure is 20 to 50 times less than that of the one-class SVM. The one-class SVM uses support vectors in labeling new instances, while the labeling time of the two-layer structure depends on the number of GMMs. The experiments show that the two-layer structure is 20 to 50 times faster than the one-class SVM in labeling new instances. Moreover, the updating time of the two-layer structure is 2 to 3 times less than that of a one-layer structure. This reduction is the direct result of ignoring redundant instances and using the two-layer structure.
58 |
The Effect of Item Distance on Organization in the Free Recall of Words. Clay, James H. (James Hamilton). 08 1900 (has links)
The purpose of the present study was to investigate the effect of item distance, which is defined as the absolute number of words separating a single item from the other items of the category, upon clustering of the removed items. By studying clustering, psychologists hope to gain knowledge of the effect of organization on memory.
59 |
Experimental Analysis of Multiple SDN Controller. Ghimire, Sudip. 01 December 2021 (has links)
As technology moves toward cloud computing and virtualization, networking has converged on the SDN paradigm, which separates the data plane from the control plane and places control at the application layer as opposed to the network layer. SDN provides dynamic and efficient configuration by shifting control from hardware to software. In comparison to traditional networks, it has a number of advantages, including lower costs, improved security, greater flexibility, and the prevention of vendor lock-in. As a result, SDN has become one of the essential solutions for replicating, re-policing, and re-configuring large-scale networks with periodic updates, such as data centers. The most widely used SDN protocol/standard at the moment is OpenFlow, which includes design specifications. By integrating OpenFlow, data centers can make their networking more consistent. A single-controller architecture is inefficient for such large networks; recent research has therefore introduced software-defined networks with multiple controllers for the sake of high availability and fault tolerance. Furthermore, a number of projects offer multiple-controller SDN architectures, all of which need to be thoroughly analyzed based on their performance under various criteria in order to determine their efficiency. A comparison of the performance of multiple-controller SDN architectures versus a single-controller SDN architecture is presented in this paper. This study deployed and examined the OpenDaylight SDN controller, using Mininet as a network emulator. We perform a performance evaluation considering average throughput, topology discovery time, flow setup time, table read time, and flow deletion time for different numbers of switches using the OpenDaylight controller. Packet capturing and analysis under various conditions were performed in the experiment and presented as graphs. Under high load, the cluster throughput remains close to that of the single-controller mode. Further, we implement a single-controller connection for the switches and compare it against the normal all-controller connection mode. We found that with a single-controller connection in the cluster, the average topology discovery time and flow setup time do improve. As a result, these experiments with SDN networks demonstrate that they can be improved under different network conditions.
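To make the experimental setup concrete, here is a minimal sketch of the kind of Mininet experiment described above: a tree topology of Open vSwitch switches attached to a remote OpenDaylight controller. The controller IP/port, topology size, and the pingAll-based flow-setup timing are illustrative assumptions, not the thesis's actual test configuration.

```python
#!/usr/bin/env python
"""Mininet topology driven by an external (e.g. OpenDaylight) controller."""
from mininet.net import Mininet
from mininet.node import RemoteController, OVSSwitch
from mininet.topolib import TreeTopo
from mininet.log import setLogLevel
import time

def run():
    setLogLevel('info')
    topo = TreeTopo(depth=2, fanout=4)          # 16 hosts, 5 switches
    net = Mininet(topo=topo, switch=OVSSwitch, controller=None)
    # Attach the external controller (OpenFlow port 6633 assumed).
    net.addController('odl', controller=RemoteController,
                      ip='127.0.0.1', port=6633)
    net.start()
    t0 = time.time()
    net.pingAll()                               # triggers flow setup
    print('pingAll (flow setup) took %.2f s' % (time.time() - t0))
    net.stop()

if __name__ == '__main__':
    run()
```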
60 |
Hyperbolic Distributions and Transformations for Clustering Incomplete Data with Extensions to Matrix Variate Normality. Pocuca, Nikola. January 2023 (has links)
Under realistic scenarios, data are often incomplete, asymmetric, or of high dimensionality. More intricate data structures often render standard approaches infeasible due to methodological or computational limitations. This monograph consists of four contributions, each solving a specific problem within model-based clustering. An R package is developed consisting of a three-phase imputation method for both elliptical and hyperbolic parsimonious models. A novel stochastic technique is employed to speed up computations for hyperbolic distributions, demonstrating superior performance overall. A hyperbolic transformation model is conceived for clustering asymmetrical data within a heterogeneous context. Finally, for high-dimensionality, a framework is developed for assessing matrix variate normality within three-way datasets. All things considered, this work constitutes a powerful set of tools to deal with the ever-growing complexity of big data. / Dissertation / Doctor of Science (PhD)
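For reference, the matrix variate normal distribution at the core of the normality assessment has, for an n × p random matrix X with mean matrix M, row covariance U (n × n) and column covariance V (p × p), the standard density

```latex
f(\mathbf{X}\mid\mathbf{M},\mathbf{U},\mathbf{V}) =
\frac{\exp\!\left(-\tfrac{1}{2}\,\mathrm{tr}\!\left[\mathbf{V}^{-1}(\mathbf{X}-\mathbf{M})^{\top}\mathbf{U}^{-1}(\mathbf{X}-\mathbf{M})\right]\right)}
{(2\pi)^{np/2}\,|\mathbf{V}|^{n/2}\,|\mathbf{U}|^{p/2}}
```

equivalently, vec(X) ~ N(vec(M), V ⊗ U), the separable-covariance identity that such assessment frameworks typically exploit.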