51

Approches modèles pour la structuration du web vu comme un graphe / Model based approaches for uncovering web structures

Zanghi, Hugo 25 June 2010 (has links)
The statistical analysis of complex networks is a challenging task, given that appropriate statistical models and efficient computational procedures are required in order for the underlying structures to be learned. The principle of these models is to assume that the distribution of the edge values follows a parametric distribution, conditionally on a latent structure that is used to detect connectivity patterns. However, these methods suffer from relatively slow estimation procedures, since the dependencies are complex. In this thesis we adapt incremental (online) estimation strategies, originally developed for the EM algorithm, to graph models. In addition to the network data used in the methods mentioned above, vertex content is sometimes available. We therefore propose clustering algorithms for data sets that can be modeled with a graph structure incorporating information at the vertices. Finally, an online Web service, based on the Exalead search engine, showcases certain aspects of this thesis.
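The incremental estimation strategy described above can be illustrated on a simple mixture model. Below is a minimal sketch of online EM in the style of Cappé and Moulines, using a one-dimensional Gaussian mixture as a stand-in for the latent-structure graph models the thesis targets; the step-size schedule and all names are illustrative assumptions, not the author's method.

```python
import numpy as np

def online_em_gmm(stream, K=2, step=lambda t: 1.0 / (t + 2) ** 0.6, seed=0):
    """Incremental (online) EM for a 1-D Gaussian mixture (sketch).

    Each observation triggers one E-step (responsibilities) and a
    stochastic update of running sufficient statistics, then an M-step;
    this is the general recipe the thesis adapts to graph models.
    """
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                 # mixing weights
    mu = rng.normal(size=K)                  # component means
    var = np.ones(K)                         # component variances
    s0, s1, s2 = pi.copy(), pi * mu, pi * (mu ** 2 + var)  # suff. stats

    for t, x in enumerate(stream):
        # E-step: posterior responsibilities of the K components for x
        logp = -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var) + np.log(pi)
        r = np.exp(logp - logp.max())
        r /= r.sum()
        # Stochastic approximation of the sufficient statistics
        g = step(t)
        s0 = (1 - g) * s0 + g * r
        s1 = (1 - g) * s1 + g * r * x
        s2 = (1 - g) * s2 + g * r * x ** 2
        # M-step: parameters recovered from the running statistics
        pi = s0 / s0.sum()
        mu = s1 / s0
        var = np.maximum(s2 / s0 - mu ** 2, 1e-6)
    return pi, mu, var

# Toy usage: a shuffled stream from two well-separated Gaussians
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-3, 1, 5000), rng.normal(3, 1, 5000)])
rng.shuffle(data)
print(online_em_gmm(data))
```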
52

Comparison of blocking and hierarchical ways to find cluster

Kumar, Swapnil January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / William H. Hsu / Clustering in data mining is the process of discovering groups in a set of data such that similarity within a group is maximized and similarity among groups is minimized. One way to approach clustering is to treat it as a blocking problem of minimizing the maximum distance between any two units within the same group. This method, known as threshold blocking, casts blocking as a graph partitioning problem. Chameleon is a hierarchical clustering algorithm that measures the similarity between two clusters using dynamic modelling. In the clustering process, two clusters are merged only if their inter-connectivity and closeness are high relative to the internal inter-connectivity of each cluster and the closeness of items within each cluster. Merging clusters with this dynamic model helps in the discovery of natural and homogeneous clusters. The main goal of this project is to produce a local implementation of CHAMELEON and compare its output against the threshold blocking algorithm suggested by Higgins et al., in both its hybridized and unhybridized forms.
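Chameleon's merge decision combines the relative inter-connectivity (RI) and relative closeness (RC) of a candidate cluster pair. Here is a minimal sketch of that scoring over a pairwise similarity matrix; the exponent `alpha`, the epsilon guards, and the function name are illustrative assumptions rather than the project's actual code, which operates on a sparse k-NN graph.

```python
import numpy as np

def merge_score(W, a, b, alpha=2.0):
    """Chameleon-style merge score for clusters `a` and `b` (index lists).

    W is a symmetric matrix of pairwise similarities. RI compares the
    total cross-cluster edge weight to the clusters' internal edge
    weight; RC does the same for average edge weights, size-weighted.
    Higher scores mean the pair is a better candidate for merging.
    Assumes each cluster has at least two members.
    """
    a, b = np.asarray(a), np.asarray(b)
    cross = W[np.ix_(a, b)]
    int_a = W[np.ix_(a, a)][np.triu_indices(len(a), 1)]
    int_b = W[np.ix_(b, b)][np.triu_indices(len(b), 1)]

    ri = cross.sum() / (0.5 * (int_a.sum() + int_b.sum()) + 1e-12)
    wa = len(a) / (len(a) + len(b))
    wb = 1.0 - wa
    rc = cross.mean() / (wa * int_a.mean() + wb * int_b.mean() + 1e-12)
    return ri * rc ** alpha
```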
53

Ethnicity and residential location in Kampala-Mengo (1890-1968)

Sendi, Richard Senteza January 1987 (has links)
No description available.
54

Trajectory Clustering Using a Variation of Fréchet Distance

Vafa, Khoshaein January 2014 (has links)
Location-aware devices are one example of the variety of systems that can provide trajectory data. Formally, a trajectory is the path of a moving object in space as a function of time. Surveillance systems can now automatically detect moving objects and provide a useful dataset for further analysis. Clustering the moving objects in a given scene can provide vital information about trajectory patterns and outliers. The trajectory of an object may carry extended data at each position where the object was detected, such as size, colour, etc. The focus of this work is to find an efficient trajectory clustering solution given the most fundamental trajectory data, namely position and time. The main challenge of clustering trajectory data is handling the length of a single trajectory, which can be extremely long in some cases; this may make it impractical to keep trajectories in main memory or very inefficient to process them. Preprocessing and simplifying trajectories helps minimize these issues. We use algorithms taken from the literature in conjunction with algorithms of our own in order to cluster trajectories efficiently. To accomplish this, we have designed a representation of a trajectory. Furthermore, we have designed and implemented algorithms to simplify these trajectories and to evaluate distances between them. Moreover, we prove that our distance function obeys the triangle inequality, which is beneficial for clustering algorithms. Our distance function is a variation of the Fréchet distance proposed in 1906 by Maurice René Fréchet. Additionally, we illustrate how our work can be integrated with an incremental clustering algorithm to cluster trajectories.
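The thesis's distance is a variation of the Fréchet distance, and the abstract does not spell out the variation itself; the sketch below therefore shows the standard discrete Fréchet distance (the Eiter-Mannila dynamic program) that such variations typically start from.

```python
import numpy as np

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between polylines P and Q.

    P and Q are (n, d) and (m, d) arrays of points. The table ca[i, j]
    holds the coupling distance for prefixes P[:i+1] and Q[:j+1]: the
    cost of the best simultaneous walk along both curves that never
    backtracks. Runs in O(n * m) time and space.
    """
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    n, m = len(P), len(Q)
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)  # pairwise dists
    ca = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                best = 0.0
            elif i == 0:
                best = ca[0, j - 1]
            elif j == 0:
                best = ca[i - 1, 0]
            else:
                best = min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1])
            ca[i, j] = max(best, d[i, j])
    return ca[-1, -1]

# Example: two similar zig-zag paths
print(discrete_frechet([[0, 0], [1, 1], [2, 0]], [[0, 0.2], [1, 1.1], [2, 0.1]]))
```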
55

Incremental Anomaly Detection Using Two-Layer Cluster-based Structure

Bigdeli, Elnaz January 2016 (has links)
Anomaly detection algorithms face several challenges, including processing speed and dealing with noise in data. In this thesis, a two-layer cluster-based anomaly detection structure is presented which is fast, noise-resilient and incremental. In this structure, each normal pattern is considered as a cluster, and each cluster is represented using a Gaussian Mixture Model (GMM). New instances are then presented to the GMM to be labeled as normal or abnormal. The proposed structure comprises three main steps. In the first step, the data are clustered. The second step is to represent each cluster in a way that enables the model to classify new instances. The Summarization based on Gaussian Mixture Model (SGMM) proposed in this thesis represents each cluster as a GMM. In the third step, a two-layer structure efficiently updates clusters using the GMM representation while detecting and ignoring redundant instances. A new approach, called Collective Probabilistic Labeling (CPL), is presented to update clusters in a batch mode. This approach makes the updating phase noise-resistant and fast. The collective approach also introduces a new concept called a 'rag bag', used to store new instances. The new instances collected in the rag bag are clustered and summarized by GMMs. This enables online systems to identify nearby clusters among the existing and new clusters and merge them quickly, despite the presence of noise, to update the model. An important step in the updating is the merging of new clusters with existing ones. To this end, a new distance measure is proposed: a modified Kullback-Leibler distance between two GMMs. This modified distance allows accurate identification of nearby clusters. After finding neighboring clusters, they are merged quickly and accurately. One of the reasons GMMs are chosen to represent clusters is to have a clear and valid mathematical representation for clusters, which eases further cluster analysis. In most real-time anomaly detection applications, incoming instances are often similar to previous ones. In these cases, there is no need to update clusters based on duplicates, since they have already been modeled in the cluster distribution. The two-layer structure is responsible for identifying redundant instances. In this structure, redundant instances are ignored, and the remaining new instances are used to update clusters. Ignoring redundant instances, which are typically in the majority, makes the detection phase fast. Each part of the general structure is validated in this thesis. The experiments cover detection rates, clustering goodness, running time, memory usage and the complexity of the algorithms. The accuracy of the clustering and of the summarization of clusters using GMMs is evaluated and compared to that of other methods. Using the Davies-Bouldin (DB) and Dunn indexes, the distance between original and regenerated clusters is almost zero with the SGMM method, while this value for ABACUS is around 0.01. Moreover, the results show that the SGMM algorithm is 3 times faster than ABACUS in running time, using one-third of the memory used by ABACUS. The CPL method, used to label new instances, is found to collectively remove the effect of noise while increasing the accuracy of labeling new instances. In a noisy environment, the detection rate of the CPL method is 5% higher than that of other algorithms such as one-class SVM. The false alarm rate is decreased by 10% on average, and memory use is 20 times less than that of the one-class SVM.
The proposed method is found to lower the false alarm rate, which is one of the basic problems for the one-class SVM. Experiments show the false alarm rate is decreased by 5% to 15% across different datasets, while the detection rate is increased by 5% to 10% with the two-layer structure. The memory usage of the two-layer structure is 20 to 50 times less than that of the one-class SVM. The one-class SVM uses support vectors in labeling new instances, while the labeling of the two-layer structure depends on the number of GMMs. The experiments show that the two-layer structure is 20 to 50 times faster than the one-class SVM in labeling new instances. Moreover, the updating time of the two-layer structure is 2 to 3 times less than that of a one-layer structure. This reduction is the direct result of ignoring redundant instances and using the two-layer structure.
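The thesis's modified Kullback-Leibler distance between GMMs is not given in the abstract. Since KL divergence between Gaussian mixtures has no closed form, a common baseline that such modifications build on is the Monte Carlo estimate sketched below; the symmetrization and sample size are illustrative assumptions, not the thesis's definition.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def mc_kl(p: GaussianMixture, q: GaussianMixture, n=10_000):
    """Monte Carlo estimate of KL(p || q) between two fitted GMMs.

    We draw samples from p and average the log-density ratio;
    score_samples returns log-densities, so the ratio is a difference.
    """
    X, _ = p.sample(n)
    return float(np.mean(p.score_samples(X) - q.score_samples(X)))

def sym_kl(p, q, n=10_000):
    """Symmetrized KL, usable as a (non-metric) closeness score for merging."""
    return mc_kl(p, q, n) + mc_kl(q, p, n)

# Toy usage: two GMMs fitted on nearby blobs score as close
rng = np.random.default_rng(0)
a = GaussianMixture(2).fit(rng.normal(0.0, 1, (500, 2)))
b = GaussianMixture(2).fit(rng.normal(0.5, 1, (500, 2)))
print(sym_kl(a, b))
```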
56

The Effect of Item Distance on Organization in the Free Recall of Words

Clay, James H. (James Hamilton) 08 1900 (has links)
The purpose of the present study was to investigate the effect of item distance, which is defined as the absolute number of words separating a single item from the other items of the category, upon clustering of the removed items. By studying clustering, psychologists hope to gain knowledge of the effect of organization on memory.
57

Experimental Analysis of Multiple SDN Controller

Ghimire, Sudip 01 December 2021 (has links)
As technology moves toward cloud computing and virtualization, it has led to the SDN paradigm, which separates the data plane from the control plane and places control at the application layer as opposed to the network layer. SDN provides dynamic and efficient configuration by moving control from hardware to software. In comparison to traditional networks, it has a number of advantages, including lower costs, improved security, greater flexibility, and the prevention of vendor lock-in. As a result, SDN has become one of the essential solutions for replicating, re-policing, and re-configuring large-scale networks with periodic updates, such as data centres. The most widely used SDN protocol/standard at the moment is OpenFlow, which includes design specifications. Integrating OpenFlow improves data centres' networking by making the network more consistent. A single-controller architecture will be inefficient for such extensive networks; thus, recent research has introduced software-defined networks with multiple controllers for the sake of high availability and fault tolerance. Furthermore, there are a number of projects that offer SDN architectures, all of which need to be thoroughly analyzed based on their performance under various criteria in order to determine their efficiency. A comparison of the performance of multiple-controller SDN architectures versus a single-controller SDN architecture is presented in this paper. This study deployed and examined the OpenDaylight SDN controller, using Mininet as a network emulator. We perform a performance evaluation considering average throughput, topology discovery time, flow setup time, table read time, and flow deletion time for different numbers of switches using the OpenDaylight controller. Packet capturing and analysis under various conditions were performed in the experiment and presented as graphs. Under high load, the cluster throughput remained near that of single-controller mode. Further, we implement a single-controller connection for the switches and compare it against the normal all-controller connection mode. We found that with a single-controller connection in the cluster, the average topology discovery time and flow setup time do improve. As a result, these experiments with SDN networks demonstrate that they can be improved under different network conditions.
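To make the setup concrete, here is a minimal Mininet sketch that attaches an emulated topology to an external controller such as OpenDaylight; the controller address, port, and topology size are illustrative assumptions, not the study's actual configuration.

```python
#!/usr/bin/env python
"""Attach a Mininet topology to a remote (e.g., OpenDaylight) controller."""
from mininet.net import Mininet
from mininet.node import RemoteController
from mininet.topo import LinearTopo
from mininet.log import setLogLevel

def run(ctl_ip='127.0.0.1', ctl_port=6633, n_switches=4):
    # LinearTopo: n_switches switches in a chain, one host per switch
    net = Mininet(topo=LinearTopo(k=n_switches), controller=None)
    net.addController('c0', controller=RemoteController,
                      ip=ctl_ip, port=ctl_port)
    net.start()
    net.pingAll()   # crude check that flows get installed end to end
    net.stop()

if __name__ == '__main__':
    setLogLevel('info')
    run()
```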
58

Hyperbolic Distributions and Transformations for Clustering Incomplete Data with Extensions to Matrix Variate Normality

Pocuca, Nikola January 2023 (has links)
Under realistic scenarios, data are often incomplete, asymmetric, or high-dimensional. More intricate data structures often render standard approaches infeasible due to methodological or computational limitations. This monograph consists of four contributions, each solving a specific problem within model-based clustering. An R package is developed that implements a three-phase imputation method for both elliptical and hyperbolic parsimonious models. A novel stochastic technique is employed to speed up computations for hyperbolic distributions, demonstrating superior performance overall. A hyperbolic transformation model is conceived for clustering asymmetrical data within a heterogeneous context. Finally, for high-dimensional data, a framework is developed for assessing matrix variate normality within three-way datasets. All things considered, this work constitutes a powerful set of tools to deal with the ever-growing complexity of big data. / Dissertation / Doctor of Science (PhD)
59

Enhancements to the Microbial Source Tracking Process Through the Utilization of Clustering and K-nearest Clusters Algorithm

Lai, Tram B 01 March 2018 (has links) (PDF)
Bacterial contamination in water sources is a serious health risk, and the sources of the bacterial strains must be identified to keep people safe. This project is the result of a collaborative effort at Cal Poly to develop a new library-dependent Microbial Source Tracking (MST) method for determining sources of fecal contamination in the environment. The library used in this study is called the Cal Poly Library of Pyroprints (CPLOP). The process of building CPLOP requires students to collect fecal samples from a multitude of sources in the San Luis Obispo area. A novel method developed by the biologists at Cal Poly, called pyroprinting, is then applied to the two intergenic regions of the E. coli isolates from these samples to obtain their fingerprints. These fingerprints are stored in the CPLOP database. In our study, we consider any E. coli samples whose fingerprints match above a certain threshold to belong to the same bacterial strain group. However, no MST method so far has produced an acceptable level of accuracy. In this thesis, we propose a two-step MST classifier that combines two previous works, pyro-DBSCAN and k-RAP, both developed specifically for CPLOP. We call our classifier HAP, the Hybrid Algorithm for Pyroprints. The classifier works as follows. Given an unknown isolate, the first step performs clustering on the known isolates in the library and compares the unknown isolate against the resulting clusters. If the isolate falls into a cluster, its classification is returned as the dominant species of that cluster. Otherwise, we apply the k-Nearest Clusters Algorithm to this isolate to determine its final classification. Ultimately, HAP provides a set of 16 decision strategies that identify the host species of an unknown sample with high accuracy.
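The two-step decision just described can be sketched as follows; the similarity function, threshold, and data layout are hypothetical placeholders, and CPLOP's actual pyroprint comparison and the pyro-DBSCAN/k-RAP internals are not reproduced here.

```python
from collections import Counter

def classify(unknown, clusters, similarity, threshold=0.95, k=3):
    """Two-step, HAP-style host-species classification (sketch).

    `clusters` is a list of (members, species_label) pairs, where each
    member is a pyroprint and `species_label` is the cluster's dominant
    species. `similarity` compares two pyroprints (higher = more alike).

    Step 1: if the unknown matches some cluster above `threshold`,
    return that cluster's dominant species.
    Step 2: otherwise, vote among the k nearest clusters by average
    similarity (a stand-in for the k-Nearest Clusters Algorithm).
    """
    scored = []
    for members, species in clusters:
        avg = sum(similarity(unknown, m) for m in members) / len(members)
        if avg >= threshold:
            return species                  # step 1: direct cluster hit
        scored.append((avg, species))
    scored.sort(reverse=True)               # step 2: k nearest clusters
    votes = Counter(species for _, species in scored[:k])
    return votes.most_common(1)[0][0]
```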
60

Clustering Gaussian Processes: A Modified EM Algorithm for Functional Data Analysis with Application to British Columbia Coastal Rainfall Patterns

Paton, Forrest January 2018 (has links)
Functional data analysis is a statistical framework in which data are assumed to follow some functional form. This method of analysis is commonly applied to time series data, where time, measured continuously or in discrete intervals, serves as the location for a function's value. In this thesis Gaussian processes, a generalization of the multivariate normal distribution to function space, are used. When multiple processes are observed on a comparable interval, clustering them into sub-populations can provide significant insights. A modified EM algorithm is developed for clustering processes. The model presented clusters processes based on how similar their underlying covariance kernels are. In other words, cluster formation arises from modelling correlation between inputs (as opposed to magnitude between process values). The method is applied to both simulated data and British Columbia coastal rainfall patterns. Results show clustering yearly processes can accurately classify extreme weather patterns. / Thesis / Master of Science (MSc)
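The kernel-similarity idea can be sketched with a hard-assignment analogue of such an EM scheme: each cluster keeps a GP covariance kernel, and each curve joins the cluster under whose kernel its marginal likelihood is highest. The RBF-plus-noise kernel family, the fitting of hyperparameters on a cluster's mean curve, and the two-cluster setup are all illustrative assumptions, not the thesis's exact model.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def gp_loglik(y, K):
    """Log marginal likelihood of y under a zero-mean GP with covariance K."""
    c, low = cho_factor(K + 1e-8 * np.eye(len(K)))
    alpha = cho_solve((c, low), y)
    return (-0.5 * y @ alpha - np.log(np.diag(c)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))

def cluster_curves(t, curves, K=2, iters=5, seed=0):
    """Hard-EM clustering of curves by similarity of their GP kernels.

    t: (n,) shared input grid; curves: (m, n) array, one curve per row.
    Each iteration refits kernel hyperparameters per cluster ("M-step"),
    then reassigns each curve to the cluster whose fitted covariance
    gives it the highest marginal likelihood ("E-step").
    """
    rng = np.random.default_rng(seed)
    X = t.reshape(-1, 1)
    labels = rng.integers(K, size=len(curves))
    for _ in range(iters):
        covs = []
        for k in range(K):
            members = curves[labels == k]
            # Fit hyperparameters on the cluster mean curve (a shortcut)
            y = members.mean(axis=0) if len(members) else curves[rng.integers(len(curves))]
            gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(0.1),
                                          normalize_y=True).fit(X, y)
            covs.append(gp.kernel_(X))  # fitted covariance on the grid
        for i, y in enumerate(curves):
            labels[i] = int(np.argmax([gp_loglik(y - y.mean(), C) for C in covs]))
    return labels
```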
