231

Fuzzy logic-based digital soil mapping in the Laurel Creek Conservation Area, Waterloo, Ontario

Ren, Que January 2012 (has links)
The aim of this thesis was to examine environmental covariate-related issues in the purposive sampling design for fuzzy logic-based digital soil mapping, namely resolution dependency, the contribution of vegetation covariates, and the use of LiDAR data. In this design, fuzzy c-means (FCM) clustering of environmental covariates was employed to determine suitable sampling sites and to assist soil survey and inference. Two subsets of the Laurel Creek Conservation Area were examined to explore the resolution and vegetation issues, respectively. Both conventional and LiDAR-derived digital elevation models (DEMs) were used to derive terrain covariates, and a vegetation index calculated from remotely sensed data was employed as a vegetation covariate. A basic field survey was conducted in the study area, and a validation experiment was performed in another area. The results show that the choice of the optimal number of clusters shifts as the resolution is aggregated, which leads to variations in the optimal partition of the environmental covariate space and in the purposive sampling design. Combining vegetation covariates with terrain covariates produces different results from using terrain covariates alone. The level of resolution dependency and the influence of adding vegetation covariates vary with the DEM source. This study suggests that DEM resolution, vegetation, and DEM source are all significant for the purposive sampling design in fuzzy logic-based digital soil mapping. The interpretation of fuzzy membership values at sampled sites also indicates associations between fuzzy clusters and soil series, which is promising for the applicability of fuzzy logic-based digital soil mapping in areas where fieldwork and data are limited.
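As an illustration of the FCM step described above, here is a minimal sketch in Python, assuming the environmental covariates (for example slope, curvature, wetness index, NDVI) have already been standardized into a NumPy array; the variable names, covariate stack, and cluster count are hypothetical, and this is not the thesis's actual implementation.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Basic fuzzy c-means: returns cluster centers and the membership matrix U (n x c)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)               # memberships sum to 1 per sample
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / (d ** (2.0 / (m - 1.0)))      # standard FCM membership update
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U

# Placeholder covariate stack: one row per grid cell, one column per covariate.
X = np.random.rand(500, 4)
centers, U = fuzzy_c_means(X, c=5)
# Candidate sampling sites: for each cluster, the cell with the highest membership.
sites = U.argmax(axis=0)
```

Picking the cell with the highest membership per cluster mirrors the purposive-sampling idea of visiting locations most representative of each environmental class, though the actual site-selection rules in the thesis may differ.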
232

Prediction Of Protein Subcellular Localization Based On Primary Sequence Data

Ozarar, Mert 01 January 2003 (has links) (PDF)
Subcellular localization is crucial for determining the functions of proteins. A system called prediction of protein subcellular localization (P2SL) is designed that predicts the subcellular localization of proteins in eukaryotic organisms from the amino acid content of primary sequences, taking amino acid order into account. The approach is to find the most frequent motifs for each protein in a given class, based on clustering via self-organizing maps, and then to use these motifs as features for classification with multilayer perceptrons. This approach allows a classification that is independent of sequence length. In addition, a new encoding scheme for the amino acids is described that conserves biological function, based on the point accepted mutation (PAM) substitution matrix. Statistical test results of the system are presented on a four-class problem. P2SL achieves slightly higher prediction accuracy than similar studies.
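A rough, hypothetical sketch of the motif-feature idea using scikit-learn is shown below; it substitutes simple k-mer counts for the SOM-derived frequent motifs and omits the PAM-based encoding, so it only illustrates the general pipeline (sequence motifs as length-independent features fed to a multilayer perceptron), not P2SL itself. The sequences and labels are invented.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

# Hypothetical data: primary sequences and their localization labels.
seqs = ["MKTAYIAKQR", "MLSRAVCGTS", "MASNDYTQQA", "MKRISTTITT"]
labels = ["cytoplasmic", "mitochondrial", "nuclear", "cytoplasmic"]

def kmers(seq, k=3):
    """Represent a sequence by its overlapping k-mers (a crude stand-in for SOM motifs)."""
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))

vec = CountVectorizer()                  # motif-count features, independent of sequence length
X = vec.fit_transform(kmers(s) for s in seqs)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X, labels)
print(clf.predict(vec.transform([kmers("MKTAYIAQQR")])))
```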
233

Identification and application of extract class refactorings in object-oriented systems

Fokaefs, Marios-Eleftherios 11 1900 (has links)
Software can be considered a live entity, as it undergoes many alterations throughout its lifecycle. As a result, code can become rather complex and difficult to understand. More specifically, in object-oriented systems, classes may become very large and less cohesive. In order to identify such problematic cases, existing approaches have proposed the use of cohesion metrics. While metrics can identify classes with low cohesion, they usually cannot identify new or independent concepts. In this work, we propose a class decomposition method that uses a clustering algorithm based on the Jaccard distance between class members. The methodology is able to identify new concepts and to rank the solutions according to their impact on the design quality of the system. The methodology was evaluated through assessment by designers, expert assessment, and metrics. The evaluation showed the ability of the method to identify new, recognizable concepts and to improve the design quality of the underlying system.
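A minimal sketch of the clustering idea follows, assuming each class member is described by the set of entities it accesses; the member names and entity sets below are invented, and the SciPy agglomerative clustering over a Jaccard distance matrix is only an illustration, not the authors' actual tool.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical class: each member (method/field) is described by the set of
# class entities it accesses.
members = {
    "getName":    {"name"},
    "setName":    {"name"},
    "computeTax": {"rate", "amount"},
    "applyRate":  {"rate", "amount", "discount"},
}

names = list(members)
n = len(names)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        a, b = members[names[i]], members[names[j]]
        dist = 1.0 - len(a & b) / len(a | b)   # Jaccard distance between member entity sets
        D[i, j] = D[j, i] = dist

# Agglomerative clustering over the distance matrix; two groups suggest an Extract Class candidate.
Z = linkage(squareform(D), method="average")
print(dict(zip(names, fcluster(Z, t=2, criterion="maxclust"))))
```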
234

A new normalized EM algorithm for clustering gene expression data

Nguyen, Phuong Minh, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW January 2008 (has links)
Microarray data clustering is a basic exploratory tool for finding groups of genes that exhibit similar expression patterns or for detecting relevant classes of molecular subtypes. Among the wide range of clustering approaches proposed and applied in the gene expression community to analyze microarray data, mixture model-based clustering has received much attention due to its sound statistical framework and its flexibility in data modeling. However, clustering algorithms in the model-based framework suffer from two serious drawbacks. First, the performance of these algorithms critically depends on the starting values for their iterative clustering procedures. Second, they cannot work directly with very high-dimensional data sets in the sample clustering problem, where the dimension of the data reaches hundreds or thousands. The thesis focuses on these two challenges and makes the following contributions. First, it introduces the statistical model of the proposed normalized Expectation Maximization (EM) algorithm, followed by an analysis of its clustering performance on a number of real microarray data sets. The normalized EM is stable even with random initializations of its EM iterative procedure; this stability is demonstrated through a performance comparison with other related clustering algorithms. Furthermore, the normalized EM is the first mixture model-based clustering approach capable of working directly with very high-dimensional microarray data sets in the sample clustering problem, where the number of genes is much larger than the number of samples. This advantage is illustrated through a comparison with the unnormalized EM (the conventional EM algorithm for Gaussian mixture model-based clustering). In addition, for experimental microarray data sets with available class labels, an interesting property of the convergence speed of the normalized EM with respect to the radius of the hypersphere in its corresponding statistical model is uncovered. Second, to support the performance comparison of different clusterings, a new internal index is derived using fundamental concepts from information theory. This index allows the comparison of clustering approaches in which the closeness between data points is evaluated by their cosine similarity, and the method used to derive it can be applied to design other internal indexes for clustering approaches that employ a common similarity measure.
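For context, a sketch of the conventional (unnormalized) EM baseline on a placeholder expression matrix is shown below, using scikit-learn's Gaussian mixture implementation; the normalized EM itself is the thesis's contribution and is not reproduced here, and the PCA step merely stands in for the fact that standard EM cannot handle the p >> n setting directly. Data, dimensions, and cluster counts are invented.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# Placeholder expression matrix: rows = samples, columns = genes (p >> n in real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2000))

# The conventional EM baseline cannot work directly with p >> n, so a common
# workaround is to reduce the dimension first.
X_red = PCA(n_components=10, random_state=0).fit_transform(X)

gmm = GaussianMixture(n_components=3, n_init=10, random_state=0)  # multiple restarts to ease
labels = gmm.fit_predict(X_red)                                   # sensitivity to starting values
print(np.bincount(labels))
```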
235

Design and performance evaluation of a flexible clustering and allocation scheme for parallel processing.

Chingchit, Soontorn January 1999 (has links)
Parallel processing is an important and popular aspect of computing and has been developed to meet the demands of high-performance computing applications. In terms of hardware, a large number of processors connected by high-speed networks are put together to solve large-scale, computationally intensive applications; the computer performance improvements made so far have been based on technological developments. In terms of software, many algorithms have been developed to execute application problems on parallel systems with the required performance. Clustering and scheduling of tasks for parallel implementation is a well-researched problem, and several techniques have been studied to improve performance and reduce execution times. In this thesis, a new clustering and scheduling scheme, called the flexible clustering and scheduling (FCS) algorithm, is proposed. It is a novel approach in which the clustering and scheduling of tasks can be tuned to achieve maximal speedup or efficiency. The proposed scheme is based on the relation between the computation and communication costs of task clusters. Vital system parameters such as processor speed, number of processors, and communication bandwidth affect speedup and efficiency, and processor speed and communication bandwidth vary from system to system, yet most clustering and scheduling strategies do not take these parameters into account. The low-complexity FCS algorithm can adapt itself to different parallel computing platforms and can also be tuned to suit a bounded or unbounded number of processors. The analytical, simulation, and experimental studies presented in this thesis validate these claims.
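The FCS algorithm itself is not reproduced here, but the following hypothetical sketch illustrates the underlying idea of clustering tasks by the relation between communication and computation cost, with a tunable parameter playing the role of the speedup/efficiency knob; the task graph, weights, and threshold rule are invented for illustration only.

```python
import networkx as nx

# Hypothetical task graph: node weights = computation cost, edge weights = communication cost.
G = nx.Graph()
G.add_nodes_from([(1, {"comp": 4}), (2, {"comp": 3}), (3, {"comp": 5}), (4, {"comp": 2})])
G.add_weighted_edges_from([(1, 2, 6), (2, 3, 1), (3, 4, 7)], weight="comm")

def cluster_tasks(G, alpha=1.0):
    """Merge tasks whose communication cost outweighs alpha times their combined computation.

    alpha is the tuning knob: a small alpha merges more aggressively (fewer, larger clusters,
    less communication overhead), a large alpha keeps tasks separate (more parallelism).
    """
    merged = nx.Graph()
    merged.add_nodes_from(G.nodes)
    for u, v, data in G.edges(data=True):
        comp = G.nodes[u]["comp"] + G.nodes[v]["comp"]
        if data["comm"] > alpha * comp:
            merged.add_edge(u, v)              # keep heavy-communication tasks together
    return list(nx.connected_components(merged))

print(cluster_tasks(G, alpha=0.5))
```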
236

Apriori approach to graph-based clustering of text documents

Hossain, Mahmud Shahriar. January 2008 (has links) (PDF)
Thesis (MS)--Montana State University--Bozeman, 2008. / Typescript. Chairperson, Graduate Committee: Rafal A. Angryk. Includes bibliographical references (leaves 59-65).
237

Assessing and quantifying clusteredness: The OPTICS Cordillera

Rusch, Thomas, Hornik, Kurt, Mair, Patrick 01 1900 (has links) (PDF)
Data representations in low dimensions, such as the results of unsupervised dimensionality reduction methods, are often interpreted visually to find clusters of observations. To identify clusters, the result must be appreciably clustered; this property of a result may be called "clusteredness". When judged visually, the appreciation of clusteredness is highly subjective. In this paper we suggest an objective way to assess clusteredness in data representations. We provide a definition of clusteredness that captures important aspects of a clustered appearance, characterize these aspects, and define the extremes rigorously. Based on this characterization we suggest an index to assess the degree of clusteredness, coined the OPTICS Cordillera. It makes only weak assumptions and is a property of the result that is invariant under different partitionings or cluster assignments. We provide bounds and a normalization for the index, and prove that it represents the stated aspects of clusteredness. Our index is parsimonious with respect to mandatory parameters but also flexible, allowing optional parameters to be tuned. The index can be used as a descriptive goodness-of-clusteredness statistic or to compare different results. For illustration we use a data set of handwritten digits that is represented very differently in two dimensions by various popular dimensionality reduction results. Empirically, observers had a hard time visually judging the clusteredness of these representations, but our index provides a clear and easy characterization of the clusteredness of each result. (authors' abstract) / Series: Discussion Paper Series / Center for Empirical Research Methods
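The Cordillera index itself involves a careful normalization and bounds that are not reproduced here; the hypothetical sketch below only shows the kind of raw material it builds on, namely the OPTICS reachability plot of a low-dimensional representation, summarized by a crude total-variation proxy. The data sets and parameters are placeholders.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

# Two toy 2-D representations: one clustered, one uniform noise.
X_clustered, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.5, random_state=0)
X_uniform = np.random.default_rng(0).uniform(-10, 10, size=(300, 2))

def reachability_variation(X, min_samples=10):
    """Crude proxy for clusteredness: total variation of the OPTICS reachability plot.

    This is NOT the OPTICS Cordillera (which is normalized and bounded); it only
    illustrates the reachability structure the index is built on.
    """
    opt = OPTICS(min_samples=min_samples).fit(X)
    r = opt.reachability_[opt.ordering_]
    r = r[np.isfinite(r)]                      # drop the undefined first reachability value
    return np.abs(np.diff(r)).sum()

print(reachability_variation(X_clustered), reachability_variation(X_uniform))
```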
238

Comparison of Methods Evaluating Results of Clustering

Polcer, Ondřej January 2015 (has links)
J. Žižka, O. Polcer: Comparison of methods evaluating results of clustering. Diploma thesis, Mendel University in Brno, 2015. This thesis describes data clustering in detail, the development of a custom clustering application, its comparison with the Cluto program, and an analysis of the results.
239

Improving the Analysis of Complex Networks Using Node-Based Resilience Measures

Matta, John 01 August 2018 (has links)
This dissertation examines various facets of the analysis of complex networks. In the first part, we study the resilience of networks by examining various attempts to quantify resilience; among the measures studied are vertex attack tolerance, integrity, tenacity, toughness, and scattering number. We show empirically that, although these measures are NP-hard to calculate, they can be approximated to within reasonable bounds by a novel heuristic called Greedy-BC that relies on the graph-theoretic measure betweenness centrality. After verifying the accuracy of Greedy-BC, we test it on several well-known classes of networks: Barabasi-Albert networks, HOTNets, and PLODs. Experiments determine that random-degree PLOD nets have the highest resilience, perhaps because of their random nature. The second part concerns clustering. We use the resilience measures and the Greedy-BC heuristic from the first part to partition graphs. Many experiments are conducted with a generalized algorithm, NBR-Clust, using all of the discussed resilience measures and extending the data to a wide variety of real-life and synthetically generated networks. A parametrized resilience measure, beta-VAT, is used to detect noise, or outliers, in noisy data. The results are extended to another facet of network analysis, cluster overlap: the attack sets of NBR-Clust are found to contain overlaps with high probability, and an algorithm is developed to identify them. One remaining problem with NBR-Clust is time complexity. The usefulness of the method is limited by the slowness of Greedy-BC, and particularly by the slowness of computing betweenness centrality. In an extensive series of experiments, we test several methods for approximating and speeding up betweenness centrality calculations, and we are able to reduce the time to cluster a 10,000-node graph from approximately two days with the original method to a few minutes. In another exploration of the algorithmic aspects of resilience, we generalize some of the results to hypergraphs and find that resilience measures like VAT and algorithms like Greedy-BC transfer well to a hypergraph representation. The final part of the dissertation reviews applications of the new clustering method. First, NBR-Clust is used to cluster data on autism spectrum disorders; because classifications of these disorders are vague and the data noisy, the clustering properties of NBR-Clust are useful, and the results will hopefully lead to a better understanding of how to classify autism spectrum disorders. Second, we use NBR-Clust to examine gene assay data with the hope of identifying genes that confer resistance to powdery mildew disease in certain species of grapevines.
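A minimal sketch of the Greedy-BC idea, assuming an undirected NetworkX graph, is given below; the attack-set size and test graph are arbitrary, and using the surviving components as clusters is only meant to suggest the flavour of NBR-Clust, not its full algorithm.

```python
import networkx as nx

def greedy_bc_attack(G, k):
    """Greedily remove the k nodes with highest betweenness centrality, recomputing each step.

    The removed nodes approximate a critical attack set; the surviving connected
    components can serve as candidate clusters, in the spirit of NBR-Clust.
    """
    H = G.copy()
    attack_set = []
    for _ in range(k):
        bc = nx.betweenness_centrality(H)
        v = max(bc, key=bc.get)                # most "central" remaining node
        attack_set.append(v)
        H.remove_node(v)
    return attack_set, list(nx.connected_components(H))

G = nx.barabasi_albert_graph(200, 2, seed=1)   # scale-free test graph, one of the classes mentioned above
attack, components = greedy_bc_attack(G, k=5)
print(attack, [len(c) for c in components])
```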
240

An Assistant Model for Classifying Data from Social Networks: A Case Study with Twitter Data

BASONI, H. G. 14 April 2015 (has links)
Since their emergence, virtual social networks such as Twitter have attracted an enormous number of users worldwide, becoming an environment of immense potential for social, economic, and cultural research. More and more researchers have turned their attention to the large volume of data generated daily in this medium. However, handling large amounts of data is a costly task when performed manually. The goal of this research is to propose a set of tools and a methodology that reduce the human effort spent organizing large volumes of data from social networks. To reach this goal, an iterative workflow model is proposed that makes maximal use of the knowledge contained in a small portion of data manually analyzed by specialists. The workflow combines information retrieval techniques, such as classification and clustering algorithms, so that the outcome of the process resembles what the specialist would obtain if the task were carried out entirely by hand. The proposed model was put to the test using two data sets extracted from Twitter that had been manually classified well before this research was conducted. The results proved promising.
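A hypothetical, much-simplified sketch of such an iterative workflow is shown below: a classifier trained on a small expert-labeled seed is combined with clustering of the unlabeled pool, and disagreements could be routed back to the specialist in the next iteration. All texts, labels, and model choices are invented and do not reflect the dissertation's actual tooling.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical tweets: a small expert-labeled seed plus a larger unlabeled pool.
seed_texts = ["great service today", "app keeps crashing", "love the new update", "refund never arrived"]
seed_labels = ["positive", "negative", "positive", "negative"]
unlabeled = ["update broke everything", "really happy with support", "still waiting for my refund"]

vec = TfidfVectorizer()
X_seed = vec.fit_transform(seed_texts)
X_pool = vec.transform(unlabeled)

# Classifier trained on the expert-labeled portion...
clf = LogisticRegression().fit(X_seed, seed_labels)
pred = clf.predict(X_pool)

# ...combined with clustering of the unlabeled pool; disagreement between cluster structure
# and predictions could flag tweets to send back to the specialist.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_pool)
print(list(zip(unlabeled, pred, clusters)))
```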
