231

Prediction Of Protein Subcellular Localization Based On Primary Sequence Data

Ozarar, Mert 01 January 2003 (has links) (PDF)
Subcellular localization is crucial for determining the functions of proteins. A system called prediction of protein subcellular localization (P2SL) is designed that predicts the subcellular localization of proteins in eukaryotic organisms from the amino acid content and ordering of primary sequences. The approach is to find the most frequent motifs for each protein in a given class, based on clustering via self-organizing maps, and then to use these most frequent motifs as features for classification with multi-layer perceptrons. This approach allows a classification independent of sequence length. In addition, a new encoding scheme for the amino acids is described that conserves biological function, based on the point accepted mutation (PAM) substitution matrix. Statistical test results of the system are presented on a four-class problem. P2SL achieves slightly higher prediction accuracy than similar studies.
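As a rough illustration of this kind of pipeline, the following sketch builds motif-frequency features and feeds them to a multi-layer perceptron. It is not the P2SL implementation: it assumes plain 3-mer motifs selected by raw per-class frequency in place of SOM-derived, PAM-encoded motifs, uses scikit-learn's MLPClassifier, and the toy sequences and compartment labels are invented for demonstration.

```python
# Simplified sketch of a motif-frequency + MLP localization classifier.
# Assumptions (not from the thesis): motifs are plain 3-mers chosen by raw
# frequency per class, rather than SOM-derived, PAM-encoded motifs.
from collections import Counter
from sklearn.neural_network import MLPClassifier

def kmers(seq, k=3):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def top_motifs_per_class(sequences, labels, k=3, per_class=20):
    motifs = set()
    for cls in set(labels):
        counts = Counter()
        for seq, lab in zip(sequences, labels):
            if lab == cls:
                counts.update(kmers(seq, k))
        motifs.update(m for m, _ in counts.most_common(per_class))
    return sorted(motifs)

def featurize(seq, motifs, k=3):
    counts = Counter(kmers(seq, k))
    total = max(sum(counts.values()), 1)
    return [counts[m] / total for m in motifs]   # length-independent features

# toy data: (sequence, compartment) pairs, invented for illustration
train = [("MKTLLLTLVVV", "secreted"), ("MASNDYTQQAT", "nuclear"),
         ("MKVLAAGIVAL", "secreted"), ("MPKRKVSSAEG", "nuclear")]
seqs, labels = zip(*train)
motifs = top_motifs_per_class(seqs, labels)
X = [featurize(s, motifs) for s in seqs]

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, labels)
print(clf.predict([featurize("MKTLLLAAGIV", motifs)]))
```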
232

Identification and application of extract class refactorings in object-oriented systems

Fokaefs, Marios-Eleftherios 11 1900 (has links)
Software can be considered a live entity, as it undergoes many alterations throughout its lifecycle. Therefore, code can become rather complex and difficult to understand. More specifically, in object-oriented systems, classes may become very large and less cohesive. In order to identify such problematic cases, existing approaches have proposed the use of cohesion metrics. While metrics can identify classes with low cohesion, they usually cannot identify new or independent concepts. In this work, we propose a class decomposition method using a clustering algorithm based on the Jaccard distance between class members. The methodology is able to identify new concepts and rank the solutions according to their impact on the design quality of the system. The methodology was evaluated in terms of assessment by designers, expert assessment, and metrics. The evaluation showed the ability of the method to identify new recognizable concepts and improve the design quality of the underlying system.
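A minimal sketch of the core idea, not the paper's algorithm: each class member is described by the set of entities it accesses, members are compared by Jaccard distance, and a naive single-link grouping under a distance threshold suggests candidate extracted classes. The member names, accessed entities, and threshold below are hypothetical.

```python
# Jaccard distance between class members, where each member is described by
# the set of fields/helpers it accesses, followed by naive single-link
# grouping under a distance threshold (illustrative, not the paper's method).
def jaccard_distance(a, b):
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def group_members(members, threshold=0.6):
    """members: dict name -> set of accessed entities (hypothetical example)."""
    groups = [[name] for name in members]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                close = any(
                    jaccard_distance(members[a], members[b]) <= threshold
                    for a in groups[i] for b in groups[j])
                if close:                       # single-link merge
                    groups[i].extend(groups.pop(j))
                    merged = True
                    break
            if merged:
                break
    return groups

# hypothetical "god class": rendering members vs. persistence members
members = {
    "draw":   {"canvas", "color", "shape"},
    "resize": {"canvas", "shape"},
    "save":   {"db", "id", "serializer"},
    "load":   {"db", "id"},
}
print(group_members(members))   # two groups suggest an Extract Class refactoring
```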
233

A new normalized EM algorithm for clustering gene expression data

Nguyen, Phuong Minh, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW January 2008 (has links)
Microarray data clustering represents a basic exploratory tool to find groups of genes exhibiting similar expression patterns or to detect relevant classes of molecular subtypes. Among the wide range of clustering approaches proposed and applied in the gene expression community to analyze microarray data, mixture model-based clustering has received much attention due to its sound statistical framework and its flexibility in data modeling. However, clustering algorithms following the model-based framework suffer from two serious drawbacks. The first is that the performance of these algorithms critically depends on the starting values for their iterative clustering procedures. Additionally, they are not capable of working directly with very high dimensional data sets in the sample clustering problem, where the dimension of the data reaches hundreds or thousands. The thesis focuses on these two challenges and includes the following contributions. First, the thesis introduces the statistical model of our proposed normalized Expectation Maximization (EM) algorithm, followed by an analysis of its clustering performance on a number of real microarray data sets. The normalized EM is stable even with random initializations of its EM iterative procedure. This stability is demonstrated through a performance comparison with other related clustering algorithms. Furthermore, the normalized EM is the first mixture model-based clustering approach capable of working directly with very high dimensional microarray data sets in the sample clustering problem, where the number of genes is much larger than the number of samples. This advantage is illustrated through a comparison with the unnormalized EM (the conventional EM algorithm for Gaussian mixture model-based clustering). In addition, for experimental microarray data sets with available class labels, an interesting property of the convergence speed of the normalized EM with respect to the radius of the hypersphere in its corresponding statistical model is uncovered. Second, to support the performance comparison of different clusterings, a new internal index is derived using fundamental concepts from information theory. This index allows the comparison of clustering approaches in which the closeness between data points is evaluated by their cosine similarity. The method for deriving this internal index can be utilized to design other new indexes for comparing clustering approaches that employ a common similarity measure.
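For orientation, the sketch below shows only the baseline setup the thesis improves on: standard EM for a Gaussian mixture run on unit-normalized expression profiles with random initializations, not the proposed normalized EM. The synthetic data, the scikit-learn estimator, and the parameter choices are illustrative assumptions.

```python
# Baseline mixture model-based clustering: conventional EM for a Gaussian
# mixture on unit-normalized profiles. NOT the thesis's normalized EM; the
# random initializations here are exactly the sensitivity the thesis targets.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
# synthetic "expression matrix": 60 genes (rows) x 8 conditions (columns)
X = np.vstack([
    rng.normal(loc=+1.0, scale=0.3, size=(30, 8)),
    rng.normal(loc=-1.0, scale=0.3, size=(30, 8)),
])
X = normalize(X)          # project profiles onto the unit hypersphere

gmm = GaussianMixture(n_components=2, covariance_type="diag",
                      n_init=10, init_params="random", random_state=0)
labels = gmm.fit_predict(X)
print(np.bincount(labels))   # cluster sizes, e.g. [30 30]
```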
234

Design and performance evaluation of a flexible clustering and allocation scheme for parallel processing.

Chingchit, Soontorn January 1999 (has links)
Parallel processing is an important and popular aspect of computing and has been developed to meet the demands of high-performance computing applications. In terms of hardware, a large number of processors connected with high-speed networks are put together to solve large-scale, computationally intensive applications. The computer performance improvements made so far have been based on technological developments. In terms of software, many algorithms have been developed for executing application problems on parallel systems to achieve the required performance. Clustering and scheduling of tasks for parallel implementation is a well-researched problem, and several techniques have been studied to improve performance and reduce problem execution times. In this thesis, a new clustering and scheduling scheme, called the flexible clustering and scheduling (FCS) algorithm, is proposed. It is a novel approach in which clustering and scheduling of tasks can be tuned to achieve maximal speedup or efficiency. The proposed scheme is based on the relation between the costs of computation and communication of task clusters. Vital system parameters such as processor speed, number of processors, and communication bandwidth affect speedup and efficiency, and processor speed and communication bandwidth vary from system to system, yet most clustering and scheduling strategies do not take these system parameters into account. The low-complexity FCS algorithm can adapt itself to different parallel computing platforms and can also be tuned for a bounded or unbounded number of processors. The analytical, simulation, and experimental studies presented in this thesis validate these claims.
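The cost relation at the heart of such schemes can be pictured with a toy heuristic, given below. It is not the FCS algorithm, only an edge-zeroing sketch that merges two task clusters whenever their communication cost exceeds a tunable fraction of their computation cost. The tasks, costs, and ratio parameter are invented.

```python
# Toy edge-zeroing clustering heuristic (illustrative, not the thesis's FCS):
# merge two task clusters whenever the communication cost between them
# exceeds a tunable factor of their combined computation cost.
def cluster_tasks(comp, comm, ratio=0.5):
    """comp: {task: computation cost}; comm: {(t1, t2): communication cost}."""
    cluster_of = {t: t for t in comp}          # union-find style representative

    def find(t):
        while cluster_of[t] != t:
            t = cluster_of[t]
        return t

    # consider the heaviest communication edges first
    for (a, b), c in sorted(comm.items(), key=lambda kv: -kv[1]):
        ra, rb = find(a), find(b)
        if ra != rb and c > ratio * (comp[a] + comp[b]):
            cluster_of[rb] = ra                # merge: co-schedule on one processor

    clusters = {}
    for t in comp:
        clusters.setdefault(find(t), []).append(t)
    return list(clusters.values())

comp = {"t1": 4, "t2": 2, "t3": 6, "t4": 3}
comm = {("t1", "t2"): 5, ("t2", "t3"): 1, ("t3", "t4"): 8}
print(cluster_tasks(comp, comm, ratio=0.5))    # [['t1', 't2'], ['t3', 't4']]
```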
235

Apriori approach to graph-based clustering of text documents

Hossain, Mahmud Shahriar. January 2008 (has links) (PDF)
Thesis (MS)--Montana State University--Bozeman, 2008. / Typescript. Chairperson, Graduate Committee: Rafal A. Angryk. Includes bibliographical references (leaves 59-65).
236

Assessing and quantifying clusteredness: The OPTICS Cordillera

Rusch, Thomas, Hornik, Kurt, Mair, Patrick 01 1900 (has links) (PDF)
Data representations in low dimensions, such as results from unsupervised dimensionality reduction methods, are often visually interpreted to find clusters of observations. To identify clusters, the result must be appreciably clustered. This property of a result may be called "clusteredness". When judged visually, the appreciation of clusteredness is highly subjective. In this paper we suggest an objective way to assess clusteredness in data representations. We provide a definition of clusteredness that captures important aspects of a clustered appearance. We characterize these aspects and define the extremes rigorously. For this characterization of clusteredness we suggest an index to assess the degree of clusteredness, coined the OPTICS Cordillera. It makes only weak assumptions and is a property of the result, invariant under different partitionings or cluster assignments. We provide bounds and a normalization for the index, and prove that it represents the aspects of clusteredness. Our index is parsimonious with respect to mandatory parameters but also flexible, allowing optional parameters to be tuned. The index can be used as a descriptive goodness-of-clusteredness statistic or to compare different results. For illustration we use a data set of handwritten digits which are represented very differently in two dimensions by various popular dimensionality reduction methods. Empirically, observers had a hard time visually judging the clusteredness of these representations, but our index provides a clear and easy characterisation of the clusteredness of each result. (authors' abstract) / Series: Discussion Paper Series / Center for Empirical Research Methods
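A rough approximation of the idea, not the exact OPTICS Cordillera index: compute an OPTICS reachability plot and aggregate its up-and-down variation along the ordering, normalized by the plot's range and length. The data set and the normalization choice in the sketch are illustrative assumptions.

```python
# Rough "clusteredness" score from an OPTICS reachability plot. This is an
# illustrative approximation, not the authors' OPTICS Cordillera; the data
# and the normalization below are assumptions.
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

opt = OPTICS(min_samples=10).fit(X)
reach = opt.reachability_[opt.ordering_]      # reachability along the ordering
reach = reach[np.isfinite(reach)]             # drop infinite entries

# aggregate the "cordillera": sum of absolute successive differences,
# normalized by the plot's range and length so results are comparable
variation = np.abs(np.diff(reach)).sum()
score = variation / ((reach.max() - reach.min()) * (len(reach) - 1))
print(round(float(score), 4))
```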
237

Comparison of Methods for Evaluating Clustering Results (Srovnání metod vyhodnocujících výsledky shlukování)

Polcer, Ondřej January 2015 (has links)
J. Žižka, O. Polcer: Comparison of methods evaluating results of clustering. Diploma thesis, Mendel University in Brno, 2015. This thesis describes data clustering in detail, the development of our own clustering application, its comparison with the Cluto program, and an analysis of the results.
238

Improving the Analysis of Complex Networks Using Node-Based Resilience Measures

Matta, John 01 August 2018 (has links)
This dissertation examines various facets of the analysis of complex networks. In the first part, we study the resilience of networks by examining various attempts to quantify resilience. Some of the measures studied are vertex attack tolerance, integrity, tenacity, toughness, and scattering number. We show empirically that, although these measures are NP-hard to calculate, they can be approximated reasonably well by a novel heuristic called Greedy-BC that relies on the graph-theoretic measure betweenness centrality. After verifying the accuracy of Greedy-BC, we test it on several well-known classes of networks: Barabasi-Albert networks, HOTNets, and PLODs. Experiments determine that random-degree PLOD nets have the highest resilience, perhaps because of their random nature. The second part concerns clustering. We use the resilience measures and the Greedy-BC heuristic from the first part to partition graphs. Many experiments are conducted with a generalized algorithm, NBR-Clust, using all discussed resilience measures, and expanding the data to a wide variety of real-life and synthetically generated networks. A parametrized resilience measure, beta-VAT, is used to detect noise, or outliers, in noisy data. Results are extended to another facet of network analysis, that of cluster overlap. Attack sets of NBR-Clust are found to contain overlaps with high probability, and an algorithm is developed to identify them. One remaining problem with NBR-Clust is time complexity: the usefulness of the method is limited by the slowness of Greedy-BC, and particularly by the slowness of computing betweenness centrality. In an extensive series of experiments, we test several methods for approximating and speeding up betweenness centrality calculations, and are able to reduce the time to cluster a 10,000-node graph from approximately 2 days with the original method of calculation to a few minutes. In another exploration of the algorithmic aspects of resilience, we attempt to generalize some of these results to hypergraphs. It is found that resilience measures like VAT and algorithms like Greedy-BC transfer well to a hypergraph representation. The final part of the dissertation reviews applications of the new clustering method. First, NBR-Clust is used to cluster data on autism spectrum disorders. Because classifications of these disorders are vague and the data noisy, the clustering properties of NBR-Clust are useful, and the results should lead to a better understanding of how to classify autism spectrum disorders. Second, we use NBR-Clust to examine gene assay data with the hope of identifying genes that confer resistance to powdery mildew disease in certain species of grapevines.
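The Greedy-BC idea as summarized above lends itself to a short sketch, given below; it is not the dissertation's NBR-Clust implementation. Nodes with the highest betweenness centrality are removed until the graph splits; the removed nodes form the attack set and the remaining components serve as clusters. The target number of components and the example graph are assumptions for illustration.

```python
# Sketch of the Greedy-BC idea described in the abstract (not the
# dissertation's NBR-Clust code): repeatedly remove the node with the highest
# betweenness centrality until the graph splits into the desired number of
# components; removed nodes form the attack set, components form clusters.
import networkx as nx

def greedy_bc_partition(G, target_components=2):
    H = G.copy()
    attack_set = []
    while nx.number_connected_components(H) < target_components and len(H) > 0:
        bc = nx.betweenness_centrality(H)
        victim = max(bc, key=bc.get)           # highest-betweenness node
        attack_set.append(victim)
        H.remove_node(victim)
    clusters = [set(c) for c in nx.connected_components(H)]
    return attack_set, clusters

# two dense communities joined by a single bridge node
G = nx.barbell_graph(5, 1)
attack, clusters = greedy_bc_partition(G)
print(attack)      # the bridge node is removed first
print(clusters)    # the two bells remain as clusters
```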
239

An Assistant Model for Classifying Data from Social Networks: A Case Study with Twitter Data (Modelo Assistente para Classificação de Dados Provenientes de Redes Sociais: Um Estudo de Caso com Dados do Twitter)

BASONI, H. G. 14 April 2015 (has links)
Since their emergence, virtual social networks such as Twitter have attracted an enormous number of users around the world, becoming an environment of immeasurable potential for social, economic, and cultural research. More and more researchers have turned their attention to the great mass of data generated daily in this medium. However, dealing with large amounts of data is a costly task when performed manually. The goal of this research is to propose a set of tools and a methodology that reduce the human effort spent organizing large masses of data from social networks. To achieve this goal, an iterative workflow model is proposed that makes maximal use of the knowledge contained in a small portion of data manually analyzed by experts. The workflow combines information retrieval techniques, such as classification and clustering algorithms, with the aim of making the outcome of the process as close as possible to what the expert would obtain by performing the task entirely by hand. The proposed model was put to the test using two data sets extracted from Twitter that had been manually classified well before this research was carried out. The results proved promising.
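The iterative workflow can be pictured as a self-training loop, sketched below under assumptions that are not from the dissertation: TF-IDF features, logistic regression, a fixed confidence threshold, and invented example tweets. Confident predictions are folded back into the expert-labeled seed and the loop repeats.

```python
# Sketch of an iterative label-then-classify workflow in the spirit of the
# abstract, not the dissertation's actual tool set: a small expert-labeled
# seed trains a classifier, confident predictions are promoted to the
# training set, and the loop repeats. Data and parameters are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled = [("great service, thank you", "positive"),
           ("terrible delay again", "negative"),
           ("loved the new update", "positive"),
           ("worst support ever", "negative")]
unlabeled = ["thank you for the quick reply", "another terrible outage",
             "update looks great", "support was the worst"]

texts, labels = map(list, zip(*labeled))
vec = TfidfVectorizer()

for _ in range(3):                                   # a few self-training rounds
    X = vec.fit_transform(texts)
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    if not unlabeled:
        break
    probs = clf.predict_proba(vec.transform(unlabeled))
    keep = []
    for text, p in zip(unlabeled, probs):
        if p.max() >= 0.6:                           # confident: promote to training set
            texts.append(text)
            labels.append(clf.classes_[p.argmax()])
        else:
            keep.append(text)                        # still needs an expert
    unlabeled = keep

print(dict(zip(texts, labels)))
```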
240

Neighborhood Socio-spatial Organization at Calixtlahuaca, Mexico

January 2015 (has links)
This dissertation research examines neighborhood socio-spatial organization at Calixtlahuaca, a Postclassic (1100-1520 AD) urban center in highland Mesoamerica. Neighborhoods are small spatial units where residents interact at a face-to-face level in the course of daily activities. How were Calixtlahuaca's neighborhoods organized socio-spatially? Were they homogeneous, or did each neighborhood contain a mixture of different social and economic groups? Calixtlahuaca was a large Aztec-period city-state located in the frontier region between the Tarascan and Triple Alliance empires. As the capital of the Maltazinco polity, administrative, ritual, and economic activities were located here. Four languages, Matlazinca, Mazahua, Otomi, and Nahua, were spoken by the city's inhabitants. The combination of political geography and an unusual urban center provides an opportunity for examining complex neighborhood socio-spatial organization in a Mesoamerican setting. The evidence presented in this dissertation shows that Calixtlahuaca's neighborhoods were socially heterogeneous spaces where residents from multiple social groups and classes coexisted. This further suggests that the cross-cutting ties between neighborhood residents had more impact on certain economic choices than close proximity of residential location. Market areas were the one way the city was clearly divided spatially into two regions, but consumer preferences, within the confines of economic resources, were similar in both regions. This research employs artifact collections recovered during the Calixtlahuaca Archaeological Project surface survey. The consumption practices of the residents of Calixtlahuaca are used to define membership in several social groups in order to determine the socio-spatial pattern of the city. Economic aspects of city life are examined through the identification of separate market areas that relate to neighborhood patterns. Excavation data were also examined as an alternate line of evidence for each case. The project contributes to the sparse literature on preindustrial urban neighborhoods. Research into social segregation or social clustering in modern cities is plentiful, but few studies examine the patterns of social clustering in the past, and most research in Mesoamerica focuses on the clustering of social class. / Dissertation/Thesis / Doctoral Dissertation Anthropology 2015
