Global ETD Search

201	Methods of Determining the Number of Clusters in a Data Set and a New Clustering Criterion Yan, Mingjin 29 December 2005 In cluster analysis, a fundamental problem is to determine the best estimate of the number of clusters, which has a decisive effect on the clustering results. However, no convincing solution to the best-number-of-clusters problem is currently available, owing to the high complexity of real data sets. In this dissertation, we tackle the problem of estimating the number of clusters, with particular attention to very complicated data that may contain multiple types of cluster structure. We propose two new methods of choosing the number of clusters, which have been shown empirically to be highly effective when the data set has clear and distinct cluster structure. In addition, we propose a sequential clustering approach, called multi-layer clustering, which combines these two methods. Multi-layer clustering not only functions as an efficient method of estimating the number of clusters but also, by applying clustering sequentially, improves the flexibility and effectiveness of any existing one-layer clustering method. Empirical studies have shown that multi-layer clustering is more efficient than one-layer clustering approaches, especially in detecting clusters in complicated data sets. The multi-layer clustering approach has been successfully applied to the WTCHP microarray data, and the results are readily interpreted in light of existing biological knowledge. Choosing an appropriate clustering method is another critical step in clustering. K-means clustering is one of the most popular clustering techniques used in practice. However, the k-means method tends to generate clusters containing a nearly equal number of objects, which is referred to as the "equal-size" problem.
We propose a clustering method that competes with the k-means method. Our newly defined method aims to overcome the so-called "equal-size" problem associated with the k-means method while retaining its advantage of computational simplicity. Advantages of the proposed method over k-means clustering have been demonstrated empirically using simulated data with low dimensionality. / Ph. D. Gap statistic Multi-layer clustering DD-weighted gap statistic Cluster analysis Weighted gap statistic Number of clusters K-means clustering
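As a rough illustration of the quantity that number-of-clusters criteria such as the gap statistic are built on, the sketch below runs plain k-means (Lloyd's algorithm) for several values of k and reports the within-cluster sum of squares W_k. It is not the dissertation's method; the function names `kmeans` and `within_ss`, the toy data, and the deterministic initialization are assumptions made for the example.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm on a list of equal-length tuples.

    Initialization is an assumption of this sketch: the first k points
    are used as starting centers so the run is deterministic.
    """
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

def within_ss(points, centers, labels):
    """W_k: sum of squared distances from each point to its assigned center."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical toy data: two well-separated groups in the plane.
data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 11.0), (11.0, 10.0)]
w = {k: within_ss(data, *kmeans(data, k)) for k in (1, 2, 3)}
```

On this toy data W_k drops sharply from k = 1 to k = 2 and only slightly afterwards, which is the kind of pattern such criteria look for.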
202	A Radial Basis Function Approach to a Color Image Classification Problem in a Real Time Industrial Application Sahin, Ferat 27 June 1997 In this thesis, we introduce a radial basis function network approach to solve a color image classification problem in a real-time industrial application. Radial basis function networks are employed to classify images of finished wooden parts by color and species. Other classification methods are also examined. Minimum distance classifiers are presented because they were used in previous research. We give brief definitions of color space, color texture, color quantization, and color classification methods. We also give an in-depth review of radial basis functions, regularization theory, regularized radial basis function networks, and generalized radial basis function networks. The centers of the radial basis functions are calculated by the k-means clustering algorithm. We examine the k-means algorithm in terms of starting criteria, the movement rule, and the updating rule. The dilations of the radial basis functions are calculated using a statistical method. Learning classifier systems are also applied to the same classification problem. They learn the training samples completely but fail to classify the test samples successfully. Finally, we present simulation results for both the radial basis function network method and the learning classifier system method, and compare the two. The results show that the best classification method examined in this work is the radial basis function network method. / Master of Science image processing Radial basis functions radial basis function networks k-means algorithm pattern recognition color quantization learning classifier systems
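For readers unfamiliar with the model, the following minimal sketch shows the forward pass of a Gaussian radial basis function network with a single output. It assumes Gaussian basis functions and treats the centers (for example, k-means cluster centers) and widths as given; the function name `rbf_output` and its parameters are hypothetical and not taken from the thesis.

```python
import math

def rbf_output(x, centers, widths, weights, bias=0.0):
    """Output of a Gaussian RBF network with one output unit.

    Each hidden unit j responds with exp(-||x - c_j||^2 / (2 * s_j^2)),
    and the output is the weighted sum of these responses plus a bias.
    """
    total = bias
    for c, s, w in zip(centers, widths, weights):
        total += w * math.exp(-math.dist(x, c) ** 2 / (2.0 * s ** 2))
    return total

# Hypothetical two-unit network in a 2-D feature space.
centers = [(0.0, 0.0), (3.0, 4.0)]
widths = [1.0, 1.0]
weights = [2.0, -1.0]
at_first_center = rbf_output((0.0, 0.0), centers, widths, weights)
```

A unit responds most strongly when the input sits on its center, so the output at the first center is dominated by the first weight.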
203	Analysis of the visitors' profile of the islands Ilha do Superagüi e Ilha do Mel - Marketing as an instrument for sustainable tourism Niefer, Inge Andrea 05 1900 The objectives of this work were to analyze and compare visitors to the immediate surroundings of two protected areas in the State of Paraná, both on islands: the National Park of Superagüi and the Ecological Station “Ilha do Mel”. A questionnaire with 37 qualitative and quantitative questions was applied. It consisted of five parts: sociodemographic characteristics; trip characteristics; environmental awareness and attitudes; favorite activities and motivation; and perception of the destination. The data were collected through personal interviews that took 20 to 30 minutes on average. In Superagüi, 327 questionnaires were applied from December 1998 to May 2000; on Ilha do Mel, 392 were applied from April 2000 to June 2000. The visitors to the two islands differ significantly in practically all the characteristics studied. The public of Ilha do Mel is significantly younger, which influences several other variables, such as marital status, education level, and employment situation. On Ilha do Mel, 84% of visitors heard about the island through friends or family, compared with only 67% in Superagüi. Ilha do Mel, having been a tourist destination for longer and being easy to reach, receives a larger number of repeat visitors. Tourism was the purpose of the trip for a larger share of the visitors to Ilha do Mel; in contrast, significantly more researchers were observed in Superagüi. Visitors’ environmental awareness can be considered high on both islands, but it was lower among the visitors to Ilha do Mel than in Superagüi. Fewer respondents there knew that the place they visited is a protected area.
The entrance fee they were willing to pay was significantly lower, as was their willingness to follow rules that favor nature conservation. Interest in social and environmental subjects was significantly higher among the visitors to Superagüi. They were also willing to pay more for the use of environmentally sound techniques than the respondents on Ilha do Mel. Interest in practicing the 25 tourist activities differed significantly between the two places. The comparison of visitors’ attitudes towards problems showed that some of the interviewees in Superagüi are much less bothered by infrastructure problems that reduce comfort during the stay, which is confirmed by the lower importance they give to items of tourist infrastructure. Among the visitors to Superagüi there was a marked concern with improving the quality of life of the host community, which was not observed on Ilha do Mel. In terms of motivation, the visitors to Superagüi place greater value on natural and cultural assets and on escaping the stress of the city than those of Ilha do Mel. A benefit segmentation was also carried out, showing that it is possible to identify distinct segments among the visitors to the same place. In Superagüi the following clusters were identified: 1) the indifferent ones; 2) the non-sociable adventurers; 3) the sociable adventurers; 4) the enthusiasts; and 5) the non-sociable naturalists. On Ilha do Mel five different clusters were identified: 1) the sociable adventurers; 2) the pure naturalists; 3) the enthusiasts; 4) the indifferent ones; and 5) the cultural naturalists. ecotourism sustainable tourism visitors’ profile protected areas islands conservation green marketing benefit segmentation k-means clustering GV
204	Unsupervised Learning Trojan Geigel, Arturo 04 November 2014 This work presents a proof of concept of an Unsupervised Learning Trojan. The Unsupervised Learning Trojan presents new challenges over previous work on the Neural Network Trojan, since the attacker does not control most of the environment. The work presents an analysis of how the attack can succeed by proposing new assumptions under which it becomes viable. A general analysis of how the compromise can be theoretically supported is presented, providing enough background for the development of a practical implementation. The analysis was carried out using 3 selected algorithms that cover a wide variety of unsupervised learning circumstances. Four encoding schemes on 4 datasets were chosen to represent actual scenarios in which the Trojan compromise might be targeted. A detailed procedure is presented to demonstrate the attack's viability under the assumed circumstances. Two hypothesis tests concerning the experimental setup were carried out, both of which resulted in acceptance of the null hypothesis. Various aspects of actual implementation issues and real-world scenarios where this attack might be contemplated are also discussed. Artificial Intelligence Adaptive Resonance Theory Computer Security K-means Neural Networks Trojan Unsupervised Learning Artificial Intelligence and Robotics Computer Sciences Computer Security
205	Optimisation multi-objectives d’une infrastructure réseau dédiée aux bâtiments intelligents / Multi-objective optimization of a network infrastructure dedicated to smart buildings Benatia, Mohamed Amin 13 December 2016 In this thesis, we studied the deployment of Wireless Sensor Networks (WSNs) in indoor environments, with a focus on smart building applications. The goal of our work was to develop a WSN deployment tool able to assist network designers during the deployment phase. We began by modeling all the parameters involved in WSN deployment, namely cost, connectivity, coverage, and network lifetime. We then implemented five optimization algorithms, three of them multi-objective, to solve the deployment problem. Two real case studies (a large and a small instance) were identified to test these algorithms. The results showed that the algorithms are effective for deploying a small-scale network in a small building. However, as the building's surface area grows, their performance degrades: they tend to converge to local optima while consuming considerable processing time. To address this, we developed and implemented a new hybrid multi-objective optimization algorithm that limits the number of direct evaluations performed at each iteration. The algorithm relies on data-mining methods (Artificial Neural Networks and K-means) to approximate the fitness value of each individual in each generation. At every generation, the population is divided into K clusters and only the individual closest to each cluster centroid is evaluated directly; the fitness of the rest of the population is approximated using a trained ANN. A comparative study with the classical methods showed very good performance on both instances, with our method outperforming the others in the two case studies (small and large buildings). Wireless sensor networks WSN (RCSF) K-means classification algorithm Building application Artificial neural networks
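The evaluation-saving idea described in this abstract can be sketched as follows: once the population has been clustered, the expensive fitness function is called only on the individual nearest each cluster centroid. This is a simplified illustration, not the thesis's algorithm. In particular, the remaining individuals here simply inherit their representative's fitness, which stands in for the trained ANN surrogate, and all names (`evaluate_representatives`, `toy_fitness`, the sample population) are hypothetical.

```python
import math

def centroid(members):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(members) for c in zip(*members))

def evaluate_representatives(population, labels, k, fitness):
    """Call the expensive `fitness` only on the individual nearest each centroid.

    Returns (fitness_values, n_direct_evaluations). Non-representatives
    inherit their representative's value, a crude stand-in for the trained
    ANN surrogate mentioned in the abstract.
    """
    values = [None] * len(population)
    direct = 0
    for j in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if not idx:
            continue  # empty cluster: nothing to evaluate
        c = centroid([population[i] for i in idx])
        rep = min(idx, key=lambda i: math.dist(population[i], c))
        rep_value = fitness(population[rep])  # the only direct evaluation
        direct += 1
        for i in idx:
            values[i] = rep_value
    return values, direct

# Hypothetical population of 2-D candidate solutions, already clustered.
population = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.1),
              (5.0, 5.0), (5.2, 5.0), (5.1, 5.1)]
labels = [0, 0, 0, 1, 1, 1]
calls = []

def toy_fitness(x):
    calls.append(x)  # record each direct evaluation
    return sum(v * v for v in x)

values, direct = evaluate_representatives(population, labels, 2, toy_fitness)
```

With six individuals in two clusters, only two direct evaluations are made; how the other four values are approximated is where a surrogate model such as an ANN would come in.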
206	Alternativas para seleção de touros da raça Nelore considerando características múltiplas de interesse econômico / Alternatives for election of bulls of the nelore race considering characteristic multiple of economic interest Val, José Eduardo do 25 May 2006 This study was based on genetic evaluation data for sires from herds participating in the Programa de Melhoramento Genético da Raça Nelore (PMGRN-Nelore Brasil), which since 1995 has run a progeny test called Reprodução Programada (RP), whose main aim is to make animals with more reliable genetic values available on the sire market. The Expected Progeny Differences (EPDs; DEPs in Portuguese) of 234 sires (243 in the record's English abstract) that took part in the RP from 1996 to 2003 were analyzed with the following objectives: 1- To evaluate the genetic merit of the sires over the years, using linear regression of the EPD on the year of the sire's participation in the RP, for the following traits: weight at 120 and 210 days of age, direct and maternal effects (DDPP120, DDPP210, DMPP120 and DMPP210); weight and scrotal circumference at 365 and 450 days of age, direct effect (DDPP365, DDPP450, DDPE365 and DDPE450); and age at first calving (DDIPP); 2- To identify, by multivariate approaches, groups of animals whose EPDs show similar patterns, and to determine which variables most influence the division into groups, in order to support decision making in beef cattle production systems with a view to maximizing productivity. The multivariate procedures of cluster analysis and principal components were applied to the EPDs of seven traits (DMPP120, DMPP210, DDPP365, DDPP450, DDPE365, DDPE450 and DDIPP). The analyses were performed with the software Statistica (STATSOFT, 2004). The genetic trends of the EPDs related to the fertility traits, DDPE365, DDPE450 and DDIPP, showed genetic progress of 0.051 cm, 0.061 cm and -0.026 month per year respectively, while DDPP450 was the trait with the highest genetic gain among the growth EPDs, 1.467 kg/year. As for the multivariate approaches, k-means cluster analysis was applied and the solution with three groups was the best obtained; two of these groups stood out in terms of the mean values of the EPDs. The importance of these two groups of sires was confirmed by the principal component analysis, which associated them with higher values of the direct EPDs for weight and scrotal circumference. The first two principal components retained 70.22% of the original variability. Genetic progress was observed in the RP sires for every trait during the period studied, indicating that the selection strategy has been effective and showing the importance of the contribution of the RP sires to the improvement of reproductive and growth traits in the Nelore breed. This study demonstrates the classificatory and discriminatory power of cluster and principal component analyses, which can contribute greatly to the classification of sires and facilitate the selection of animals in animal breeding programs. cluster analysis growth traits fertility traits principal components genetic trend k-means
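The genetic trends quoted above are slopes from a linear regression of EPD on year of participation. A minimal sketch of that calculation, assuming ordinary least squares and using made-up EPD values (the study itself used Statistica, and none of these numbers come from it), might look like this:

```python
def ols_slope(years, values):
    """Least-squares slope of `values` regressed on `years`."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx

# Hypothetical EPD values (kg) for sires entering the test in each year.
years = [1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003]
epd = [2.0 + 1.5 * (y - 1996) for y in years]  # exactly linear toy data
trend = ols_slope(years, epd)                  # genetic trend in kg/year
```

The slope is expressed in trait units per year, which is how figures such as 1.467 kg/year in the abstract should be read.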
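207	以雲端運算之概念建構資料採礦中關聯規則與集群分析系統 / Construct a concept of cloud computing and data mining system with association rules and clustering analysis 賴建佑 Unknown Date Cloud computing and data mining have become important directions of development in the twenty-first century. Cloud computing technology is gradually being integrated into every aspect of daily life, so combining applications with cloud computing has become a trend. In short, cloud computing is a technology that gives users faster, more convenient, and lower-cost service. Data mining, for its part, has expanded from mining purely numerical data to more diverse forms such as text and image mining. Although data mining developed earlier than cloud computing, the two can complement each other. In view of this, this study develops a data mining analysis system that is convenient and simple for users to operate. It focuses on two specific data mining methods, association rules and cluster analysis, and builds the system by combining the Apriori algorithm and the K-means method with Microsoft Excel VBA and R. Cloud computing Data mining Association rules Apriori algorithm Cluster analysis K-means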
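To make the association-rule side of this entry concrete, here is a small sketch of Apriori-style frequent itemset mining. The thesis built its system with Excel VBA and R; this Python version, the function name `apriori`, and the basket data are illustrative assumptions only.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(items).items() if n >= min_support}
    frequent = dict(current)
    size = 2
    while current:
        # Join step: unions of frequent (size-1)-itemsets that have `size` items.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every (size-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, size - 1))}
        current = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(current)
        size += 1
    return frequent

# Hypothetical market-basket data.
baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"},
           {"bread", "eggs"}, {"milk", "eggs"}]
result = apriori(baskets, min_support=2)
```

Association rules are then derived from the frequent itemsets by comparing the support of an itemset with the support of its subsets.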
208	Structural Modeling and Analysis of Structures in Aorta Images Xu, Hai 2011 August 1900 Morphology change analysis of aorta images acquired from biological experiments plays a critical role in exploring how lamina thickness (LT), interlamellar distance (ILD), and fragmentation (furcation points) relate to pathological conditions. An automated software tool is now available to extract elastic laminae (EL) and measure LT, ILD, and fragmentation along their ridge lines in fine detail. A statistical randomized complete block design (RCBD) and F-test were used to assess potential (non)-uniformity of LT and ILD along both radial and circumferential directions. Illustrative results for both normotensive and hypertensive thoracic porcine aorta revealed marked heterogeneity along the radial direction in nearly stress-free samples. Quantifying furcation point densities was also found to offer new information about potential elastin fragmentation, particularly in response to increased loading due to hypertension. Furthermore, when biological scientists analyze the elastic lamina structure, an automatically generated macro-level map of geometric parameters could greatly help them understand the overall morphology changes of a blood vessel cross section. In this dissertation, another automated system is designed to quickly locate the more pronounced EL branches, construct a layer-level abstraction of LT/ILD measurements, and transform the sparse pixel-level information into a dense, normalized Virtual Layer Matrix (VLM). The system can automatically compute the EL orientations, identify pronounced ELs, transform the denoised LT measurement points onto a VLM, and then provide statistics and segmentation analysis. By applying the k-means segmentation technique to VLMs of LT-ILD, one can easily delineate regions of normal vs. hypertrophic and/or hyperplastic LT-ILD measurements for cross-image reference.
Vascular Elastin Automated Histology Quantitative Pathology Discrete Radon Transform Furcation Point Analysis Randomized Complete Block Design Virtual Layer Matrix K-Means Hypertension Marfan Syndrome Aging
209	A Document Similarity Measure and Its Applications Gan, Zih-Dian 07 September 2011 In this paper, we propose a novel similarity measure for document data processing and apply it to text classification and clustering. For two documents, the proposed measure takes three cases into account: (a) the feature considered appears in both documents, (b) the feature considered appears in only one document, and (c) the feature considered appears in neither document. For the first case, we give a lower bound and decrease the similarity according to the difference between the feature values of the two documents. For the second case, we give a fixed value regardless of the magnitude of the feature value. For the last case, we ignore its effect. We apply the measure to the similarity-based single-label classifier k-NN and the multi-label classifier ML-KNN, and adopt these properties to measure the similarity between a document and a specific set for document clustering, i.e., in a k-means-like algorithm, comparing its effectiveness with that of other measures. Experimental results show that our proposed method works more effectively than the others. k-means document similarity Similarity measure BEP F1 single-label multi-label Accuracy text classification Entropy document clustering k-NN ML-KNN
210	Representation Of Covariance Matrices In Track Fusion Problems Gunay, Melih 01 November 2007 The covariance matrix in target tracking algorithms plays a critical role in multi-sensor track fusion systems. This matrix expresses the uncertainty of the state estimates obtained from different sensors, so many subproblems of track fusion use it to get more accurate results. That is why the matrix must be exchanged between the nodes of the multi-sensor tracking system. This thesis mainly deals with the analysis of approximations of the covariance matrix that can best represent it, so that it can be transmitted effectively to the requesting site. The Kullback-Leibler (KL) distance is exploited to derive some of the representations for the Gaussian case. Comparing these representations is another objective of this work; the comparison is based on the fusion performance of the representations, measured for a 2-radar track fusion system.
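For context, the KL distance between two Gaussians has a closed form. The sketch below evaluates it for the special case of diagonal covariance matrices, which is an assumption chosen to keep the example free of matrix inversion; it does not reproduce the representations derived in the thesis, and the function name `kl_diag_gaussians` is hypothetical.

```python
import math

def kl_diag_gaussians(mu0, var0, mu1, var1):
    """KL(N0 || N1) for Gaussians with diagonal covariances.

    `var0` and `var1` hold the diagonal entries (variances). Per dimension
    the contribution is 0.5 * (v0/v1 + (m1 - m0)^2 / v1 - 1 + ln(v1/v0)).
    """
    total = 0.0
    for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1):
        total += v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
    return 0.5 * total

# Hypothetical 2-D track estimates: same mean, different uncertainty.
exact = kl_diag_gaussians([0.0, 0.0], [1.0, 4.0], [0.0, 0.0], [1.0, 4.0])
approx = kl_diag_gaussians([0.0, 0.0], [1.0, 4.0], [0.0, 0.0], [2.0, 2.0])
```

The value is zero only when the two distributions coincide, and it is not symmetric in its arguments, so the order of the true and approximating covariance matters.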
