Global ETD Search

631	Clustering, Classification, and Factor Analysis in High Dimensional Data Analysis Wang, Yanhong 17 December 2013 (has links) Clustering, classification, and factor analysis are three popular data mining techniques. In this dissertation, we investigate these methods in high-dimensional data analysis. Because high-dimensional data have many more features than samples, and most of the features are non-informative, dimension reduction is necessary before clustering or classification can be performed. In the first part of this dissertation, we reinvestigate an existing clustering procedure, optimal discriminant clustering (ODC; Zhang and Dai, 2009), and propose using cross-validation to select its tuning parameter. We then develop a variant of ODC for high-dimensional data, sparse optimal discriminant clustering (SODC), by adding a group-lasso type of penalty to ODC. We also demonstrate that both ODC and SODC can be used as dimension reduction tools for data visualization in cluster analysis. In the second part, three existing sparse principal component analysis (SPCA) methods, Lasso-PCA (L-PCA), Alternative Lasso PCA (AL-PCA), and sparse principal component analysis by choice of norm (SPCABP), are applied to a real data set, genome-wide SNP data from the International HapMap Project, for AIM selection. Their classification accuracies are compared, and SPCABP is shown to outperform the other two SPCA methods. Third, we propose a novel method called sparse factor analysis by projection (SFABP), based on SPCABP, and propose using cross-validation to select the tuning parameter and the number of factors. Our simulation studies show that SFABP performs better than unpenalized factor analysis when both are applied to classification problems. Cluster analysis Classification Cross-validation High-dimensional data Optimal score Principal components analysis Tuning parameter Variable selection Factor Analysis
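A group-lasso type of penalty shrinks whole groups of coefficients to zero at once, which is how such a penalty can drop non-informative features entirely. The following sketch shows only the proximal (group soft-thresholding) step that proximal-gradient solvers commonly use for a penalty lam * ||beta_g||_2. It is not the SODC fitting procedure. The function names and the one-group-per-feature layout are illustrative assumptions.

```python
import math
from typing import List


def group_soft_threshold(beta: List[float], lam: float) -> List[float]:
    """Proximal operator of lam * ||beta||_2 for a single coefficient group.

    If the group's Euclidean norm is at most lam, the whole group is set to zero;
    otherwise every coefficient is scaled by the same factor (1 - lam / norm).
    """
    norm = math.sqrt(sum(b * b for b in beta))
    if norm <= lam:
        return [0.0] * len(beta)
    scale = 1.0 - lam / norm
    return [scale * b for b in beta]


def prox_group_lasso(groups: List[List[float]], lam: float) -> List[List[float]]:
    """Apply the group soft-threshold independently to each group (hypothetically, one group per feature)."""
    return [group_soft_threshold(g, lam) for g in groups]


# Hypothetical example: two features, each with two coefficients.
# The first group (norm 5) is shrunk; the second (norm 0.5) is removed entirely.
print(prox_group_lasso([[3.0, 4.0], [0.3, 0.4]], lam=1.0))
```

Because the penalty acts on the group norm rather than on individual coefficients, a feature either stays in the model with all its coefficients or leaves it completely.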
632	Random dot product graphs: a flexible model for complex networks Young, Stephen J. 17 November 2008 (has links) Over the last twenty years, as biological, technological, and social networks have risen in prominence and importance, the study of complex networks has attracted researchers from a wide range of fields. As a result, there is a large and diverse body of literature on the properties of complex networks and on the development of models for them. However, many previously developed models, although quite successful at capturing many observed properties of complex networks, have failed to capture the fundamental semantics of the networks. In this thesis, we propose a robust and general model for complex networks that incorporates semantic information at a fundamental level. We show that for a large range of average degrees and with a suitable choice of parameters, this model exhibits the three hallmark properties of complex networks: small diameter, clustering, and a skewed degree distribution. Additionally, we provide a structural interpretation of assortativity and apply this structural assortativity to the random dot product graph model. We also extend the results of Chung, Lu, and Vu on the spectral gap of the expected degree sequence model to a general class of random graph models with independent edges. We apply this result to the recently developed Stochastic Kronecker graph model of Leskovec, Chakrabarti, Kleinberg, and Faloutsos. Complex networks Random graphs Power-law Clustering Assortativity Spectral gap Conductance Cluster analysis Mathematical statistics
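In a random dot product graph, each vertex carries a vector, and two vertices are joined independently with a probability that depends on the dot product of their vectors. Vertices with similar vectors are therefore more likely to be connected. The sketch below assumes the simplest variant, in which the edge probability is the dot product itself, clipped to [0, 1]. The thesis's model may use a more general link function, and `sample_rdpg` is a hypothetical name.

```python
import random
from typing import List, Sequence, Set, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def sample_rdpg(vectors: List[Sequence[float]], seed: int = 0) -> Set[Tuple[int, int]]:
    """Sample an undirected random dot product graph (no self-loops).

    Each pair i < j is joined independently with probability dot(x_i, x_j),
    clipped to [0, 1] (an assumed identity link function).
    """
    rng = random.Random(seed)
    edges: Set[Tuple[int, int]] = set()
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            p = min(1.0, max(0.0, dot(vectors[i], vectors[j])))
            if rng.random() < p:  # rng.random() is in [0, 1), so p = 1 always links
                edges.add((i, j))
    return edges


# Usage: 200 vertices with random 2-D vectors whose dot products stay below 1.
gen = random.Random(42)
vecs = [(gen.uniform(0, 0.7), gen.uniform(0, 0.7)) for _ in range(200)]
graph = sample_rdpg(vecs, seed=1)
degrees = [0] * len(vecs)
for i, j in graph:
    degrees[i] += 1
    degrees[j] += 1
print(len(graph), max(degrees))
```

Because edges are independent given the vectors, this sampler belongs to the general class of independent-edge random graph models that the abstract refers to.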
633	Consuming and Communicating Fruit and Vegetables: A Nation-Wide Food Survey and Analysis of Blogs among Swedish Adults Simunaniemi, Anna-Mari January 2011 (has links) The aim of this thesis was to investigate fruit and vegetable (F&V) consumption among Swedish adults and to use F&V-related perceptions for audience segmentation. A further aim was to identify the motives and approaches of F&V bloggers and to analyze F&V-related online discourses. F&V consumption and related perceptions were surveyed with a questionnaire sent to a random sample of Swedish adults (aged 18-84 years; final response rate 51%; n = 1,304). F&V consumption was measured using a self-administered, pre-coded 24-h recall and a food frequency questionnaire (FFQ). Average consumption was close to the recommendations. Women in general, men born outside Sweden, and physically active respondents consumed the most F&V. The respondents were divided into two clusters based on their F&V-related perceptions. The Positive cluster, with more women and a higher mean age, consumed more F&V, whereas the Indifferent cluster experienced more practical, habitual, and external problems with F&V consumption. This cluster analysis is an example of audience segmentation for communicative purposes. A sample of 50 lay people's blogs with F&V-related content was analyzed using qualitative content analysis. Two dimensions, the level of dietary influential purpose and the source of experience, were used to identify blogger ideal types. The Exhibitionist, with a passive level of dietary influence and lived experiences, was the most common type. Persuaders use lived experiences to actively influence their readers, whereas Authorities try to influence their readers by mediating others' experiences. The Mediator is described as a neutral observer. Understanding the role of blogs in everyday communication is important for targeting health messages. A critical discourse analysis was applied to the Persuader bloggers' texts (n = 12).
Three F&V-related discourses were identified: normative consumption, authentic consumption, and altruistic consumption. This analysis is useful for the last process of dietetic communication, namely tailoring the messages. The four present studies approach dietetic communication processes from a research perspective. However, a further step might be to apply these processes to a health promotion initiative that starts from an identified diet-related problem (e.g. low F&V consumption), proceeds through audience segmentation (e.g. cluster analysis) and the targeting of a relevant channel (e.g. blogs), and finally tailors the message (e.g. using findings from discourse analysis). fruit and vegetables 24-h recall food frequency questionnaire discourse analysis cluster analysis communication blogs Culinary Arts and Meal Science
634	Human-centered semantic retrieval in multimedia databases Chen, Xin. January 2008 (has links) (PDF) Thesis (Ph. D.)--University of Alabama at Birmingham, 2008. / Additional advisors: Barrett R. Bryant, Yuhua Song, Alan Sprague, Robert W. Thacker. Description based on contents viewed Oct. 8, 2008; title from PDF t.p. Includes bibliographical references (p. 172-183).
635	Nonparametric evolutionary clustering Xu, Tianbing. January 2009 (has links) Thesis (M.S.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Computer Science, 2009. / Includes bibliographical references.
636	A customer equity-based segmentation of service consumers: an application of multicriterion clusterwise regression for joint segmentation settings Voorhees, Clay M.; Cronin, J. Joseph. January 2006 (has links) Thesis (Ph. D.)--Florida State University, 2006. / Advisor: J. Joseph Cronin Jr., Florida State University, College of Business, Dept. of Marketing. Title and description from dissertation home page (viewed Sept. 27, 2006). Document formatted into pages; contains xi, 209 pages. Includes bibliographical references.
637	The impact of online social networks on consumer behavior Ζαχαρής, Χρήστος 13 February 2012 (has links) This thesis examines the impact of Facebook, as an online social network, on the consumer behavior of Greek Facebook users. Chapter two first presents the definition of online social networks, drawing on the international literature. It then presents historical and statistical data on online social networks and goes on to present and analyze the most useful marketing tools in new technologies (eWOM, viral, direct marketing).
Chapter three presents the research methodology, the research instrument, and the sampling method. The study uses academically recognized research models such as the Technology Acceptance Model (TAM) and trust. Chapter four contains the data analysis, beginning with the demographic characteristics of the Facebook users, followed by an evaluation of the measurements and, finally, user profiling. Factor and cluster analysis of the data yielded three user clusters, which are analyzed in detail. The final chapter suggests which marketing tools are most effective for each user group, discusses the study's limitations, and makes recommendations for future research. 381.142 Online social networks eWOM Viral TAM Trust Cluster analysis Consumer behavior analysis
638	Categorization of quantitative data for studies of genetic diversity Barroso, Natália Caixeta 15 December 2010 (has links) Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / The study of genetic diversity is an important tool for identifying genetically divergent individuals which, when crossed, can increase the heterotic effect in the progeny. A statistical technique commonly applied in this type of study is cluster analysis. However, before this technique can be applied, a similarity (or distance) matrix between the genotypes must be obtained. These distances can be computed in several ways, and the literature offers different proposals for quantitative, binary, and multicategorical variables. Transforming quantitative variables into multicategorical ones can facilitate their characterization and provide useful preliminary information. Several methods exist for this transformation, but they need to be better understood so that the information lost in the transformation does not significantly compromise the results of the analysis. Therefore, the objectives of this study were: to determine which of these variable categorization methods are efficient; to investigate how the choice of dissimilarity coefficient influences cluster analysis of simulated quantitative and multicategorical data; and to investigate whether some hierarchical methods cluster the simulated data efficiently. To this end, 50 simulations were performed of ten quantitative variables for twenty genotypes of a reference species such as maize, each with four replications.
These data were converted into multicategorical variables using the following methods: equal division of the range, equal percentages, the square rule, Sturges' rule, and the normal distribution. The first two methods require the number of classes to be set in advance; four and five classes were used for both. Distance matrices were built from the original and the multicategorical data using the following dissimilarity measures: Euclidean distance, average Euclidean distance, squared Euclidean distance, Mahalanobis distance, and weighted distance. Clustering was then performed with the nearest-neighbor method and with average linkage between groups (UPGMA). The efficiency of these methods was assessed with the cophenetic correlation coefficient, stress, and the degree of distortion between the phenetic and cophenetic matrices. The results showed that UPGMA was superior to the nearest-neighbor method for all distance measures used. Euclidean distance and average Euclidean distance performed the same in all cluster analyses, and these two measures performed best in all clusterings. All data categorization methods performed satisfactorily when clustered by UPGMA, except the equal-percentage method with four and five classes. However, the data whose classes were estimated with the square rule produced the dendrogram most similar to the one obtained from the original data, so this is the recommended method for categorizing data. Categorization Genetic diversity Dissimilarity measures Cluster analysis CNPQ::CIENCIAS AGRARIAS
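The comparison of nearest-neighbor and UPGMA clustering through the cophenetic correlation coefficient can be illustrated with a small pure-Python sketch. The approach is to record, for every pair of objects, the fusion level at which the two objects first fall into the same cluster (their cophenetic distance), and then correlate these values with the original distances. The sketch assumes Euclidean distance on hypothetical trait values and uses a naive merge loop that is only suitable for small examples. It does not compute stress or distortion degree, and all function names are illustrative.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def cophenetic_distances(dist: Dict[Pair, float], n: int, linkage: str = "average") -> Dict[Pair, float]:
    """Naive agglomerative clustering of objects 0..n-1.

    dist maps each pair (i, j) with i < j to its dissimilarity.
    linkage="average" is UPGMA (mean dissimilarity between two clusters);
    linkage="single" is the nearest-neighbor method (minimum dissimilarity).
    Returns the cophenetic distance (fusion level) for every pair i < j.
    """
    if linkage not in ("average", "single"):
        raise ValueError("linkage must be 'average' or 'single'")
    clusters: List[List[int]] = [[i] for i in range(n)]
    coph: Dict[Pair, float] = {}
    while len(clusters) > 1:
        best = None  # (fusion level, index a, index b)
        for a, b in itertools.combinations(range(len(clusters)), 2):
            between = [dist[(min(i, j), max(i, j))] for i in clusters[a] for j in clusters[b]]
            level = sum(between) / len(between) if linkage == "average" else min(between)
            if best is None or level < best[0]:
                best = (level, a, b)
        level, a, b = best
        # Every pair split across the two merged clusters receives this fusion level.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[(min(i, j), max(i, j))] = level
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return coph


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def cophenetic_correlation(points: List[Sequence[float]], linkage: str) -> float:
    """Correlation between Euclidean (phenetic) and cophenetic distances."""
    n = len(points)
    dist = {(i, j): math.dist(points[i], points[j]) for i, j in itertools.combinations(range(n), 2)}
    coph = cophenetic_distances(dist, n, linkage)
    pairs = sorted(dist)
    return pearson([dist[p] for p in pairs], [coph[p] for p in pairs])


# Hypothetical genotypes described by two traits.
genotypes = [(1.0, 2.0), (1.2, 1.9), (5.0, 6.1), (5.3, 5.8), (9.0, 1.0)]
for method in ("single", "average"):
    print(method, round(cophenetic_correlation(genotypes, method), 3))
```

A higher cophenetic correlation means the dendrogram distorts the original distance matrix less, which is the sense in which one clustering method is called more efficient than another.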
639	Clustering methods: evaluation and application to the study of genetic divergence in garlic accessions Silva, Anderson Rodrigo da 13 February 2012 (has links) This study evaluated the clustering consistency of two hierarchical methods (UPGMA and Ward) and two optimization methods (Tocher and modified Tocher) by applying Fisher's discriminant analysis to the groups obtained with each method in a study of genetic divergence among garlic accessions. The most dissimilar accessions were also identified. Clustering was based on the generalized Mahalanobis distance, which also made it possible to quantify the relative importance of the traits. The most dissimilar accessions were 13 (BGH 4505) and 61 (BGH 5958), mainly with respect to mean bulb weight and yield. The modified Tocher method, UPGMA, and Ward's algorithm produced groupings that agreed with one another. However, Fisher's discriminant analysis applied to the groups from the hierarchical methods (UPGMA and Ward) yielded the lowest apparent error rates, making these the most consistent methods for studying the genetic diversity of garlic accessions. Allium sativum L. Cluster analysis Discriminant analysis CNPQ::CIENCIAS AGRARIAS
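The generalized Mahalanobis distance between accessions i and j is D^2 = (x_i - x_j)^T S^-1 (x_i - x_j). Here x_i and x_j are vectors of trait means, and S is a covariance matrix among the traits. Because S^-1 weights each trait difference, D^2 accounts for differences in scale and for correlations among traits. The following pure-Python sketch assumes that S is supplied by the caller; in divergence studies it is often a residual covariance matrix. It solves S z = x_i - x_j rather than inverting S. All function names are hypothetical.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis_d2(x: Sequence[float], y: Sequence[float], cov: List[List[float]]) -> float:
    """Squared Mahalanobis distance (x - y)^T cov^-1 (x - y)."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve(cov, diff)  # z = cov^-1 (x - y), without forming the inverse
    return sum(d * zi for d, zi in zip(diff, z))


# Hypothetical trait means (e.g. bulb weight, yield, plant height) for two accessions,
# and a hypothetical positive definite covariance matrix among the traits.
acc_a = [30.0, 8.5, 55.0]
acc_b = [24.0, 6.0, 52.0]
S = [[4.0, 1.0, 0.5], [1.0, 2.0, 0.3], [0.5, 0.3, 1.0]]
print(round(mahalanobis_d2(acc_a, acc_b, S), 3))
```

With S equal to the identity matrix, D^2 reduces to the squared Euclidean distance. This makes a convenient sanity check.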
640	Nonlinear regression models for describing the growth of garlic plants Reis, Renata Maciel dos 16 July 2012 (has links) The objective of this study was to select the nonlinear regression model that best describes dry matter accumulation in different parts of the garlic plant over time (60, 90, 120, and 150 days after planting). Twenty garlic accessions from the Vegetable Germplasm Bank of the Universidade Federal de Viçosa (BGH/UFV) were used. To work only with groups of similar accessions, cluster analysis was applied to form these groups. The dry matter of leaf, pseudostem, bulb, and root were the clustering variables, and the analysis used Ward's algorithm with the generalized Mahalanobis distance as the dissimilarity measure. Mojena's method, used to determine the number of groups, indicated three groups of accessions. The mean dry matter of bulb, root, and whole plant in each group was used to fit seven nonlinear regression models: Mitscherlich, Gompertz, Logistic, Meloun I, Meloun II, von Bertalanffy, and Brody. To identify the model that best fitted the three traits of each group, the coefficient of determination (R2), the error mean square (EMS), and the mean absolute deviation of the residuals were calculated. Comparing these criteria showed that the Logistic model gave the best fit for all three traits in all three groups. Allium sativum L. Cluster analysis Model comparison CNPQ::CIENCIAS AGRARIAS
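Two steps of this pipeline can be sketched in pure Python. The first is Mojena's stopping rule, which cuts the dendrogram just before the first fusion level that exceeds the mean plus k standard deviations of all fusion levels. The second is the set of three goodness-of-fit criteria used to compare the growth models. Several details are assumptions for illustration, not the author's exact choices: the value k = 1.25, the logistic parameterization y = a / (1 + exp(b - c*t)), the use of the corrected total sum of squares in R2, the invented example numbers, and all function names.

```python
import math
import statistics
from typing import Dict, Sequence


def mojena_groups(heights: Sequence[float], k: float = 1.25) -> int:
    """Number of groups suggested by Mojena's stopping rule.

    heights: fusion levels of an agglomerative clustering of n objects, in merge
    order (n - 1 values). The tree is cut just before the first fusion whose
    level exceeds mean + k * sample standard deviation of all fusion levels.
    """
    threshold = statistics.mean(heights) + k * statistics.stdev(heights)
    n = len(heights) + 1
    for merges_done, h in enumerate(heights):
        if h > threshold:
            return n - merges_done  # groups remaining before this fusion
    return 1


def logistic(t: float, a: float, b: float, c: float) -> float:
    """Logistic growth curve in one common parameterization: a / (1 + exp(b - c t))."""
    return a / (1.0 + math.exp(b - c * t))


def fit_criteria(observed: Sequence[float], predicted: Sequence[float], n_params: int) -> Dict[str, float]:
    """R2, error mean square (SSE / (n - p)) and mean absolute deviation of residuals."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "R2": 1.0 - sse / sst,
        "EMS": sse / (n - n_params),
        "MAD": sum(abs(r) for r in residuals) / n,
    }


# Invented example: fusion levels from a Ward clustering and dry matter at four dates.
print(mojena_groups([0.4, 0.6, 0.9, 1.1, 1.5, 2.0, 9.5, 14.0]))
days = [60, 90, 120, 150]
observed = [2.8, 10.1, 23.5, 28.0]
predicted = [logistic(t, a=30.0, b=6.0, c=0.06) for t in days]
print(fit_criteria(observed, predicted, n_params=3))
```

In a model comparison, each candidate curve would be fitted first (for example, by nonlinear least squares). The model with the highest R2 and the lowest EMS and mean absolute deviation would then be preferred.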
