  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
241

Neighborhood Socio-spatial Organization at Calixtlahuaca, Mexico

January 2015 (has links)
abstract: This dissertation research examines neighborhood socio-spatial organization at Calixtlahuaca, a Postclassic (1100-1520 AD) urban center in highland Mesoamerica. Neighborhoods are small spatial units where residents interact at a face-to-face level in the process of daily activities. How were Calixtlahuaca's neighborhoods organized socio-spatially? Were they homogeneous or did each neighborhood contain a mixture of different social and economic groups? Calixtlahuaca was a large Aztec-period city-state located in the frontier region between the Tarascan and Triple Alliance empires. As the capital of the Matlatzinco polity, administrative, ritual, and economic activities were located here. Four languages, Matlatzinca, Mazahua, Otomi, and Nahua, were spoken by the city's inhabitants. The combination of political geography and an unusual urban center provides an opportunity for examining complex neighborhood socio-spatial organization in a Mesoamerican setting. The evidence presented in this dissertation shows that Calixtlahuaca's neighborhoods were socially heterogeneous spaces where residents from multiple social groups and classes coexisted. This further suggests that the cross-cutting ties between neighborhood residents had more impact on influencing certain economic choices than close proximity in residential location. Market areas were the one way that the city was clearly divided spatially into two regions, but consumer preferences within the confines of economic resources were similar in both regions. This research employs artifact collections recovered during the Calixtlahuaca Archaeological Project surface survey. The consumption practices of the residents of Calixtlahuaca are used to define membership in several social groups in order to determine the socio-spatial pattern of the city. Economic aspects of city life are examined through the identification of separate market areas that relate to neighborhood patterns.
Excavation data was also examined as an alternate line of evidence for each case. The project contributes to the sparse literature on preindustrial urban neighborhoods. Research into social segregation or social clustering in modern cities is plentiful, but few studies examine the patterns of social clustering in the past. Most research in Mesoamerica focuses on the clustering of social class. / Dissertation/Thesis / Doctoral Dissertation Anthropology 2015
242

Visual Analytics for Spatiotemporal Cluster Analysis

January 2016 (has links)
abstract: Traditionally, visualization is one of the most important and commonly used methods of generating insight into large-scale data. Particularly for spatiotemporal data, the translation of such data into a visual form allows users to quickly see patterns, explore summaries and relate domain knowledge about underlying geographical phenomena that would not be apparent in tabular form. However, several critical challenges arise when visualizing and exploring these large spatiotemporal datasets. While the underlying geographical component of the data lends itself well to univariate visualization in the form of traditional cartographic representations (e.g., choropleth, isopleth, dasymetric maps), as the data becomes multivariate, cartographic representations become more complex. To simplify the visual representations, analytical methods such as clustering and feature extraction are often applied as part of the classification phase. The automatic classification can then be rendered onto a map; however, one common issue in data classification is that items near a classification boundary are often mislabeled. This thesis explores methods to augment the automated spatial classification by utilizing interactive machine learning as part of the cluster creation step. First, this thesis explores the design space for spatiotemporal analysis through the development of a comprehensive data wrangling and exploratory data analysis platform. Second, this system is augmented with a novel method for evaluating the visual impact of edge cases for multivariate geographic projections. Finally, system features and functionality are demonstrated through a series of case studies, with key features including similarity analysis, multivariate clustering, and novel visual support for cluster comparison. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2016
243

Revealing Microbial Responses to Environmental Dynamics: Developing Methods for Analysis and Visualization of Complex Sequence Datasets.

January 2017 (has links)
abstract: The greatest barrier to understanding how life interacts with its environment is the complexity in which biology operates. In this work, I present experimental designs, analysis methods, and visualization techniques to overcome the challenges of deciphering complex biological datasets. First, I examine an iron limitation transcriptome of Synechocystis sp. PCC 6803 using a new methodology. Until now, iron limitation in experiments of Synechocystis sp. PCC 6803 gene expression has been achieved through media chelation. Notably, chelation also reduces the bioavailability of other metals, whereas naturally occurring low iron settings likely result from a lack of iron influx and not as a result of chelation. The overall metabolic trends of previous studies are well-characterized but within those trends is significant variability in single gene expression responses. I compare previous transcriptomics analyses with our protocol that limits the addition of bioavailable iron to growth media to identify consistent gene expression signals resulting from iron limitation. Second, I describe a novel method of improving the reliability of centroid-linkage clustering results. The size and complexity of modern sequencing datasets often prohibit constructing distance matrices, which prevents the use of many common clustering algorithms. Centroid-linkage circumvents the need for a distance matrix, but has the adverse effect of producing input-order dependent results. In this chapter, I describe a method of cluster edge counting across iterated centroid-linkage results and reconstructing aggregate clusters from a ranked edge list without a distance matrix and input-order dependence. Finally, I introduce dendritic heat maps, a new figure type that visualizes heat map responses through expanding and contracting sequence clustering specificities. Heat maps are useful for comparing data across a range of possible states. 
However, data binning is sensitive to clustering cutoffs, which are often arbitrarily introduced by researchers and can substantially change the heat map response of any single data point. With an understanding of how the architectural elements of dendrograms and heat maps affect data visualization, I have integrated their salient features to create a figure type aimed at viewing multiple levels of clustering cutoffs, allowing researchers to better understand the effects of environment on metabolism or phylogenetic lineages. / Dissertation/Thesis / Chapter 2 Excel file of transcriptome responses / Chapter 2 Perl scripts / Chapter 3 Cluster Aggregation Perl script / Chapter 4 Example of the top-down clustering method used to construct dendritic heat maps / Chapter 4 Perl scripts and dendritic heat map images / Doctoral Dissertation Geological Sciences 2017
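244

"Aprendizado de máquina semi-supervisionado: proposta de um algoritmo para rotular exemplos a partir de poucos exemplos rotulados" / Semi-supervised machine learning: proposal of an algorithm for labeling examples from a few labeled examples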

Marcelo Kaminski Sanches 11 August 2003 (has links)
To use machine learning algorithms for classification tasks, one assumes the existence of a set of labeled examples, known as the training set, which is used to train the classifier. In real cases, however, this training set may not contain enough examples to induce a good classifier. Recently, the scientific community has shown great interest in a variation of this supervised learning approach. This new approach, known as semi-supervised learning, assumes that, along with the training set, a second set of unlabeled examples is also available during training. One of the goals of semi-supervised learning is to train classifiers when a large number of unlabeled examples is available together with a small set of labeled examples. The motivation for semi-supervised learning is that, in many real-world applications, sets of unlabeled examples are easy to find or very cheap to collect compared with sets of labeled examples. Another factor is that unlabeled examples can be collected automatically, whereas labeled ones require experts or other costly classification resources. Unlabeled examples can be used in several ways. This work explores a mechanism by which unlabeled examples can be used to improve classification tasks and proposes a semi-supervised algorithm, called k-meanski, which enables the use of unlabeled examples in supervised learning. The technique used by the proposed algorithm is based on two premises. The first is that examples tend to group naturally into clusters rather than being distributed uniformly in the example description space.
In addition, each example in the initial set of labeled examples should be located near the center of one of the clusters in the example description space. The second premise is that most of the examples in a cluster belong to a specific class. Obviously, the validity of these premises depends on the dataset used. The k-meanski algorithm works well when the data conform to both premises. If they are violated, however, the algorithm's performance will be poor. Experiments are presented using real-world datasets, with examples chosen at random from these sets to act as labeled examples.
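The two premises in this abstract suggest a simple labeling scheme, sketched below in Python. This is an illustration under stated assumptions, not the author's k-meanski algorithm: the function name `label_from_seeds`, the squared Euclidean distance, and the fixed iteration count are hypothetical choices made only for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def label_from_seeds(labeled, unlabeled, iterations=10):
    """Label unlabeled points using a few labeled seeds (hypothetical sketch).

    labeled:   list of (point, class_label) pairs, one seed per cluster
    unlabeled: list of points
    Assumes each seed lies near a cluster center and that each cluster
    mostly contains a single class (the two premises in the abstract).
    """
    centroids = [list(point) for point, _ in labeled]
    classes = [label for _, label in labeled]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each unlabeled point joins its nearest centroid.
        assignment = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in unlabeled
        ]
        # Update step: recompute each centroid; the seed stays in its cluster.
        for j, (seed_point, _) in enumerate(labeled):
            members = [p for p, a in zip(unlabeled, assignment) if a == j]
            members.append(list(seed_point))
            centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Every unlabeled point inherits the class of its cluster's seed.
    return [classes[a] for a in assignment]


seeds = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
points = [(1.0, 0.0), (0.0, 2.0), (9.0, 9.0), (11.0, 10.0)]
predicted = label_from_seeds(seeds, points)
```

If the data violate either premise, for example when a seed sits on a cluster boundary, a scheme like this would propagate the wrong class, which matches the limitation the abstract describes.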
245

Dissimilarity functions analysis based on dynamic clustering for symbolic data

Cléa Gomes da Silva, Alzennyr January 2005 (has links)
Symbolic data analysis is a new domain in the area of automatic knowledge discovery that aims to develop methods for data described by variables whose values can be sets of categories, intervals, or probability distributions. These new variables make it possible to take into account the variability and/or uncertainty present in the data. Handling symbolic data with statistical and machine learning techniques requires distance measures capable of manipulating this type of data. To this end, several dissimilarity functions have been proposed in the literature. However, no comparative study has been carried out on the performance of such functions in problems that simultaneously involve Boolean and modal symbolic data. The main contribution of this dissertation is a comparative analysis and empirical evaluation of dissimilarity functions for symbolic data, since this type of study, although highly relevant, is almost nonexistent in the literature. In addition, this work introduces new dissimilarity functions that can be used in the dynamic clustering of symbolic data. Dynamic clustering algorithms simultaneously obtain a partition into a fixed number of classes and identify a representative for each class, locally minimizing a criterion that measures the fit between the classes and their representatives.
To validate this study, experiments were carried out with benchmark datasets from the literature and two artificial interval datasets with different degrees of classification difficulty, with the aim of comparing the evaluated functions. The accuracy of the results was measured by an external clustering index applied in unsupervised cross-validation for the real datasets, and also within a Monte Carlo experiment for the artificial datasets. The results make it possible to verify the suitability of the various dissimilarity functions for the different types of symbolic data (multi-valued, ordinal multi-valued, interval, and modal with the same support and with different supports), as well as to identify the best configurations of functions. Statistical tests validate the conclusions.
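The alternation between a partition and its class representatives can be sketched for interval data as follows. This is a minimal Python illustration, not one of the dissertation's methods: the squared Euclidean distance between interval bounds is an assumption chosen for simplicity (the dissertation compares other dissimilarity functions), and all function names are hypothetical.

```python
def interval_dissimilarity(a, b):
    """Squared Euclidean distance between the bounds of two interval vectors.

    Each object is a list of (lower, upper) pairs, one per variable.
    Assumed measure for this sketch only.
    """
    return sum((al - bl) ** 2 + (au - bu) ** 2
               for (al, au), (bl, bu) in zip(a, b))


def representative(members):
    """Mean of lower and upper bounds; minimizes the criterion for this measure."""
    n = len(members)
    return [
        (sum(m[v][0] for m in members) / n, sum(m[v][1] for m in members) / n)
        for v in range(len(members[0]))
    ]


def dynamic_clustering(objects, initial_representatives, max_iterations=50):
    """Alternate allocation and representation steps for a fixed number of classes."""
    reps = [list(r) for r in initial_representatives]
    partition = None
    for _ in range(max_iterations):
        # Allocation step: assign each object to its closest representative.
        new_partition = [
            min(range(len(reps)),
                key=lambda j: interval_dissimilarity(obj, reps[j]))
            for obj in objects
        ]
        if new_partition == partition:
            break  # the partition is stable, so the criterion cannot decrease
        partition = new_partition
        # Representation step: recompute the representative of each class.
        for j in range(len(reps)):
            members = [o for o, lab in zip(objects, partition) if lab == j]
            if members:
                reps[j] = representative(members)
    criterion = sum(interval_dissimilarity(o, reps[lab])
                    for o, lab in zip(objects, partition))
    return partition, reps, criterion


objects = [[(0.0, 1.0)], [(0.5, 1.5)], [(10.0, 12.0)], [(11.0, 13.0)]]
partition, reps, criterion = dynamic_clustering(objects, [objects[0], objects[2]])
```

Swapping in another dissimilarity function would also require a matching representation step, since the representative must minimize the criterion for the chosen measure.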
246

Uso de rede de Kohonen para a clusterização de objetos de aprendizagem / Use of a Kohonen network for clustering learning objects

Silva, Patric Ferreira da 08 August 2007 (has links)
Instituto Presbiteriano Mackenzie / The increasing availability on the Internet of digital educational resources, called learning objects, has been accompanied by the definition of indexing standards. However, the lack of consensus on the definition of learning objects, as well as the diversity of metadata approaches for classifying them, hinders the selection of these elements. This scenario calls for new investigations that make it possible to establish parameters for creating a specific artificial neural network model for clustering learning objects. The implementation of this model was tied to a theoretical-methodological choice based on metadata standard criteria, which made it possible to form the input samples for constructing a Self-Organizing Map (Kohonen network) by means of algorithms and mathematical models. Consequently, the development of this proposal for clustering learning objects can support educational work in face-to-face and online environments and contribute to the reusability of learning objects. This research also investigated how the weight mask, one of the parameters of the Kohonen model, affects the final result. To that end, the training results with and without the mask were compared, which showed the relevance of this method for obtaining the results of the present research.
247

A sliding window BIRCH algorithm with performance evaluations

Li, Chuhe January 2017 (has links)
An increasing number of applications across various fields generate transactional data or other time-stamped data, all of which are time series data. Time series data mining is a popular topic in the data mining field; it poses challenges for improving the accuracy and efficiency of algorithms for time series data. Time series data are dynamic, large-scale, and highly complex, which makes it difficult to discover patterns in them with common methods suited to static data. BIRCH, a hierarchical clustering method, was proposed to address the problems of large datasets. It minimizes the costs of I/O and time. A CF tree is generated during its working process, and clusters are generated after the four phases of the whole BIRCH procedure. A drawback of BIRCH is that it is not very scalable. This thesis is devoted to improving the accuracy and efficiency of the BIRCH algorithm. A sliding window BIRCH algorithm is implemented on the basis of the BIRCH algorithm. At the end of the thesis, the accuracy and efficiency of sliding window BIRCH are evaluated. A performance comparison among SW BIRCH, BIRCH, and K-means is also presented using the Silhouette Coefficient and the Calinski-Harabasz Index. The preliminary results indicate that SW BIRCH may achieve better performance than BIRCH in some cases.
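One way a sliding window could be combined with BIRCH's clustering features (CF) is sketched below in Python. It relies on the additivity of the CF triple (N, LS, SS): points entering the window are added, and expired points are subtracted. This is an assumption-laden illustration of a single CF entry, not the thesis's SW BIRCH algorithm, which maintains a full CF tree; the class name `SlidingWindowCF` is hypothetical.

```python
import math
from collections import deque


class SlidingWindowCF:
    """A single clustering feature (N, LS, SS) over the most recent `window` points."""

    def __init__(self, window, dim):
        self.window = window
        self.points = deque()          # kept only so expired points can be subtracted
        self.n = 0                     # N: number of points in the window
        self.linear_sum = [0.0] * dim  # LS: per-dimension sum of the points
        self.square_sum = 0.0          # SS: sum of squared norms of the points

    def add(self, point):
        """Add a new point and drop the oldest one if the window overflows."""
        self.points.append(tuple(point))
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]
        self.square_sum += sum(x * x for x in point)
        if self.n > self.window:
            old = self.points.popleft()
            self.n -= 1
            self.linear_sum = [s - x for s, x in zip(self.linear_sum, old)]
            self.square_sum -= sum(x * x for x in old)

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        """Root mean squared distance of the windowed points to their centroid."""
        c = self.centroid()
        variance = self.square_sum / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))  # guard against rounding below zero


cf = SlidingWindowCF(window=3, dim=2)
for p in [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 0.0)]:
    cf.add(p)
```

After the fourth point arrives, the first has expired, so the centroid and radius describe only the three most recent points.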
248

Spherical k-Means Clustering

Buchta, Christian, Kober, Martin, Feinerer, Ingo, Hornik, Kurt 09 1900 (has links) (PDF)
Clustering text documents is a fundamental task in modern data analysis, requiring approaches which perform well both in terms of solution quality and computational efficiency. Spherical k-means clustering is one approach to address both issues, employing cosine dissimilarities to perform prototype-based partitioning of term weight representations of the documents. This paper presents the theory underlying the standard spherical k-means problem and suitable extensions, and introduces the R extension package skmeans which provides a computational environment for spherical k-means clustering featuring several solvers: a fixed-point and genetic algorithm, and interfaces to two external solvers (CLUTO and Gmeans). Performance of these solvers is investigated by means of a large scale benchmark experiment. (authors' abstract)
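The fixed-point idea behind spherical k-means can be sketched in a few lines of Python. This is an illustrative sketch only, not the skmeans package's implementation: vectors are normalized to unit length, assigned to the prototype with the smallest cosine dissimilarity, and each prototype is replaced by the normalized sum of its members. The farthest-first initialization and all function names are assumptions made for this example.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_dissimilarity(a, b):
    """1 - cosine similarity, assuming both vectors already have unit length."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic initialization (an assumption of this sketch)."""
    chosen = [points[0]]
    while len(chosen) < k:
        chosen.append(max(
            points,
            key=lambda p: min(cosine_dissimilarity(p, c) for c in chosen)))
    return [list(c) for c in chosen]


def spherical_kmeans(vectors, k, iterations=20):
    points = [normalize(v) for v in vectors]
    prototypes = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each document to the prototype with the smallest dissimilarity.
        labels = [
            min(range(k), key=lambda j: cosine_dissimilarity(p, prototypes[j]))
            for p in points
        ]
        # Update each prototype to the normalized sum of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                prototypes[j] = normalize([sum(col) for col in zip(*members)])
    return labels, prototypes


# Toy term-weight vectors: two documents about one topic, two about another.
docs = [[3, 0, 1], [5, 0, 1], [0, 4, 1], [0, 2, 1]]
labels, prototypes = spherical_kmeans(docs, k=2)
```

Because only directions matter, documents of very different lengths but similar term proportions end up in the same cluster.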
249

Clusters (k) Identification without Triangle Inequality : A newly modelled theory / Clustering(k) without Triangle Inequality : A newly modelled theory

Narreddy, Naga Sambu Reddy, Durgun, Tuğrul January 2012 (has links)
Cluster analysis groups data that are sufficiently similar into meaningful and useful groups (clusters). For example, cluster analysis can be applied to find groups of genes and proteins that are similar, to retrieve information from the World Wide Web, and to identify locations that are prone to earthquakes. The study of clustering has therefore become very important in several fields, including psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning and data mining [1] [2]. Cluster analysis is one of the most widely used techniques in the area of data mining. Depending on the complexity and amount of data in a system, a variety of cluster analysis algorithms can be used. K-means clustering is one of the most popular and widely used among the ten algorithms in data mining [3]. Like other clustering algorithms, it is not a silver bullet. K-means clustering requires prior analysis and knowledge before the number of clusters and their centroids are determined. Recent studies show a new approach to K-means clustering that does not require any prior knowledge for determining the number of clusters [4]. In this thesis, we propose a new clustering procedure to solve the central problem of identifying the number of clusters (k) by imitating the desired number of clusters with proper properties. The proposed algorithm is validated by investigating different characteristics of the analyzed data with the modified theory and by analyzing the efficiency of the parameters and their relationships. The parameters in this theory include the selection of embryo-size (m), significance level (α), distributions (d), and training set (n), in the identification of clusters (k).
250

Sensibilité d'un écoulement de rouleau compressé et des variations cycle à cycle associées à des paramètres de remplissage moteur / Sensitivity of the compressed tumble motion and of the cycle to cycle variations to engine’s air filling parameters.

Cao, Yujun 17 December 2014 (has links)
This experimental work concerns a sensitivity study of the in-cylinder flow in a spark-ignition engine and of its cycle-to-cycle variations (CCV), comparing three variations of boundary conditions related to the optimisation of air filling conditions. In the reference case, the three-dimensional tumble flow is characterized during the intake and compression phases. An earlier intake cam phase increases the mass flow rate at the inlet valves and amplifies the fluctuations immediately after the start of the intake phase. The tumble ratio is much lower at mid-compression. The fluctuating energy at top dead center is reduced. A longer engine stroke leads, at the end of the compression phase, to a shift of the mean flow and to a very distinct evolution of the fluctuating velocity, due to the different confinement seen by the engine's internal flow. Finally, the modification of the intake duct design changes the flow intensity and fundamentally reorganizes the flow structure. Moreover, to describe the transfer into turbulence, two methodologies for classifying flow structures into groups, first by spatial correlation and then by clustering, are proposed. A phase-averaged analysis of the statistics of group content and inter-group transitions shows that CCV can be associated with different sets of trajectories during the second half of the compression phase. Conditional statistics are computed to analyse the data in each group, which leads to a triple decomposition. This more accurate evaluation of CCV is very general and applicable to very large sets of experimental or numerical data.
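The bookkeeping behind inter-group transitions and conditional statistics can be illustrated with a short Python sketch. It assumes that each engine cycle has already been assigned a group label at successive phases; the labels, values, and function names below are hypothetical and do not come from the thesis.

```python
from collections import Counter, defaultdict


def transition_counts(trajectories):
    """Count transitions between groups at consecutive phases.

    trajectories: one list of group labels per engine cycle, ordered by phase.
    """
    counts = Counter()
    for labels in trajectories:
        for current, following in zip(labels, labels[1:]):
            counts[(current, following)] += 1
    return counts


def conditional_means(labels, values):
    """Mean of a measured quantity conditioned on group membership."""
    totals = defaultdict(float)
    sizes = Counter()
    for label, value in zip(labels, values):
        totals[label] += value
        sizes[label] += 1
    return {label: totals[label] / sizes[label] for label in totals}


# Hypothetical data: three cycles observed at three phases each.
cycles = [["A", "A", "B"], ["A", "B", "B"], ["B", "B", "B"]]
counts = transition_counts(cycles)

# Hypothetical per-cycle quantity at one phase, with that phase's group labels.
means = conditional_means(["A", "B", "A", "B"], [1.0, 2.0, 3.0, 6.0])
```

Each distinct sequence of labels corresponds to one trajectory in the space of groups, so the counts summarize how often cycles move from one flow structure to another.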
