  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Analyse des différences dans le Big Data : Exploration, Explication, Évolution / Difference Analysis in Big Data : Exploration, Explanation, Evolution

Kleisarchaki, Sofia 28 November 2016
Variability in Big Data refers to data whose meaning changes continuously. For instance, data derived from social platforms and from monitoring applications exhibit great variability. This variability is essentially the result of changes in the underlying data distributions of attributes of interest, such as user opinions/ratings, computer network measurements, etc. Difference Analysis aims to study variability in Big Data. To achieve that goal, data scientists need: (a) measures to compare data along various dimensions, such as age for users or topic for network traffic, and (b) efficient algorithms to detect changes in massive data. In this thesis, we identify and study three novel analytical tasks to capture data variability: Difference Exploration, Difference Explanation, and Difference Evolution.
Difference Exploration is concerned with extracting the opinion of different user segments (e.g., on a movie rating website). We propose appropriate measures for comparing user opinions in the form of rating distributions. We also propose efficient algorithms that, given an opinion of interest in the form of a rating histogram, discover agreeing and disagreeing populations. Difference Explanation tackles the question of providing a succinct explanation of differences between two datasets of interest (e.g., the buying habits of two sets of customers). We propose scoring functions designed to rank explanations, and algorithms that guarantee explanation conciseness and informativeness. Finally, Difference Evolution tracks change in an input dataset over time and summarizes change at multiple time granularities. We propose a query-based approach that uses similarity measures to compare consecutive clusters over time. Our indexes and algorithms for Difference Evolution are designed to capture different data arrival rates (e.g., low, high) and different types of change (e.g., sudden, incremental). The utility and scalability of all our algorithms rely on hierarchies inherent in data (e.g., time, demographic).
We run extensive experiments on real and synthetic datasets to validate the usefulness of the three analytical tasks and the scalability of our algorithms. We show that Difference Exploration guides end-users and data scientists in uncovering the opinion of different user segments in a scalable way. Difference Explanation reveals the need to parsimoniously summarize differences between two datasets. It also shows that parsimony can be achieved by exploiting hierarchy in data. Finally, our study on Difference Evolution provides strong evidence that a query-based approach is well-suited to tracking change in datasets with varying arrival rates and at multiple time granularities. Similarly, we show that different clustering approaches can be used to capture different types of change.
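Difference Exploration compares user opinions given as rating histograms. The abstract does not say which comparison measures the thesis uses. The sketch below therefore assumes one common choice: the one-dimensional Earth Mover's Distance (EMD) between normalized 1-5 star histograms. It also assumes a hypothetical threshold `tau` for calling a segment "agreeing". The segment names and counts are invented for illustration.

```python
from typing import Dict, List, Tuple

def normalize(hist: List[float]) -> List[float]:
    """Turn raw rating counts into a probability distribution."""
    total = sum(hist)
    return [h / total for h in hist]

def emd_1d(p: List[float], q: List[float]) -> float:
    """Earth Mover's Distance between two distributions over the same ordered bins.

    With unit spacing between bins (1 star, 2 stars, ...), 1-D EMD equals the
    sum of absolute differences between the two cumulative distributions.
    """
    distance = cdf_p = cdf_q = 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        distance += abs(cdf_p - cdf_q)
    return distance

def split_segments(opinion: List[float], segments: Dict[str, List[float]],
                   tau: float) -> Tuple[List[str], List[str]]:
    """Label each segment as agreeing (EMD <= tau) or disagreeing with an opinion.

    Assumption: EMD and the threshold tau are illustrative, not the thesis's measures.
    """
    target = normalize(opinion)
    agree: List[str] = []
    disagree: List[str] = []
    for name, hist in segments.items():
        if emd_1d(target, normalize(hist)) <= tau:
            agree.append(name)
        else:
            disagree.append(name)
    return agree, disagree

# Hypothetical rating counts for 1..5 stars (invented data).
opinion = [0, 0, 1, 4, 5]            # opinion of interest: mostly 4-5 stars
segments = {
    "age 18-24": [0, 1, 1, 4, 4],
    "age 25-34": [5, 4, 1, 0, 0],
}
agree, disagree = split_segments(opinion, segments, tau=0.5)
```

Here the first segment's histogram is close to the opinion of interest (EMD about 0.3), while the second is skewed toward low ratings (EMD about 2.8). A scalable version would also exploit the segment hierarchy the abstract mentions, rather than scanning every segment.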
12

Agrupamento espectral através de grafos Laplacianos e uma aplicação no cultivo da soja / Spectral clustering through Laplacian graphs and an application in soybean cultivation

Moura, Larissa. January 2018
Advisor: Alice Kimie Miwa Libardi / Committee member: Thiago de Melo / Committee member: Washington Mio / Abstract: The main goal of this dissertation is to present a detailed version of the paper "A Tutorial on Spectral Clustering" by U. von Luxburg. The paper covers clustering through graph Laplacians and their properties, and the dissertation also shows some results from clustering theory. In addition, three clustering algorithms are presented, and one of them is illustrated with an application to soybean cultivation under different growing conditions. / Master's degree
13

Agrupamento espectral através de grafos Laplacianos e uma aplicação no cultivo da soja. / Spectral clustering through Laplacian graphs and an application in soybean cultivation.

Moura, Larissa 16 February 2018
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / The main goal of this dissertation is to present a detailed version of the paper "A Tutorial on Spectral Clustering" by U. von Luxburg. The paper covers clustering through graph Laplacians and their properties, and the dissertation also shows some results from clustering theory. In addition, three clustering algorithms are presented, and one of them is illustrated with an application to soybean cultivation under different growing conditions.
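Von Luxburg's tutorial builds spectral clustering on the unnormalized graph Laplacian L = D - W. The following is a minimal sketch, not the dissertation's code. For two clusters, unnormalized spectral clustering can be approximated by splitting vertices on the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue of L). The full algorithm instead runs k-means on the first k eigenvectors. To stay dependency-free, the sketch finds the Fiedler vector by power iteration on cI - L with the constant vector projected out. It assumes a connected graph. The weight matrix is a made-up toy graph.

```python
from typing import List

Matrix = List[List[float]]

def laplacian(W: Matrix) -> Matrix:
    """Unnormalized graph Laplacian L = D - W (D = diagonal degree matrix)."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def fiedler_vector(L: Matrix, iters: int = 500) -> List[float]:
    """Approximate the eigenvector of the second-smallest eigenvalue of L.

    Power iteration on M = cI - L converges to M's dominant eigenvector. Projecting
    out the constant vector (eigenvalue 0 of L) each step leaves the Fiedler vector.
    c = 2 * max diagonal entry bounds L's largest eigenvalue (Gershgorin), so M is PSD.
    """
    n = len(L)
    c = 2.0 * max(L[i][i] for i in range(n))
    # Fixed start vector; assumed not orthogonal to the Fiedler vector.
    v = [float(i) for i in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]     # project out the constant eigenvector
        w = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def spectral_bipartition(W: Matrix) -> List[int]:
    """Assign each vertex to cluster 0 or 1 by the sign of the Fiedler vector."""
    return [0 if x < 0 else 1 for x in fiedler_vector(laplacian(W))]

# Toy graph (invented): triangle {0,1,2} and triangle {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    W[a][b] = W[b][a] = 1.0
labels = spectral_bipartition(W)
```

Cutting the single bridge edge 2-3 is the low-cost partition, and the sign pattern of the Fiedler vector recovers it. In practice one would call a dense or sparse eigensolver instead of hand-rolled power iteration.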
14

Sistema de localização de facilidades: uma abordagem para mensuração de pontos de demanda e localização de facilidades / Facility location system: an approach to measure demand points and locate facilities

Oliveira, Max Gontijo de 08 October 2012
Several organizations need to solve the problem of locating and allocating facilities within a geographic region. Location/allocation problems arise, for example, in the distribution of police cars, ambulances, power-grid repair vehicles, taxi drivers, and bus stops. They also arise in many other situations in which the location of such facilities is a strategic factor for the organization. In location/allocation problems, each demand point is usually allocated to the nearest facility. Each facility is then located at the center of its demand points, with the demand value used as a weight on the distance. However, real facility location problems commonly have capacity constraints: each facility has a certain capacity that depends on the type of demand. Facility location problems can be continuous or discrete. In continuous problems (also called the multi-source Weber problem), any point in the plane is a potential site for a facility. There are several approaches for working with continuous models and many others for models with capacity constraints, but most of these approaches discretize the model.
The objective of this work is therefore to present an approach for generating good facility distributions for the continuous capacitated location/allocation problem. A case study is presented to evaluate the results.
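The abstract describes the classic allocate-then-locate loop. Each demand point is assigned to its nearest facility, and each facility then moves to the demand-weighted center of its points. Below is a minimal sketch of that alternating loop (in the style of Cooper's heuristic), under stated assumptions; it is not the dissertation's approach. It places each facility at the weighted centroid, which minimizes weighted squared Euclidean distance. The Weber objective uses weighted Euclidean distance instead and is usually solved with Weiszfeld iterations. The sketch also ignores capacity constraints, which are the dissertation's actual focus. All coordinates and demands are invented.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def nearest(p: Point, facilities: Sequence[Point]) -> int:
    """Index of the facility closest to p (Euclidean distance)."""
    return min(range(len(facilities)), key=lambda j: math.dist(p, facilities[j]))

def locate_allocate(points: Sequence[Point], demand: Sequence[float],
                    initial: Sequence[Point], max_iter: int = 100
                    ) -> Tuple[List[Point], List[int]]:
    """Alternate allocation and location steps until the assignment stops changing.

    Assumption: uncapacitated, weighted-centroid location step; NOT a solver for
    the capacitated continuous problem studied in the dissertation.
    """
    facilities = list(initial)
    assign: List[int] = []
    for _ in range(max_iter):
        # Allocation step: every demand point goes to its nearest facility.
        new_assign = [nearest(p, facilities) for p in points]
        if new_assign == assign:
            break
        assign = new_assign
        # Location step: move each facility to the demand-weighted center of its points.
        for j in range(len(facilities)):
            members = [i for i, a in enumerate(assign) if a == j]
            total = sum(demand[i] for i in members)
            if total > 0:  # a facility with no allocated demand stays put
                x = sum(demand[i] * points[i][0] for i in members) / total
                y = sum(demand[i] * points[i][1] for i in members) / total
                facilities[j] = (x, y)
    return facilities, assign

# Toy instance (invented): two groups of demand points, two facilities.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
demand = [1, 1, 2, 1, 1, 2]
sites, assign = locate_allocate(points, demand, [(0, 5), (10, 5)])
```

A capacitated variant would replace the nearest-facility allocation step with an assignment that respects each facility's capacity, for example a transportation problem. That is where most of the difficulty the abstract mentions lies.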
15

Hyperplane Clustering : A New Divisive Clustering Algorithm

Yogananda, A P 01 1900
No description available.
16

Topology Preserving Data Reductions for Computing Persistent Homology

Sens, Aaron M. 04 October 2021
No description available.
17

Akcelerace algoritmů pro shlukování tunelů v proteinech / Acceleration of Algorithms for Clustering of Tunnels in Proteins

Jaroš, Marta January 2016
This thesis deals with the clustering of tunnels in data obtained from protein molecular dynamics simulations. This process is computationally very intensive and has been a challenge for the scientific community. The goal is to find an algorithm with an optimal ratio of time to space complexity. The thesis discusses research on clustering algorithms, work with huge high-dimensional datasets, visualisation, and methods for comparing clusterings. It proposes a solution to this problem using the Twister Tries algorithm and analyses the implementation details. It also reports test results on solution quality and space complexity. The goal of the thesis was to show that a stochastic algorithm (Twister Tries) could achieve the same results as an exact algorithm (average-linkage). This hypothesis could not be confirmed with confidence. The analysis of the hashing functions further showed that a low-dimensional hashing function produces the same hashing results in much less computation time.
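The exact baseline named in this abstract is average-linkage hierarchical clustering. In it, the distance between two clusters is the mean pairwise distance between their members. The naive sketch below illustrates only that baseline, on made-up one-dimensional values standing in for tunnel descriptors. Its roughly cubic running time is the kind of cost that motivates approximate methods such as Twister Tries, which are not shown here. Stopping at a fixed number of clusters `k` is an assumption for illustration.

```python
from itertools import combinations
from typing import Callable, List, Sequence

def average_linkage(items: Sequence, dist: Callable, k: int) -> List[List[int]]:
    """Naive agglomerative clustering with average linkage.

    Starts from singletons and repeatedly merges the two closest clusters
    (lists of item indices) until only k clusters remain.
    """
    clusters: List[List[int]] = [[i] for i in range(len(items))]

    def avg_dist(a: List[int], b: List[int]) -> float:
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist(items[i], items[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # combinations yields i < j, so index i stays valid
    return clusters

# Invented 1-D values standing in for tunnel descriptors.
values = [1.0, 1.2, 1.1, 8.0, 8.3, 15.0]
groups = average_linkage(values, lambda a, b: abs(a - b), k=3)
```

Every merge rescans all cluster pairs and recomputes all member distances. That is why exact average linkage scales poorly on large molecular dynamics datasets.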
18

PATTERN EXTRACTION USING A CONTEXT DEPENDENT MEASURE OF DIVERGENCE AND ITS VALIDATION

TEMBE, WAIBHAV DEEPAK 11 October 2001
No description available.
19

Generating fishing boats behaviour based on historic AIS data: A method to generate maritime trajectories based on historic positional data / Genering av fiskebåtsbeteende baserat på historisk AIS dat

Bergman, Oscar January 2022
This thesis describes a method to generate new trajectories based on historic position data for a given geographical area. Using AIS data from fishing boats, it first describes a method that uses the DBSCAN and OPTICS algorithms to cluster the data according to the routes the boats travel and the areas where they fish. Bayesian optimization is used to search for the parameters of the clustering algorithms. In this scenario, DBSCAN was shown to be better on every measure. However, there are many points where OPTICS could potentially do better if it were modified slightly. The thesis then describes how to turn the clusters into a node network. This network can be traversed with a path-finding algorithm combined with internal rules to generate new routes. The routes can be used in simulations to give a sufficiently realistic situation picture. Finally, a method to evaluate the generated routes is described and used to compare the routes with each other.
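The first step clusters AIS positions with DBSCAN. The thesis does not say which implementation it used, and it tuned the parameters with Bayesian optimization. The sketch below is therefore a plain, quadratic-time DBSCAN over planar points with hypothetical parameters `eps` and `min_pts`. Real AIS positions are latitude/longitude pairs and would normally need a geodesic distance or a map projection first; the points here are invented.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1

def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return a cluster label per point; NOISE (-1) marks outliers."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be relabeled as a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
    return labels

# Invented positions: a fishing area, a route segment, and one stray fix.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

DBSCAN needs no preset number of clusters and marks isolated fixes as noise, which suits irregular vessel tracks. Its single global `eps` is one reason density-varying alternatives such as OPTICS are worth comparing.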
20

Agrupamento híbrido de dados utilizando algoritmos genéticos / Hybrid clustering techniques with genetic algorithms

Naldi, Murilo Coelho 16 October 2006
Clustering techniques have obtained good results in several data analysis problems, such as gene expression data analysis. However, the same clustering technique applied to the same dataset can group the data in different ways. This depends on the initial clustering or on the values chosen for the technique's free parameters. Obtaining a good clustering can therefore be seen as an optimization process that seeks good initial clusterings and the best set of values for the free parameters. Because they are global search methods, Genetic Algorithms can be used in this optimization process.
The goal of this research project is to investigate the use of clustering techniques together with Genetic Algorithms to improve the quality of the clusters found by clustering algorithms, mainly k-means. This investigation was carried out using gene expression data analysis, a Bioinformatics problem, as the application. This dissertation presents a bibliographic review of the topics covered in the project, a description of the methodology followed, its development, and an analysis of the results obtained.
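The hybrid idea in this abstract, a Genetic Algorithm searching for good k-means initializations, can be sketched as follows. This is an illustrative assumption, not the dissertation's algorithm. Each individual is a set of k data-point indices used as initial centroids. Fitness is the within-cluster sum of squared errors (SSE) after running Lloyd's k-means from that start. The elitism, tournament selection, union-based crossover and swap mutation operators, as well as the population size and generation count, are all hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(data: Sequence[Point], centroids: List[Point],
           iters: int = 20) -> Tuple[List[Point], float]:
    """Lloyd's k-means from the given centroids; returns final centroids and SSE."""
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in centroids]
        for p in data:
            best = min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            groups[best].append(p)
        # Recompute each centroid as its group's mean; keep empty clusters in place.
        centroids = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centroids[c]
                     for c, g in enumerate(groups)]
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in data)
    return centroids, sse

def ga_kmeans(data: Sequence[Point], k: int, pop_size: int = 10,
              generations: int = 15, mutation_rate: float = 0.3,
              seed: int = 0) -> Tuple[List[Point], float]:
    """Evolve sets of k initial-centroid indices; return k-means from the best set."""
    rng = random.Random(seed)
    n = len(data)

    def fitness(ind: List[int]) -> float:  # lower SSE is better
        return kmeans(data, [data[i] for i in ind])[1]

    pop = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness)
        next_pop = scored[:2]  # elitism: keep the two best individuals
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (best of 3 random individuals).
            p1 = min(rng.sample(scored, 3), key=fitness)
            p2 = min(rng.sample(scored, 3), key=fitness)
            # Crossover: draw k distinct indices from the union of both parents.
            child = rng.sample(sorted(set(p1) | set(p2)), k)
            # Mutation: replace one index with a random index not already present.
            if rng.random() < mutation_rate:
                child[rng.randrange(k)] = rng.choice([i for i in range(n) if i not in child])
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=fitness)
    return kmeans(data, [data[i] for i in best])

# Invented 2-D data with two well-separated groups.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, sse = ga_kmeans(data, k=2)
```

On this toy data almost any start converges to the optimum (SSE 8/3). The GA only pays off when k-means has many poor local optima, as is common with high-dimensional gene expression data.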
