Global ETD Search

11	DARM: Distance-Based Association Rule Mining Icev, Aleksandar 06 May 2003 The main goal of this thesis work was to develop, implement and evaluate an algorithm that enables mining association rules from datasets that contain quantified distance information among the items. This was accomplished by extending and enhancing the Apriori algorithm, which is the standard algorithm for mining association rules. The Apriori algorithm is not able to mine association rules that contain distance information among the items that construct the rules. This thesis enhances the main Apriori property by requiring itemsets forming rules to "deviate properly" in addition to satisfying the minimal support threshold. We say that an itemset deviates properly if all combinations of pair-wise distances among the items are highly conserved in the dataset instances where these items occur. This thesis introduces the notion of proper deviation and provides the precise procedure and measures that characterize it. Integrating the notion of distance preserving frequent itemset and proper deviation into the standard Apriori algorithm leads to the construction of our Distance-Based Association Rule Mining (DARM) algorithm. DARM can be applied in data mining and knowledge discovery from genetic, financial, retail, time sequence data, or any domain where the distance information between items is of importance. This thesis chose the area of gene expression and regulation in eukaryotic organisms as the application domain. The data from the domain was used to produce DARM rules. Sets of those rules were used for building predictive models. The accuracy of those models was tested. In addition, predictive accuracies of the models built with and without distance information were compared. spatial data mining distance-based association rules distance-based Apriori algorithm Data mining Gene expression Data processing Eukaryotic cells
12	Data mining of geospatial data: combining visual and automatic methods Demšar, Urška January 2006 Most of the largest databases currently available have a strong geospatial component and contain potentially useful information which might be of value. The discipline concerned with extracting this information and knowledge is data mining. Knowledge discovery is performed by applying automatic algorithms which recognise patterns in the data. Classical data mining algorithms assume that data are independently generated and identically distributed. Geospatial data are multidimensional, spatially autocorrelated and heterogeneous. These properties make classical data mining algorithms inappropriate for geospatial data, as their basic assumptions cease to be valid. Extracting knowledge from geospatial data therefore requires special approaches. One way to do that is to use visual data mining, where the data is presented in visual form for a human to perform the pattern recognition. When visual mining is applied to geospatial data, it is part of the discipline called exploratory geovisualisation. Both automatic and visual data mining have their respective advantages. Computers can treat large amounts of data much faster than humans, while humans are able to recognise objects and visually explore data much more effectively than computers. A combination of visual and automatic data mining draws together human cognitive skills and computer efficiency and permits faster and more efficient knowledge discovery. This thesis investigates whether a combination of visual and automatic data mining is useful for exploration of geospatial data. Three case studies illustrate three different combinations of methods. Hierarchical clustering is combined with visual data mining for exploration of geographical metadata in the first case study. The second case study presents an attempt to explore an environmental dataset by a combination of visual mining and a Self-Organising Map. 
Spatial pre-processing and visual data mining methods were used in the third case study for emergency response data. Contemporary system design methods involve user participation at all stages. These methods originated in the field of Human-Computer Interaction, but have been adapted for the geovisualisation issues related to spatial problem solving. Attention to user-centred design was present in all three case studies, but the principles were fully followed only for the third case study, where a usability assessment was performed using a combination of a formal evaluation and exploratory usability. geographic information science geoinformatics geovisualisation spatial data mining visual data mining usability evaluation Other information technology
13	GIS, data mining and wild land fire data within Räddningstjänsten Sandell, Anna January 2001 Geographical information systems (GIS), data mining and wild land fire data would theoretically be suitable to use together. However, would data mining in reality bring out any useful information from wild land fire data stored within a GIS? This report investigates whether GIS and data mining are used within Räddningstjänsten today in some municipalities of the former Skaraborg. The investigation shows that neither data mining nor GIS is used within the investigated municipalities. However, the organisations are interested in using GIS in the future, together with some kind of analysis tool such as data mining. To show how GIS and data mining could be used in the future within Räddningstjänsten, some examples were constructed. Geographical Information Systems GIS Wild land fire Data mining Spatial data mining Computer and systems science
14	Data mining of geospatial data: combining visual and automatic methods Demšar, Urška January 2006 Most of the largest databases currently available have a strong geospatial component and contain potentially useful information which might be of value. The discipline concerned with extracting this information and knowledge is data mining. Knowledge discovery is performed by applying automatic algorithms which recognise patterns in the data. Classical data mining algorithms assume that data are independently generated and identically distributed. Geospatial data are multidimensional, spatially autocorrelated and heterogeneous. These properties make classical data mining algorithms inappropriate for geospatial data, as their basic assumptions cease to be valid. Extracting knowledge from geospatial data therefore requires special approaches. One way to do that is to use visual data mining, where the data is presented in visual form for a human to perform the pattern recognition. When visual mining is applied to geospatial data, it is part of the discipline called exploratory geovisualisation. Both automatic and visual data mining have their respective advantages. Computers can treat large amounts of data much faster than humans, while humans are able to recognise objects and visually explore data much more effectively than computers. A combination of visual and automatic data mining draws together human cognitive skills and computer efficiency and permits faster and more efficient knowledge discovery. This thesis investigates whether a combination of visual and automatic data mining is useful for exploration of geospatial data. Three case studies illustrate three different combinations of methods. Hierarchical clustering is combined with visual data mining for exploration of geographical metadata in the first case study. 
The second case study presents an attempt to explore an environmental dataset by a combination of visual mining and a Self-Organising Map. Spatial pre-processing and visual data mining methods were used in the third case study for emergency response data. Contemporary system design methods involve user participation at all stages. These methods originated in the field of Human-Computer Interaction, but have been adapted for the geovisualisation issues related to spatial problem solving. Attention to user-centred design was present in all three case studies, but the principles were fully followed only for the third case study, where a usability assessment was performed using a combination of a formal evaluation and exploratory usability. geographic information science geoinformatics geovisualisation spatial data mining visual data mining usability evaluation Other information technology
15	Otimização de algoritmos de agrupamento espacial baseado em densidade aplicados em grandes conjuntos de dados / Optimization of Density-Based Spatial Clustering Algorithms Applied to Large Data Sets Daniel, Guilherme Priólli [UNESP] 12 August 2016 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / The amount of data managed by large-scale Web services has increased significantly, and such data have come to be called Big Data. These data sets can be defined as a large volume of complex data from multiple sources that exceeds the storage and processing capacity of current computers. In such data sets, an estimated 80% of the data is associated with some spatial position. Spatial data is more complex and requires more processing time than alphanumeric data. In that sense, MapReduce techniques and their implementations have been used to return results in a timely manner by parallelizing data mining algorithms. Therefore, this work proposes two density-based spatial clustering algorithms: VDBSCAN-MR and OVDBSCAN-MR. Both algorithms use distributed and scalable processing techniques based on the MapReduce programming model in order to optimize performance and enable the analysis of Big Data sets. Through the experiments, it was possible to verify that the developed algorithms produced better-quality clusters than the algorithms taken as a baseline. Furthermore, VDBSCAN-MR achieved better performance than the sequential algorithm and supported application to large spatial data sets. VDBSCAN-MR OVDBSCAN-MR Big Data Spatial Data Mining Spatial Clustering MapReduce
16	GIS, data mining and wild land fire data within Räddningstjänsten Sandell, Anna January 2001 Geographical information systems (GIS), data mining and wild land fire data would theoretically be suitable to use together. However, would data mining in reality bring out any useful information from wild land fire data stored within a GIS? This report investigates whether GIS and data mining are used within Räddningstjänsten today in some municipalities of the former Skaraborg. The investigation shows that neither data mining nor GIS is used within the investigated municipalities. However, the organisations are interested in using GIS in the future, together with some kind of analysis tool such as data mining. To show how GIS and data mining could be used in the future within Räddningstjänsten, some examples were constructed. Geographical Information Systems GIS Wild land fire Data mining Spatial data mining Information Systems
17	A Language and Visual Interface to Specify Complex Spatial Pattern Mining Li, Xiaohui 12 1900 The emerging interest in spatial pattern mining has led to demand for a flexible spatial pattern mining language, on which an easy-to-use and easy-to-understand visual pattern language could be built. It is worthwhile to define a pattern mining language, called LCSPM, that allows users to specify complex spatial patterns. I describe a proposed pattern mining language in this paper. A visual interface that allows users to specify the patterns visually is developed. Visual pattern queries are translated into the LCSPM language by a parser, and the data mining process can be triggered afterwards. The visual language is based on and goes beyond the visual language proposed in the literature. I implemented a prototype system based on the open source JUMP framework. Data mining. Pattern perception. spatial data mining visual interface complex spatial pattern matrix analysis prototype system JUMP
18	Application of Spatiotemporal Data Mining to Air Quality Data Biancardi, Michael Anthony 05 1900 This thesis explores the use of spatiotemporal data mining in the air quality domain to understand causes of PM2.5 air pollution. PM2.5 refers to fine particulate matter less than 2.5 microns in diameter and is a major threat to human and environmental health. A review of air quality modeling methods is provided, emphasizing data-driven modeling techniques. While data mining methods have been applied to air quality data, including temporal sequence mining algorithms, spatiotemporal sequence mining methods have not been broadly applied to study air pollution. However, air pollution is highly spatial in nature, so such methods can offer new insights into air quality. This thesis applies one such method, the Spatiotemporal Sequence Miner (STS Miner) algorithm, to air quality data from a low-cost sensor network to explore causes and trends related to PM2.5. To facilitate the use of this method, an open-source library called OpenSTSMiner is developed to implement this algorithm. Various domain results are found; for instance, low temperature and low relative humidity are strongly associated with worsening levels of air quality. Lastly, to highlight the utility of the STS Miner algorithm, a comparison is presented between STS Miner and spatial Markov chains, another spatiotemporal modeling method used in the air quality domain. Spatiotemporal data mining air pollution air quality data mining spatial data mining spatial statistics Computer Science Environmental Sciences Geography
19	Spatial scale analysis of landscape processes for digital soil mapping in Ireland Cavazzi, Stefano January 2013 Soil is one of the most precious resources on Earth because of its role in storing and recycling water and nutrients essential for life, providing a variety of ecosystem services. This vulnerable resource is at risk from degradation by erosion, salinity, contamination and other effects of mismanagement. Information on soil is therefore crucial for its sustainable management. While the demand for soil information is growing, the quantity of data collected in the field is decreasing due to financial constraints. Digital Soil Mapping (DSM) supports the creation of geographically referenced soil databases generated by using field observations or legacy data coupled, through quantitative relationships, with environmental covariates. This enables the creation of soil maps at unexplored locations at reduced costs. The selection of an optimal scale for environmental covariates is still an unsolved issue affecting the accuracy of DSM. The overall aim of this research was to explore the effect of spatial scale alterations of environmental covariates in DSM. Three main targets were identified: assessing the impact of spatial scale alterations on classifying soil taxonomic units; investigating existing approaches from related scientific fields for the detection of scale patterns; and finally enabling practitioners to find a suitable scale for environmental covariates by developing a new methodology for spatial scale analysis in DSM. Three study areas, covered by detailed reconnaissance soil survey, were identified in the Republic of Ireland. Their different pedological and geomorphological characteristics made it possible to test scale behaviours across the spectrum of conditions present in the Irish landscape. 
The investigation started by examining the effects of scale alteration of the finest resolution environmental covariate, the Digital Elevation Model (DEM), on the classification of soil taxonomic units. Empirical approaches from related scientific fields were subsequently selected from the literature, applied to the study areas and compared with the experimental methodology. Wavelet analysis was also employed to decompose the DEMs into a series of independent components at varying scales, which were then used in DSM analysis of soil taxonomic units. Finally, a new multiscale methodology was developed and evaluated against the previously presented experimental results. The results obtained by the experimental methodology have proved the significant role of scale alterations in the classification accuracy of soil taxonomic units, challenging the common practice of using the finest available resolution of DEM in DSM analysis. The set of eight empirical approaches selected from the literature was shown to have a detrimental effect on the selection of an optimal DEM scale for DSM applications. Wavelet analysis was shown to be effective in removing DEM sources of variation, increasing DSM model performance by spatially decomposing the DEM. Finally, my main contribution to knowledge has been developing a new multiscale methodology for DSM applications by combining a DEM segmentation technique, performed by k-means clustering of local variogram parameters calculated in a moving window, with an experimental methodology altering DEM scales. The newly developed multiscale methodology offers a way to significantly improve classification accuracy of soil taxonomic units in DSM. In conclusion, this research has shown that spatial scale analysis of environmental covariates significantly enhances the practice of DSM, improving overall classification accuracy of soil taxonomic units. 
The newly developed multiscale methodology can be successfully integrated in current DSM analysis of soil taxonomic units performed with data mining techniques, so advancing the practice of soil mapping. The future of DSM, as it successfully progresses from the early pioneering years into an established discipline, will have to include scale and in particular multiscale investigations in its methodology. DSM will have to move from a methodology of spatial data with scale to a spatial scale methodology. It is now time to consider scale as a key soil and modelling attribute in DSM. 631.4
20	Avaliação da eficiência do uso da mineração de dados clássica e espacial na estimativa de produtividade de grãos em imagens obtidas por meio de aeronave remotamente pilotada / Evaluation of the efficiency of classical and spatial data mining in estimating grain yield from images obtained by remotely piloted aircraft Viniski, Antônio David 16 March 2018 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Agricultural remote sensing (RS) has provided a massive set of spatial data which can be used in different segments, such as grain yield estimation. Among the technologies applied in RS, the use of remotely piloted aircraft (RPA) in agriculture is growing as an alternative way to obtain data for yield estimation. However, the data sets generated require methods and techniques capable of extracting useful and relevant information from them. Some geostatistical techniques, such as kriging, have been applied, but data mining (DM) and spatial data mining (SDM) can be viable alternatives to meet that demand. The goal of this work was to evaluate the use of DM and SDM techniques for estimating soybean and wheat grain yield using image data obtained by RPA. The study area is located in the municipality of Piraí do Sul, Paraná State. A fixed-wing RPA was used to monitor the soybean and wheat crops. For wheat imaging, two cameras were used, one capturing images in the visible spectrum (RGB) and the other in the near infrared (NIR); spatial resolutions of 10 and 20 cm/pixel were also analyzed for each camera. For soybean, only the RGB camera was used, and the spatial resolutions flown were 10, 20 and 26 cm/pixel. The data for the goal attribute, crop yield, were obtained by precision harvesters. 
The prediction attributes, corresponding to the values of the spectral bands and terrain altitude, were submitted to DM algorithms using multiple linear regression (MLR), artificial neural networks (ANN) and support vector regression (SVR). For SDM, the generalized additive model (GAM) was used. For comparison purposes, the data were also analyzed by the traditional kriging method. The techniques were tested using two main approaches: (i) using only the spectral bands for estimation and (ii) using the spectral bands and terrain altitude values. For classical DM, the best results were obtained with the SVR technique, using the Laplacian kernel. 
For SDM, the GAM method with the Gaussian fit function presented the best results. For both classical DM and SDM techniques, adding altitude to the regression models allowed a considerable increase in the correlation and determination coefficients, with a consequent decrease in error (RMSE). The correlation values obtained with SDM were similar to those obtained with the kriging method, but SDM was more efficient in evaluating the impact of the prediction attributes (spectral band values and altitude) on the estimation of the goal attribute. Thus, it is concluded that SDM is viable as a tool for generating models that estimate grain yield based on RPA image data. spatial data mining remote sensing machine learning spatial resolution drone kriging
