1

The GDense Algorithm for Clustering Data Streams with High Quality

Lin, Shu-Yi 25 June 2009 (has links)
In recent years, mining data streams has been widely studied. A data stream is a sequence of dynamic, continuous, unbounded, real-time data items arriving at a very high rate that can only be read once. In data mining, clustering is one of the useful techniques for discovering interesting structure in the underlying data objects. The clustering problem can be defined formally as follows: given n data points in the d-dimensional metric space, partition the data points into k clusters such that the data points within a cluster are more similar to each other than to data points in different clusters. In the data streams environment, the difficulties of data stream clustering include storage overhead, low clustering quality, and low updating efficiency. Current clustering algorithms can be broadly classified into four categories: partition, hierarchical, density-based, and grid-based approaches. The advantage of the grid-based approach is that it can handle large databases. In the density-based approach, the insertion or deletion of a data point affects the current clustering only in the neighborhood of that point. Combining the advantages of the grid-based and density-based approaches, the CDS-Tree algorithm was proposed. Although it can handle large databases, its clustering quality is restricted by the grid partition and the dense-cell threshold. Therefore, in this thesis, we present a new high-quality clustering algorithm for data streams, GDense. The GDense algorithm achieves high quality through two kinds of partition, cells and quadcells, and two kinds of threshold, a dense-cell threshold and one quarter of it for quadcells. Moreover, in the data insertion part of our GDense algorithm, the 7 cases take 3 factors concerning the cell and the quadcell into consideration; in the deletion part, the 10 cases take 5 factors concerning the cell into consideration. Our simulation results show that regardless of the conditions (including the number of data points, the number of cells, the size of the sliding window, and the dense-cell threshold), the clustering purity of our GDense algorithm is always higher than that of the CDS-Tree algorithm. Moreover, we compare the purity of our GDense algorithm and the CDS-Tree algorithm in the presence of outliers. No matter whether the number of outliers is large or small, the clustering purity of our GDense algorithm remains higher than that of CDS-Tree, improving clustering purity by about 20% compared to the CDS-Tree algorithm.
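To make the grid-plus-density combination concrete, here is a minimal sketch of the dense-cell idea that both CDS-Tree and GDense build on: hash points into cells, call a cell dense once it reaches a count threshold, and merge adjacent dense cells. It deliberately omits GDense's quadcells, sliding window, and 7/10-case maintenance logic; all names are illustrative.

```python
from collections import defaultdict, deque

def dense_cell_clusters(points, cell_size, density_threshold):
    """Cluster 2-D points by merging adjacent dense grid cells (flood fill)."""
    # Hash every point into the grid cell that contains it.
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))

    # A cell is dense once it holds at least `density_threshold` points.
    dense = {c for c, pts in cells.items() if len(pts) >= density_threshold}

    # Merge side-adjacent dense cells into clusters with a BFS flood fill.
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:
            cx, cy = queue.popleft()
            component.append((cx, cy))
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append([p for c in component for p in cells[c]])
    return clusters
```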
2

PERFORMANCE STUDY OF SOW-AND-GROW: A NEW CLUSTERING ALGORITHM FOR BIG DATA

Maier, Joshua 01 May 2020 (has links)
DBSCAN is a density-based clustering algorithm known for its ability to find irregularly shaped clusters and to handle noise points. For very large datasets, however, the algorithm becomes inefficient because it must visit every point and examine its neighborhood in order to determine the clusters. DBSCAN is also hard to parallelize due to the structure of the data and its sequential data access. The Sow-and-Grow algorithm is a parallel, density-based clustering algorithm. It uses a concept of growing points to find clusters more efficiently than visiting every point in the dataset in sequential order. We create an initial seed set of variable size based on user input and a dynamic growing-points vector to cluster the data. Our algorithm is designed for shared memory and can be run in parallel using threads. For our experiments, multiple datasets with varying numbers of points and dimensions were used. We used these datasets to show the significant speedup the Sow-and-Grow algorithm achieves compared to other parallel, density-based clustering algorithms; on some datasets, Sow-and-Grow runs eight times faster than another density-based algorithm. We also examined how changing the number of seeds affects the results in terms of runtime and clusters discovered.
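The growing-points idea lends itself to a compact sequential sketch (the thesis algorithm is parallel, shared-memory, and threaded; the brute-force neighborhood query and the parameter names `n_seeds`, `eps`, and `min_pts` are illustrative assumptions):

```python
import numpy as np

def sow_and_grow_sketch(data, n_seeds, eps, min_pts, seed=None):
    """Density clustering grown from a sampled seed set (DBSCAN-like sketch)."""
    rng = np.random.default_rng(seed)
    labels = np.full(len(data), -1)              # -1 = noise / unvisited
    cluster_id = 0
    for s in rng.choice(len(data), size=n_seeds, replace=False):
        if labels[s] != -1:                      # seed already absorbed
            continue
        growing = [s]                            # dynamic growing-points vector
        grew = False
        while growing:
            p = growing.pop()
            dists = np.linalg.norm(data - data[p], axis=1)
            neighbors = np.flatnonzero(dists <= eps)
            if len(neighbors) < min_pts:         # p cannot grow the cluster
                continue
            for n in neighbors:                  # absorb the eps-neighborhood
                if labels[n] == -1:
                    labels[n] = cluster_id
                    growing.append(n)
            grew = True
        if grew:
            cluster_id += 1
    return labels
```

Unlike plain DBSCAN, expansion starts only from the sampled seeds, so regions never touched by a seed's growth are simply left as noise rather than scanned exhaustively.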
3

Design and implementation of scalable hierarchical density based clustering

Dhandapani, Sankari 09 November 2010 (has links)
Clustering is a useful technique that divides data points into groups, also known as clusters, such that the data points of the same cluster exhibit similar properties. Typical clustering algorithms assign each data point to at least one cluster. However, in practical datasets like microarray gene datasets, only a subset of the genes are highly correlated and the dataset is often polluted with a huge volume of irrelevant genes. In such cases, it is important to ignore the poorly correlated genes and cluster only the highly correlated ones. Automated Hierarchical Density Shaving (Auto-HDS) is a non-parametric, density-based technique that partitions only the relevant subset of the dataset into multiple clusters while pruning the rest. Auto-HDS performs a hierarchical clustering that identifies dense clusters of different densities and finds a compact hierarchy of the clusters identified. Key features of Auto-HDS include selection and ranking of clusters using a custom stability criterion and a topologically meaningful 2D projection and visualization of the clusters discovered in the higher-dimensional original space. A key limitation of Auto-HDS, however, is that it requires O(n²) storage and O(n² log n) computation, so it scales only to a few tens of thousands of points. In this thesis, two extensions to Auto-HDS are presented for lower-dimensional datasets that generate clusterings identical to Auto-HDS but scale to much larger datasets. We first introduce Partitioned Auto-HDS, which provides a significant reduction in time and space complexity and makes it possible to generate the Auto-HDS cluster hierarchy on much larger datasets with hundreds of millions of data points. We then describe Parallel Auto-HDS, which takes advantage of the inherent parallelism in Partitioned Auto-HDS to scale to even larger datasets without a corresponding increase in actual run time when a group of processors is available for parallel execution. Partitioned Auto-HDS is implemented on top of GeneDIVER, a previously existing Java-based streaming implementation of Auto-HDS, and thus retains all the key features of Auto-HDS, including ranking, automatic selection of clusters, and 2D visualization of the discovered cluster topology.
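The shaving step that Auto-HDS builds on can be sketched as follows; using the k-NN radius as the density surrogate and a fixed keep fraction is an illustrative simplification, not the thesis implementation:

```python
import numpy as np
from scipy.spatial.distance import cdist

def density_shave(data, k, keep_fraction):
    """Keep only the densest points; prune the rest as irrelevant."""
    dists = cdist(data, data)
    # Density surrogate: radius of the k-nearest-neighbor ball
    # (a smaller radius means a denser neighborhood).
    knn_radius = np.sort(dists, axis=1)[:, k]
    n_keep = int(len(data) * keep_fraction)
    kept = np.argsort(knn_radius)[:n_keep]   # indices of the surviving points
    return kept, knn_radius
```

Shaving at a sequence of increasing density levels, and tracking how the surviving points split into connected groups, is what produces the cluster hierarchy; note that the O(n²) distance matrix in this sketch is precisely the bottleneck that Partitioned Auto-HDS is designed to avoid.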
4

Mobile Location Estimation Using Genetic Algorithm and Clustering Technique for NLOS Environments

Hung, Chung-Ching 10 September 2007 (has links)
To meet the mass demand for personalized security services, such as tracking, supervision, and emergency rescue, location technologies for mobile communication have drawn much attention from governments, academia, and industry around the world. However, existing location methods cannot satisfy the requirements of low cost and high accuracy. We hypothesized that a new mobile location algorithm based on the current GSM system would effectively improve user satisfaction. In this study, a prototype system is developed, implemented, and evaluated by integrating useful information such as the geometry of the cell layout and related mobile positioning technologies. The intersection of the regions formed by the communication ranges of the base stations is explored. Furthermore, a density-based clustering algorithm (DCA) and a GA-based algorithm are designed to analyze the intersection region and estimate the most probable location of a mobile phone. Simulation results show that the location error of the GA-based method is less than 0.075 km 67% of the time and less than 0.15 km 95% of the time. These results satisfy the location accuracy requirement of E-911.
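As an illustration of the GA-based step, the sketch below searches for a point inside the intersection of the base stations' range discs, treating each measured range as an upper bound (NLOS propagation can only lengthen a path). The fitness function, operators, and parameters are illustrative assumptions, not the thesis design:

```python
import numpy as np

def ga_locate(stations, ranges, pop=200, gens=100, seed=0):
    """GA sketch: find a point inside the intersection of range discs.

    stations: (m, 2) base-station coordinates; ranges: (m,) measured
    distances, treated as upper bounds under NLOS propagation.
    """
    rng = np.random.default_rng(seed)
    lo = stations.min(axis=0) - ranges.max()
    hi = stations.max(axis=0) + ranges.max()
    popn = rng.uniform(lo, hi, size=(pop, 2))

    def fitness(cand):
        # Penalty: total amount by which a candidate violates any range disc.
        d = np.linalg.norm(cand[:, None, :] - stations[None, :, :], axis=2)
        return -np.clip(d - ranges, 0.0, None).sum(axis=1)

    for _ in range(gens):
        f = fitness(popn)
        parents = popn[np.argsort(f)[-pop // 2:]]          # keep the best half
        mates = parents[rng.permutation(len(parents))]
        children = (parents + mates) / 2                   # arithmetic crossover
        children += rng.normal(0.0, 0.01 * (hi - lo).mean(), children.shape)
        popn = np.vstack([parents, children])              # elitist replacement
    return popn[np.argmax(fitness(popn))]
```

A density-based pass over the final population, playing the DCA role described above, would then report the centroid of the densest cluster of surviving candidates as the location estimate.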
5

Handwritten digit and script recognition using density based random vector functional link network

Park, Gwang Hoon January 1995 (has links)
No description available.
6

Discovering Intrinsic Points of Interest from Spatial Trajectory Data Sources

Piekenbrock, Matthew J. 13 June 2018 (has links)
No description available.
7

Analysis of the influence of gas flow in the sample placement region of the chamber of the environmental scanning electron microscope

Bednář, Eduard January 2016 (has links)
This thesis deals with the simulation of fluid dynamics in an environmental scanning electron microscope; it evaluates the solver setup, the degree of discretization, and the choice of turbulence model, and proposes an optimal design for the microscope. The theoretical part describes environmental scanning electron microscopy, the SolidWorks and ANSYS Fluent software, the basic equations describing the state of the fluid, fluid turbulence, the mean free path of molecules, and electron scattering. The practical part of the thesis creates a model of the AQUASEM II environmental scanning electron microscope in the SolidWorks CAD system and simulates the fluid flow in the sample chamber in front of the PLA1 aperture with ANSYS Fluent. A series of simulations established the proper solver settings. This knowledge is used in the second stage of the practical part, where the optimal shapes of the sample stage and of the input aperture PLA1 are proposed.
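For reference, the mean free path mentioned among the governing relations is presumably the standard kinetic-theory expression (an assumption; the thesis itself is not quoted here):

```latex
\lambda = \frac{k_B T}{\sqrt{2}\,\pi d_m^2\, p}
```

where $T$ is the gas temperature, $p$ the pressure, $d_m$ the effective molecular diameter, and $k_B$ Boltzmann's constant. Its inverse dependence on pressure is what makes the flow field in front of the pressure-limiting aperture PLA1 so important for electron scattering.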
8

Understanding methods for internal and external preference mapping and clustering in sensory analysis

Yenket, Renoo January 1900 (has links)
Doctor of Philosophy / Department of Human Nutrition / Edgar Chambers IV / Preference mapping provides product development directions by letting developers see a whole picture of products, liking, and relevant descriptors in a target market. Many statistical methods and commercial statistical software programs offering preference mapping analyses are available to researchers. Because of the numerous options available, this research addresses two questions that most scientists must answer before choosing a method of analysis: 1) do the different methods provide the same interpretation, co-ordinate values, and object orientation; and 2) which method and program should be used with the data provided? This research used data from paint, milk, and fragrance studies, representing lesser to higher complexity. The techniques used are principal component analysis, multidimensional preference map (MDPREF), modified preference map (PREFMAP), canonical variate analysis, generalized Procrustes analysis, and partial least squares regression, utilizing the statistical software programs SAS, Unscrambler, Senstools, and XLSTAT. Moreover, the homogeneity of the consumer data was investigated through hierarchical cluster analysis (McQuitty's similarity analysis, median, single linkage, complete linkage, average linkage, and Ward's method), a partitional algorithm (the k-means method), and a nonparametric method, versus four manual clustering groups (strict, strict-liking-only, loose, and loose-liking-only segments). The manual clusters were extracted according to the products most frequently rated highest for best liked and least liked on hedonic ratings. Furthermore, the impact of plotting preference maps for individual clusters was explored with and without the use of an overall mean liking vector. Results illustrated that the various statistical software programs did not agree in their orientations and co-ordinate values, even when using the same preference mapping method. Also, if the data were not highly homogeneous, interpretation could differ. Most computer cluster analyses did not segment consumers in a way relevant to their preferences and did not yield clusters as homogeneous as manual clustering. Interpretation of the preference maps created from the most homogeneous clusters improved little when applied to complicated data. Researchers should look at key findings from univariate data in descriptive sensory studies to obtain accurate interpretations and suggestions from the maps, especially for external preference mapping. When researchers make recommendations based on an external map alone for complicated data, preference maps may be overused.
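The cluster-method comparison at the heart of the segmentation study can be outlined with standard tools. In the sketch below, SciPy's linkage options stand in for the methods listed (SciPy's "weighted" method corresponds to McQuitty's similarity analysis), and the hedonic data and number of segments are invented for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2

# Hypothetical hedonic data: 120 consumers rating 8 products on a 9-point scale.
rng = np.random.default_rng(0)
ratings = rng.integers(1, 10, size=(120, 8)).astype(float)

k = 4
segments = {}
# "weighted" is SciPy's name for McQuitty's similarity analysis.
for method in ("single", "complete", "average", "weighted", "median", "ward"):
    Z = linkage(ratings, method=method)              # agglomerative hierarchy
    segments[method] = fcluster(Z, t=k, criterion="maxclust")
_, km_labels = kmeans2(ratings, k, minit="++")       # partitional baseline
segments["kmeans"] = km_labels + 1                   # align labels to 1..k

# Segmentations from different algorithms rarely coincide; cross-tabulating
# two of them shows how (in)consistent the consumer groupings are.
cross = np.zeros((k, k), dtype=int)
for a, b in zip(segments["ward"], segments["kmeans"]):
    cross[a - 1, b - 1] += 1
print(cross)
```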
9

Density Based Data Clustering

Albarakati, Rayan 01 March 2015 (has links)
Data clustering is a data analysis technique that groups data based on a measure of similarity. When data is well clustered, the similarities between objects in the same group are high, while the similarities between objects in different groups are low. Data clustering is widely applied in areas such as bioinformatics, image segmentation, and market research. This project conducted an in-depth study of data clustering with a focus on density-based clustering methods. The recent density-based CFSFDP algorithm (clustering by fast search and find of density peaks) is based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points of higher density. This method has been examined, experimented with, and improved. Three density-estimation methods (KNN-based, Gaussian-kernel-based, and iterative Gaussian-kernel-based) are applied in this project to improve CFSFDP density-based clustering. The methods are applied to four benchmark datasets and the results are analyzed and compared.
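The CFSFDP decision rule summarized above reduces to two per-point quantities: a local density rho and the distance delta to the nearest point of higher density. A minimal sketch with a cutoff kernel follows; the rho-times-delta center heuristic and the parameter names are common choices, not necessarily this project's exact variants:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def density_peaks(data, d_c, n_centers):
    """Minimal CFSFDP sketch: centers are dense and far from denser points."""
    d = squareform(pdist(data))
    rho = (d < d_c).sum(axis=1) - 1            # cutoff-kernel local density
    order = np.argsort(-rho)                   # points by decreasing density
    # delta: distance to the nearest point of strictly higher (or tied,
    # earlier-ranked) density; the densest point gets the maximum distance.
    delta = np.empty(len(data))
    delta[order[0]] = d[order[0]].max()
    for i in range(1, len(order)):
        p = order[i]
        delta[p] = d[p, order[:i]].min()
    # Centers: the points maximizing rho * delta (one common heuristic).
    centers = np.argsort(-(rho * delta))[:n_centers]
    labels = np.full(len(data), -1)
    labels[centers] = np.arange(n_centers)
    # Assign each remaining point, in density order, to the cluster of its
    # nearest denser neighbor (assumes the densest point is a center).
    for i in range(1, len(order)):
        p = order[i]
        if labels[p] == -1:
            denser = order[:i]
            labels[p] = labels[denser[np.argmin(d[p, denser])]]
    return labels, rho, delta
```

The KNN-based and Gaussian-kernel density estimates studied in the project would replace the cutoff-kernel rho above while leaving the delta computation and assignment step unchanged.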
10

Visual Data Mining Techniques for Functional Actigraphy Data: An Object-Oriented Approach in R

Sharif, Abbass 01 December 2012 (has links)
Actigraphy, a technology for measuring a subject's overall activity level almost continuously over time, has gained a lot of momentum over the last few years. An actigraph, a watch-like device that can be attached to the wrist or ankle of a subject, uses an accelerometer to measure human movement every minute or even every 15 seconds. Actigraphy data is often treated as functional data. In this dissertation, we discuss what has been done regarding the visualization of actigraphy data and then explain the three main goals we achieved: (i) develop new multivariate visualization techniques for actigraphy data; (ii) integrate the new and current visualization tools into an R package using an object-oriented model design; and (iii) develop an adaptive, user-friendly web interface for the actigraphy software.
