Global ETD Search

1	Distributed Algorithms for SVD-based Least Squares Estimation Peng, Yu-Ting 19 July 2011 Singular value decomposition (SVD) is a popular decomposition method for solving least-squares estimation problems. For large datasets, however, obtaining least-squares solutions by SVD is very time-consuming and memory-demanding. In this paper, we propose a least-squares estimator based on an iterative divide-and-merge scheme for large-scale estimation problems. The estimator consists of several levels. At each level, the input matrices are subdivided into submatrices. Each submatrix is decomposed by SVD, and the results are merged into smaller matrices, which become the input of the next level. The process is iterated until the resulting matrices are small enough to be solved directly and efficiently by the SVD algorithm. However, the iterative divide-and-merge algorithm executed on a single machine is still time-demanding on large-scale datasets. We propose two distributed algorithms to overcome this shortcoming by permitting several machines to perform the decomposition and merging of the submatrices at each level in parallel. The first is implemented in MapReduce on the Hadoop distributed platform, which can run the tasks in parallel on a collection of computers. The second is implemented in CUDA, which can run the tasks in parallel on Nvidia GPUs. Experimental results demonstrate that the proposed distributed algorithms can greatly reduce the time required to solve large-scale least-squares problems. Keywords: CUDA; matrix decomposition; large-scale dataset; least-squares solution; SVD; MapReduce; distributed
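The divide-and-merge scheme described in this abstract can be illustrated with a minimal single-machine sketch in Python with NumPy. The abstract does not say how the SVD results are merged, so the merge rule here is an assumption: each row block of the augmented matrix [A | b] is replaced by Σ·Vᵀ from its SVD, which keeps the block's Gram matrix and therefore the least-squares solution. The function names and the `block_rows` parameter are hypothetical, and the thesis may merge differently.

```python
import numpy as np


def reduce_block(block):
    """Replace a row block by S @ Vt from its thin SVD.

    (S Vt)^T (S Vt) equals block^T block, so the reduced block carries
    the same information for a least-squares fit. (Assumed merge rule.)
    """
    _, s, vt = np.linalg.svd(block, full_matrices=False)
    return s[:, None] * vt


def divide_and_merge_lstsq(A, b, block_rows=256):
    """Solve min ||A x - b|| by level-wise block reduction (illustrative)."""
    C = np.hstack([A, np.reshape(b, (-1, 1))])  # augmented matrix [A | b]
    n_cols = C.shape[1]
    # Blocks must be at least twice as tall as wide so every level shrinks.
    block_rows = max(block_rows, 2 * n_cols)
    while C.shape[0] > block_rows:
        blocks = [C[i:i + block_rows] for i in range(0, C.shape[0], block_rows)]
        # Each reduce_block call is independent, so this step is the part
        # that could be handed to separate workers.
        C = np.vstack([reduce_block(blk) for blk in blocks])
    # The remaining matrix is small; NumPy's lstsq solves it with an
    # SVD-based LAPACK routine.
    x, *_ = np.linalg.lstsq(C[:, :-1], C[:, -1], rcond=None)
    return x
```

Calling `divide_and_merge_lstsq(A, b)` on a tall matrix should agree with a direct `np.linalg.lstsq(A, b, rcond=None)` up to floating-point error. The list comprehension over blocks corresponds to the per-level work that the abstract distributes with MapReduce or CUDA.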
2	A Similarity-based Data Reduction Approach Ouyang, Jeng 07 September 2009 Finding an efficient data reduction method for large-scale problems is an imperative task. In this paper, we propose a similarity-based self-constructing fuzzy clustering algorithm to sample instances for the classification task. Instances that are similar to each other are grouped into the same cluster. Once all the instances have been fed in, a number of clusters have been formed automatically. The statistical mean of each cluster is then taken to represent all the instances covered by that cluster. This approach has two advantages. One is that it runs faster and uses less memory. The other is that the number of new representative instances need not be specified in advance by the user. Experiments on real-world datasets show that our method runs faster and obtains a better reduction rate than other methods. Keywords: fuzzy similarity; large-scale dataset; data reduction; prototype reduction; instance-filtering; instance-abstraction
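The one-pass clustering idea in this abstract can be sketched roughly as follows in Python with NumPy. The abstract does not give the similarity function or the rule for opening a new cluster, so this sketch assumes a Gaussian similarity with a fixed width `sigma` and a fixed `threshold`; both parameters and the function name are hypothetical, and class labels are ignored for brevity.

```python
import numpy as np


def self_constructing_prototypes(X, threshold=0.5, sigma=1.0):
    """Feed instances one at a time and return cluster means and sizes.

    An instance joins the most similar existing cluster if its similarity
    reaches `threshold`; otherwise it starts a new cluster. The number of
    clusters is therefore not fixed in advance. (Assumed similarity rule.)
    """
    means, counts = [], []
    for x in X:
        x = np.array(x, dtype=float)
        if means:
            M = np.asarray(means)
            # Gaussian similarity of x to every current cluster mean.
            sim = np.exp(-np.sum((x - M) ** 2, axis=1) / (2.0 * sigma ** 2))
            j = int(np.argmax(sim))
        if not means or sim[j] < threshold:
            means.append(x)      # open a new cluster centred on x
            counts.append(1)
        else:
            counts[j] += 1       # incremental update of the running mean
            means[j] += (x - means[j]) / counts[j]
    return np.asarray(means), np.asarray(counts)
```

The returned means play the role of the representative instances: a classifier would be trained on them instead of on the full dataset. Lowering `threshold` or raising `sigma` merges more instances per cluster and so increases the reduction rate.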
3	Distribution-based Summarization for Large Scale Simulation Data Visualization and Analysis Wang, Ko-Chih 11 July 2019 No description available. Keywords: Computer Science; Computer Engineering
