Global ETD Search

1	Remote High Performance Visualization of Big Data for Immersive Science
Abidi, Faiz Abbas, 15 June 2017

Remote visualization has emerged as a necessary tool in the analysis of big data. High-performance computing clusters offer several benefits for scaling to larger data sizes, from parallel file systems to larger RAM profiles to parallel computation across many CPUs and GPUs. For scalable data visualization, remote visualization tools and infrastructure are critical: only pixels and interaction events are sent over the network, not the data itself. In this paper, we present our pipeline, which uses VirtualGL, TurboVNC, and ParaView to render over 40 million points on remote HPC clusters and project over 26 million pixels in a CAVE-style system. We benchmark the system by varying the video stream compression parameters supported by TurboVNC and establish some best practices for typical usage scenarios. This work will help research scientists and academics scale their big data visualizations for real-time interaction.

Master of Science

With advances in technology, there are now better and more scientific ways to see data. Ten years ago, few people could have imagined what a 3D movie would be or how it would feel to watch one. Some may even have questioned whether it was possible. But watching 3D cinema is now commonplace, and we rarely think about what goes on behind the scenes to make this experience possible. Similarly, is it possible to see and interact with 3D data the way Tony Stark does in the movie Iron Man? The answer is yes: several tools now make it possible, and one of them is ParaView, which is mostly used for scientific visualization of data in fields such as climate research, computational fluid dynamics, and astronomy.
You can visualize this data either on a 2D screen or in a 3D environment where users feel a sense of immersion, as if they were inside the scene, looking at and interacting with the data. But where is this data actually drawn? And how long does drawing take when we are dealing with large datasets? Do we want to draw all this 3D data on a local machine, or can we use powerful remote machines to do the drawing and send the final image over a network to the client? In most cases, drawing on a remote machine is the better solution for big data, and the biggest bottleneck is how fast data can be sent to and received from the remote machines. In this work, we seek to understand the best practices for drawing big data on remote machines using ParaView and visualizing it in a 3D projection room such as a CAVE (see section 2.2 for details on what a CAVE is).
Keywords: Remote rendering; CAVE; HPC; ParaView; Big Data
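
The first entry's point that only pixels, not data, cross the network can be made concrete with back-of-envelope arithmetic. This Python sketch is not from the thesis: it assumes 24-bit RGB pixels, three float32 coordinates per point with no extra attributes, and a 30 fps target frame rate, and it uses the abstract's lower bounds of 40 million points and 26 million pixels. The function names are hypothetical.

```python
# Back-of-envelope sizes for a remote-rendering pipeline.
# Assumptions (not from the thesis): 24-bit RGB pixels, three float32
# coordinates per point with no attributes, and a 30 fps target.

BYTES_PER_PIXEL = 3        # assumed 8-bit R, G, B
BYTES_PER_POINT = 3 * 4    # assumed x, y, z as float32
TARGET_FPS = 30            # assumed interactive frame rate


def frame_bytes(num_pixels: int) -> int:
    """Size of one uncompressed frame sent to the display."""
    return num_pixels * BYTES_PER_PIXEL


def point_cloud_bytes(num_points: int) -> int:
    """Size of the raw point positions if they were shipped to the client."""
    return num_points * BYTES_PER_POINT


def stream_rate(num_pixels: int, fps: int = TARGET_FPS) -> int:
    """Uncompressed pixel-stream bandwidth in bytes per second."""
    return frame_bytes(num_pixels) * fps


if __name__ == "__main__":
    pixels = 26_000_000   # projected pixels in the CAVE (abstract's lower bound)
    points = 40_000_000   # rendered points (abstract's lower bound)
    print(f"one frame:  {frame_bytes(pixels) / 1e6:.0f} MB")
    print(f"point data: {point_cloud_bytes(points) / 1e6:.0f} MB")
    print(f"stream at {TARGET_FPS} fps: {stream_rate(pixels) / 1e9:.2f} GB/s")
```

Under these assumptions it prints 78 MB per frame, 480 MB of raw point positions, and about 2.34 GB/s for an uncompressed stream. A single frame is several times smaller than the point data, but the recurring per-frame cost is still large, which is why the compression settings of the video stream matter.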
2	Ensembles for Distributed Data
Shoemaker, Larry, 21 October 2005

Many simulation data sets are so massive that they must be distributed among disk farms attached to different computing nodes. The data is partitioned into spatially disjoint sets that are not easily transferable among nodes because of bandwidth limitations. Conventional machine learning methods are not designed for this type of data distribution. Experts mark a training data set with different levels of saliency, emphasizing speed rather than accuracy because of the size of the task. The challenge is to develop machine learning methods that learn how the expert has marked the training data so that similar test data sets can be marked more efficiently. Ensembles of machine learning classifiers are typically more accurate than individual classifiers. An ensemble of machine learning classifiers requires substantially less memory than the corresponding partition of the data set, which allows ensembles to be transferred among partitions. If all the ensembles are sent to each partition, they can vote for a level of saliency for each example in the partition. However, some partitions of the data set may not contain any salient points, especially if the data set has a time step dimension. The classifiers learned on such partitions cannot vote for saliency, since they have not been trained to recognize it. In this work, we investigate the performance of different ensembles of classifiers on spatially partitioned data sets. Success is measured by the correct recognition of unknown and salient regions of data points.
Keywords: Random forests; Nearest centroid; Exodus; ParaView; Region labeling; American Studies; Arts and Humanities
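
The second entry's voting scheme, and its blind spot for partitions without salient points, can be illustrated with a short Python sketch. It is not the thesis's method: to keep it small, each partition contributes a single hand-rolled nearest-centroid model rather than a full ensemble (nearest centroid appears among the keywords; random forests or other ensembles would take its place in practice). The `NearestCentroid`, `train_per_partition`, and `majority_vote` names and the toy data are hypothetical.

```python
from collections import Counter
from math import dist


class NearestCentroid:
    """Tiny nearest-centroid classifier (hypothetical helper, not scikit-learn's)."""

    def fit(self, points, labels):
        counts = Counter(labels)
        sums = {}
        for p, y in zip(points, labels):
            acc = sums.setdefault(y, [0.0] * len(p))
            for i, v in enumerate(p):
                acc[i] += v
        # One centroid per saliency level present in this partition.
        self.centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}
        return self

    def predict(self, p):
        # Only levels seen during training can ever be predicted.
        return min(self.centroids, key=lambda y: dist(p, self.centroids[y]))


def train_per_partition(partitions):
    """Train one model per partition; only these small models travel between nodes."""
    return [NearestCentroid().fit(*zip(*part)) for part in partitions]


def majority_vote(models, p):
    """Every model casts one vote; ties go to the level voted first."""
    return Counter(m.predict(p) for m in models).most_common(1)[0][0]


# Hypothetical toy data: (point, saliency level), 0 = background, 1 = salient.
PARTITIONS = [
    [((0, 0), 0), ((1, 0), 0), ((5, 5), 1), ((6, 5), 1)],
    [((0, 1), 0), ((1, 1), 0), ((5, 6), 1), ((6, 6), 1)],
    [((0, 0), 0), ((2, 2), 0)],  # no salient points, e.g. a quiet time step
]

if __name__ == "__main__":
    models = train_per_partition(PARTITIONS)
    query = (5.5, 5.5)
    print([m.predict(query) for m in models])  # [1, 1, 0]
    print(majority_vote(models, query))        # 1
```

The third model has only ever seen background, so it votes 0 for every query; here the other two outvote it, but if most partitions lacked salient points the vote would miss salient regions, which is the difficulty the abstract describes.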
