61

Big Maritime Data: The promises and perils of the Automatic Identification System: Shipowners and operators’ perceptions

Kouvaras, Andreas January 2022
The term Big Data has been gaining importance at both the academic and the business level. Information technology plays a critical role in shipping, since there is high demand for fast information transfer and communication between the parties to a shipping contract. The Automatic Identification System (AIS) was developed to improve maritime safety by tracking vessels and exchanging inter-ship information. The purpose of this master’s thesis was to a) investigate which business decisions the Automatic Identification System helps shipowners and operators (i.e., users) make, b) identify the benefits and perils arising from its use, and c) investigate possible improvements based on the users’ perceptions. This master’s thesis is a qualitative study using the interpretivism paradigm. Data were collected through semi-structured interviews. A total of 6 people participated, selected according to the following criteria: a) holding a position in a technical department, as a DPA, or as a shipowner; b) participating in business decisions; c) working for a shipping company that owns a fleet; and d) dealing with AIS data. The thematic analysis led to twenty-six codes, twelve categories, and five concepts. Empirical findings showed that AIS data contributes mostly to making strategic business decisions. Participants are interested in using AIS data to measure the efficiency of their fleet and ports, to estimate fuel consumption, to reduce their costs, to protect the environment and people’s health, to analyze the trade market, to predict the time of arrival and the optimal route and speed, to maintain the highest security levels, and to reduce the inaccuracies caused by manual input of some AIS attributes. Participants also mentioned some AIS challenges, including technological improvements (e.g., transponders, antennas) as well as the operation of autonomous vessels. Finally, this master’s thesis contributes to prescriptive and descriptive theory, helping stakeholders reach new decisions and researchers and developers advance their products.
62

Place des mégadonnées et des technologies de l'Intelligence Artificielle dans les activités de communication des petites et moyennes entreprises au Canada [The place of big data and Artificial Intelligence technologies in the communication activities of small and medium-sized enterprises in Canada]

El Didi, Dina 23 November 2022
The development of big data and Artificial Intelligence technologies has given rise to a digital economy controlled by the web giants (GAFAM). This economy reflects a certain inequality in access to and management of big data and AI technologies. The present study aims to explore the inequality between large organizations and small and medium-sized enterprises (SMEs) with respect to access to and use of big data and AI technologies. To do so, it addresses the following question: "How do communication teams in Canadian SMEs view the use and importance of big data and AI technologies for their work?" The theoretical framework mobilized in this research is, on the one hand, the sociology of uses, which helps to understand and analyze how SME communication teams use big data and AI technologies, and, on the other hand, the narrative approach, which makes it possible to describe the contexts of practice in which these uses occur. We used a mixed-methods design. The quantitative method, via an online questionnaire, identified the place these technologies currently occupy in the regular work of SME communication professionals, as well as the challenges they face in deploying and using them. The qualitative method, via semi-structured interviews, served to better understand the contexts of practice in which these technologies are used or could be used. The results suggested that a gap exists between SMEs and large organizations with respect to the exploitation and use of these technologies. This gap is due above all to certain challenges, such as a lack of knowledge and expertise and a lack of interest in these technologies. This inequality could be mitigated by putting in place a training plan for managers in order to bring about changes in organizational culture. The results also highlighted the importance of human intervention, without which the insights generated by big data and AI technologies risk being biased. Thus, within the limits of this exploratory study, it has advanced knowledge by identifying several avenues for future research concerning big data and AI technologies and their importance for communication activities in SMEs.
63

A Smart and Interactive Edge-Cloud Big Data System

Stauffer, Jake 08 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Data and information have increased exponentially in recent years. The promising era of big data is advancing many new practices. One of the emerging big data applications is healthcare. Large quantities of data with varying complexities have been leading to a great need for smart and secure big data systems. The mobile edge, more specifically the smartphone, is a natural source of big data and is ubiquitous in our daily lives. Smartphones offer a variety of sensors, which make them a very valuable source of data that can be used for analysis. Since this data comes directly from personal phones, it is sensitive and must be handled in a smart and secure way. In addition to generating data, it is also important to interact with the big data. Therefore, it is critical to create edge systems that enable users to access their data and ensure that these applications are smart and secure. As the first major contribution of this thesis, we have implemented a mobile edge system, called s2Edge. This edge system leverages Amazon Web Services (AWS) security features and is backed by an AWS cloud system. The implemented mobile application securely signs up, logs in, and signs out users, as well as connects users to the vast amounts of data they generate. With high interactive capability, the system allows users (like patients) to retrieve and view their data and records, as well as communicate with cloud users (like physicians). The resulting mobile edge system is promising and is expected to demonstrate the potential of smart and secure big data interaction. The smart and secure transmission and management of big data in the cloud is essential for healthcare big data, including both patient information and patient measurements. The second major contribution of this thesis is to demonstrate a novel big data cloud system, s2Cloud, which can help enhance healthcare systems to better monitor patients and give doctors critical insights into their patients' health. s2Cloud achieves big data security through secure sign-up and log-in for the doctors, as well as data transmission protection. The system allows the doctors to manage both patients and their records effectively. The doctors can add and edit the patient and record information through the interactive website. Furthermore, the system supports both real-time and historical modes for big data management. Therefore, patient measurement information can not only be visualized in real time but also be retrieved for further analysis. The smart website also allows doctors and patients to interact with each other effectively through instantaneous chat. Overall, the proposed s2Cloud system, empowered by smart secure design innovations, has demonstrated the feasibility and potential for healthcare big data applications. This study will further broadly benefit and advance other smart home and world big data applications. / 2023-06-01
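The abstract above mentions AWS-backed sign-up and log-in but gives no implementation details. Purely as a hedged illustration, the Python sketch below shows how an edge client could delegate those flows to an AWS Cognito user pool via boto3; the client ID, region, and attribute names are placeholder assumptions, not the actual s2Edge design.

```python
import boto3

# Hypothetical identifiers -- the real s2Edge configuration is not described in the abstract.
USER_POOL_CLIENT_ID = "example-app-client-id"

cognito = boto3.client("cognito-idp", region_name="us-east-1")


def sign_up(username: str, password: str, email: str) -> None:
    """Register a new edge user against the managed user pool."""
    cognito.sign_up(
        ClientId=USER_POOL_CLIENT_ID,
        Username=username,
        Password=password,
        UserAttributes=[{"Name": "email", "Value": email}],
    )


def sign_in(username: str, password: str) -> str:
    """Authenticate and return an access token for later data-access calls."""
    resp = cognito.initiate_auth(
        ClientId=USER_POOL_CLIENT_ID,
        AuthFlow="USER_PASSWORD_AUTH",
        AuthParameters={"USERNAME": username, "PASSWORD": password},
    )
    return resp["AuthenticationResult"]["AccessToken"]
```

Delegating authentication to a managed identity service is one common way an edge application can offer secure sign-up/sign-in without storing credentials itself.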
64

Big data, data mining, and machine learning: value creation for business leaders and practitioners

Dean, J. January 2014
Big data is big business. But having the data and the computational power to process it isn't nearly enough to produce meaningful results. Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners is a complete resource for technology and marketing executives looking to cut through the hype and produce real results that hit the bottom line. Providing an engaging, thorough overview of the current state of big data analytics and the growing trend toward high performance computing architectures, the book is a detail-driven look into how big data analytics can be leveraged to foster positive change and drive efficiency. With continued exponential growth in data and ever more competitive markets, businesses must adapt quickly to gain every competitive advantage available. Big data analytics can serve as the linchpin for initiatives that drive business, but only if the underlying technology and analysis are fully understood and appreciated by engaged stakeholders.
65

Large Web Archive Collection Infrastructure and Services

Wang, Xinyue 20 January 2023
The web has evolved to be the primary carrier of human knowledge during the information age. The ephemeral nature of much web content makes web knowledge preservation vital in preserving human knowledge and memories. Web archives are created to preserve the current web and make it available for future reuse. A growing number of web archive initiatives are actively engaging in web archiving activities. Web archiving standards like WARC, for formatted storage, have been established to standardize the preservation of web archive data. In addition to its preservation purpose, web archive data is also used as a source for research and for lost information recovery. However, the reuse of web archive data is inherently challenging because of the scale of the data and the big data tools required to serve and analyze it efficiently. In this research, we propose to build web archive infrastructure that can support efficient and scalable web archive reuse with big data formats like Parquet, enabling more efficient quantitative data analysis and browsing services. On top of the Hadoop big data processing platform, with components like Apache Spark and HBase, we propose to replace the WARC (web archive) data format with the columnar Parquet format to facilitate more efficient reuse. Such a columnar data format can provide the same features as WARC for long-term preservation. In addition, the columnar data format introduces the potential for better computational efficiency and data reuse flexibility. The experiments show that this proposed design can significantly improve quantitative data analysis tasks for common web archive data usage. This design can also serve web archive data for a web browsing service. Unlike the conventional web hosting design for large data, this design primarily works on top of the raw large data in file systems to provide a hybrid environment around web archive reuse. In addition to the standard web archive data, we also integrate Twitter data into our design as part of web archive resources. Twitter is a prominent source of data for researchers in a variety of fields and an integral element of the web's history. However, Twitter data is typically collected through non-standardized tools for different collections. We aggregate the Twitter data from different sources and integrate it into the suggested design for reuse. We are able to greatly increase the processing performance of workloads around social media data by overcoming the data loading bottleneck with a web-archive-like Parquet data format. / Doctor of Philosophy / The web has evolved to be the primary carrier of human knowledge during the information age. The ephemeral nature of much web content makes web knowledge preservation vital in preserving human knowledge and memories. Web archives are created to preserve the current web and make it available for future reuse. In addition to its preservation purpose, web archive data is also used as a source for research and for lost information discovery. However, the reuse of web archive data is inherently challenging because of the scale of the data and the big data tools required to serve and analyze it efficiently. In this research, we propose to build a web archive big data processing infrastructure that can support efficient and scalable web archive reuse, such as quantitative data analysis and browsing services.
We adopt industry frameworks and tools to establish a platform that can provide high-performance computation for web archive initiatives and users. We propose to convert the standard web archive data file format to a columnar data format for efficient future reuse. Our experiments show that our proposed design can significantly improve quantitative data analysis tasks for common web archive data usage. Our design can also serve an efficient web browsing service without adopting a sophisticated web hosting architecture. In addition to the standard web archive data, we also integrate Twitter data into our design as a unique web archive resource. Twitter is a prominent source of data for researchers in a variety of fields and an integral element of the web's history. We aggregate the Twitter data from different sources and integrate it into the suggested design for reuse. We are able to greatly increase the processing performance of workloads around social media data by overcoming the data loading bottleneck with a web-archive-like Parquet data format.
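As a hedged illustration of the columnar-storage idea described above (not code from the dissertation), the PySpark sketch below writes parsed web-archive records to Parquet and runs a simple analytical query; the record schema and paths are assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("warc-to-parquet").getOrCreate()

# Assume records have already been parsed out of WARC files into
# (url, timestamp, mime_type, status, payload) tuples; the schema is illustrative.
records = spark.createDataFrame(
    [("https://example.org/", "20230101000000", "text/html", 200, "<html>...</html>")],
    ["url", "timestamp", "mime_type", "status", "payload"],
)

# Columnar storage lets later queries read only the columns they need.
records.write.mode("overwrite").parquet("/data/webarchive.parquet")

# A typical quantitative reuse task: count captures per MIME type.
archive = spark.read.parquet("/data/webarchive.parquet")
archive.groupBy("mime_type").count().show()
```

Reading back only the columns a query touches is what gives a columnar layout its analytical advantage over the record-oriented WARC format.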
66

A Framework for Hadoop Based Digital Libraries of Tweets

Bock, Matthew 17 July 2017
The Digital Library Research Laboratory (DLRL) has collected over 1.5 billion tweets for the Integrated Digital Event Archiving and Library (IDEAL) and Global Event Trend Archive Research (GETAR) projects. Researchers across varying disciplines have an interest in leveraging DLRL's collections of tweets for their own analyses. However, due to the steep learning curve involved with the required tools (Spark, Scala, HBase, etc.), simply converting the Twitter data into a workable format can be a cumbersome task in itself. This prompted the effort to build a framework that will help in developing code to analyze the Twitter data, run on arbitrary tweet collections, and enable developers to leverage projects designed with this general use in mind. The intent of this thesis work is to create an extensible framework of tools and data structures to represent Twitter data at a higher level and eliminate the need to work with raw text, so as to make the development of new analytics tools faster, easier, and more efficient. To represent this data, several data structures were designed to operate on top of the Hadoop and Spark libraries of tools. The first set of data structures is an abstract representation of a tweet at a basic level, as well as several concrete implementations which represent varying levels of detail to correspond with common sources of tweet data. The second major data structure is a collection structure designed to represent collections of tweet data structures and provide ways to filter, clean, and process the collections. All of these data structures went through an iterative design process based on the needs of the developers. The effectiveness of this effort was demonstrated in four distinct case studies. In the first case study, the framework was used to build a new tool that selects Twitter data from DLRL's archive of tweets, cleans those tweets, and performs sentiment analysis within the topics of a collection's topic model. The second case study applies the provided tools for the purpose of sociolinguistic studies. The third case study explores large datasets to accumulate all possible analyses on the datasets. The fourth case study builds metadata by expanding the shortened URLs contained in the tweets and storing them as metadata about the collections. The framework proved to be useful and cut development time for all four of the case studies. / Master of Science
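The framework's actual classes are not listed in the abstract. The sketch below is a hypothetical, simplified illustration of the idea of an abstract tweet representation plus a collection wrapper offering filtering and cleaning; all names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Tweet:
    """Minimal tweet representation; richer subclasses could add entities, geo, etc."""
    tweet_id: str
    user: str
    text: str


class TweetCollection:
    """Wraps an iterable of tweets and exposes chainable cleaning/filtering steps."""

    def __init__(self, tweets: Iterable[Tweet]):
        self._tweets = list(tweets)

    def filter(self, predicate: Callable[[Tweet], bool]) -> "TweetCollection":
        return TweetCollection(t for t in self._tweets if predicate(t))

    def clean_text(self) -> "TweetCollection":
        # Example cleaning step: strip URLs before downstream analysis.
        cleaned = [
            Tweet(t.tweet_id, t.user,
                  " ".join(w for w in t.text.split() if not w.startswith("http")))
            for t in self._tweets
        ]
        return TweetCollection(cleaned)

    def __iter__(self) -> Iterator[Tweet]:
        return iter(self._tweets)
```

An analytics tool built on such abstractions can operate on any tweet collection without touching the raw source format, which is the reuse goal the abstract describes.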
67

Use of the Traffic Speed Deflectometer for Concrete and Composite Pavement Structural Health Assessment: A Big-Data-Based Approach Towards Concrete and Composite Pavement Management and Rehabilitation

Scavone Lasalle, Martin 23 August 2022
The latest trends in highway pavement management aim at implementing a rational, data-driven procedure to allocate resources for pavement maintenance and rehabilitation. To this end, decision-making is based on network-wide surface condition and structural capacity data – preferably collected in a non-destructive manner, such as with a deflection testing device. This more holistic approach was proven to be more cost-effective than the current state of the art, in which the pavement manager grounds maintenance and rehabilitation decisions on surface distress measurements. However, pavement practitioners still rely mostly on surface distress because traditional deflection measuring devices are not practical for network-level data collection. Traffic-speed deflection devices, among them the Traffic Speed Deflectometer [TSD], allow measuring pavement surface deflections at travel speeds as high as 95 km/h [60 miles per hour] and reporting these measurements with a spatial resolution as dense as 5 cm [2 inches] between consecutive measurements. Since their inception in the early 2000s, and mostly over the past 15 years, numerous research efforts and trial tests focused on the interpretation of the deflection data collected by the TSD, its validity as a field testing device, and its comparability against the staple pavement deflection testing device – the Falling Weight Deflectometer [FWD]. The research efforts have concluded that although different in nature from the FWD, the TSD does furnish valid deflection measurements, from which the pavement structural health can be assessed. Most published TSD-related literature focused on TSD surveys of flexible pavement networks and the estimation of structural health indicators for hot-mix asphalt pavement structures from the resulting data – a sensible approach given that the majority of the US paved road network is asphalt. Meanwhile, concrete and composite pavements (a minority of the US pavement network that nonetheless accounts for nearly half of the US Interstate System) have been mostly neglected in TSD-related research, even though the TSD has been deemed a suitable device for sourcing deflection data from which to infer the structural health of the pavement slabs and the load-carrying joints. Thus, this Dissertation's main objective is to fill this gap in knowledge, providing the pavement manager/practitioner with a streamlined, comprehensive interpretation procedure to turn dense TSD deflection measurements collected at a jointed pavement network into characterization parameters and structural health metrics for the concrete slab system, the sub-grade material, and the load-carrying joints. The proposed TSD data analysis procedure spans two stages: data extraction and interpretation. The Data Extraction Stage applies a Lasso-based regularization scheme [Basis Pursuit coupled with Reweighted L1 Minimization] to simultaneously remove the white noise from the TSD deflection measurements and extract the deflection response generated as the TSD travels over the pavement's transverse joints. The examples presented demonstrate that this technique can pinpoint the location of structurally weak spots within the pavement network from the network-wide TSD measurements, such as deteriorated transverse joints or segments with early stages of fatigue damage, worthy of further investigation and/or structural overhaul.
Meanwhile, the Interpretation Stage implements a linear-elastic jointed-slab-on-ground mathematical model to back-calculate the concrete pavement's and subgrade's stiffness and the transverse joints' load transfer efficiency index [LTE] from the denoised TSD measurements. In this Dissertation, the performance of this back-calculation technique is analyzed with actual TSD data collected at a 5-cm resolution at the MnROAD test track, for which material property results and FWD-based deflection test results at select transverse joints are available. However, during an early exploratory analysis of the available 5-cm data, a discrepancy between the reported deflection slope and velocity data and simulated measurements was found: the simulated deflection slopes mismatch the observations for measurements collected near the transverse joints, whereas the measured and simulated deflection velocities are in agreement. This finding prompted a revision of the well-known direct relationship between TSD-based deflection velocity and slope data, concluding that it holds only in very specific cases, and that a jointed pavement is a case in which deflection velocity and slope do not correlate directly. As a consequence, the back-calculation approach to the pavement properties and the joints' LTE index was implemented with the TSD's deflection velocity data as input. Validation results of the back-calculation tool using TSD data from the MnROAD low volume road showed reasonable agreement with the comparison data available while at the same time providing an LTE estimate for all the transverse joints (including those for which FWD-based deflection data is unavailable), suggesting that the proposed data analysis technique is practical for corridor-wide screening. In summary, this Dissertation presents a streamlined TSD data extraction and interpretation technique that can (1) highlight the location of structurally deficient joints within a jointed pavement corridor worthy of further investigation with an FWD and/or localized repair, thus optimizing the time the FWD spends on the road; and (2) reasonably estimate the structural parameters of a concrete pavement structure, its sub-grade, and the transverse joints, thus providing valuable data both for inventory-keeping and rehabilitation management. / Doctor of Philosophy / When allocating funds for network-wide pavement maintenance, such as at the state or country level, the engineer relies on as much pavement condition data as possible to optimally assign the most suitable maintenance or rehabilitation treatment to each pavement segment. Currently, practitioners rely mostly on surface condition data to decide how to maintain their roads, as this data can be collected fast and easily with automated vehicle-mounted equipment and analyzed by computer software. However, managerial decisions based solely on surface condition data do not make optimal use of the Agency's resources, for they do not precisely account for the pavements' structural capacity when assigning maintenance solutions. As such, the manager may allocate a surface treatment to a structurally weak segment with a poor surface, which will be prone to early failure (thus wasting the investment), or, conversely, reconstruct a deteriorated yet strong segment that could be fixed with a surface treatment.
The reason for such a sub-optimal managerial practice has been the lack of a commercially available pavement testing device capable of producing structural health data at a rate similar to that of existing surface scanning equipment – pavement engineers could only turn to crawling-speed or stop-and-go deflection devices to gather such data, which are fit for project-level applications but unsuitable for routine network-wide surveying. Yet this trend reversed in the early 2000s with the launch of the Traffic Speed Deflectometer [TSD], a device capable of collecting dense pavement deflection measurements (spaced as close as 5 cm [2 inches] apart) while traveling at speeds higher than 50 mph. Following the device's release, numerous research activities studied its feasibility as a network-wide routine data collection device and developed analysis schemes to interpret the collected measurements into pavement structural condition information. This research effort is still ongoing; the Transportation Pooled Fund [TPF] Project 5(385) is aimed in that direction and has set the goal of furnishing standards on the acquisition, storage, and interpretation of TSD data for pavement management. This being said, data collection and analysis protocols should be drafted to interpret the data gathered by the TSD on flexible and rigid pavements. Concerning TSD-based evaluation of flexible asphalt pavements, abundant published literature exists, whereas TSD surveying of concrete and composite (concrete + asphalt) pavements has received far less attention, partly because these pavements constitute only a minority of the US paved highway network – even though they account for roughly half of the Interstate system. Yet the TSD has been found suitable to provide valuable structural health information concerning both the pavement slabs and the load-bearing joints, the weakest element of such structures. With this in mind, this Dissertation research is aimed at bridging this gap in knowledge: a streamlined analysis methodology is proposed to process the TSD deflection data collected while surveying a jointed rigid pavement and derive important structural health metrics for the manager to drive their decision-making. Broadly speaking, this analysis methodology consists of two main elements: • The Data Extraction stage, in which the TSD deflection data is mined to both clear it of measurement noise and extract meaningful features, such as the pulse responses generated as the TSD travels over the pavement joints. • The Interpretation stage, which is more pavement engineering-related. Herein, the filtered TSD measurements are utilized to fit a pavement response model so that the pavement structural parameters (its stiffness, the strength of the sub-grade soil, and the joints' structural health) can be inferred. This Dissertation spans the mathematical grounds for these analysis techniques, validation tests on computer-generated data, and experiments with actual TSD data to test their applicability. The ultimate intention is for these techniques to eventually be adopted in practice as routine analysis of TSD data for more rational and resource-wise pavement management.
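The abstract above names Basis Pursuit with reweighted L1 minimization for denoising and joint detection. As a hedged, much-simplified stand-in (not the dissertation's method or parameters), the sketch below applies a plain Lasso over a spike dictionary to a synthetic deflection trace so that isolated joint responses emerge as nonzero coefficients.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic deflection trace: smooth background + sharp responses at two "joints" + white noise.
n = 500
signal = 0.05 * np.sin(np.linspace(0, 4 * np.pi, n))
signal[150] += 1.0   # simulated joint response
signal[380] += 0.8
noisy = signal + 0.05 * rng.standard_normal(n)

# Sparse recovery over a spike (identity) dictionary: nonzero coefficients flag candidate joints.
# sklearn scales the data-fit term by 1/(2*n_samples), so the effective soft
# threshold here is roughly alpha * n = 0.25, above the noise floor but below the spikes.
dictionary = np.eye(n)
lasso = Lasso(alpha=0.0005, fit_intercept=False, max_iter=10000)
lasso.fit(dictionary, noisy)

spikes = np.flatnonzero(np.abs(lasso.coef_) > 0.3)
print("candidate joint locations:", spikes)
```

The real scheme would use a richer dictionary and iteratively reweighted penalties, but the principle is the same: sparse coefficients localize the joint responses while the dense noise is suppressed.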
68

Screening and Engineering Phenotypes using Big Data Systems Biology

Huttanus, Herbert M. 20 September 2019
Biological systems display remarkable complexity that is not properly accounted for in small, reductionistic models. Increasingly, big data approaches using genomics, proteomics, metabolomics etc. are being applied to predicting and modifying the emergent phenotypes produced by complex biological systems. In this research, several novel tools were developed to assist in the acquisition and analysis of biological big data for a variety of applications. In total, two entirely new tools were created and a third, relatively new method, was evaluated by applying it to questions of clinical importance. 1) To assist in the quantification of metabolites at the subcellular level, a strategy for localized in-vivo enzymatic assays was proposed. A proof of concept for this strategy was conducted in which the local availability of acetyl-CoA in the peroxisomes of yeast was quantified by the production of polyhydroxybutyrate (PHB) using three heterologous enzymes. The resulting assay demonstrated the differences in acetyl-CoA availability in the peroxisomes under various culture conditions and genetic alterations. 2) To assist in the design of genetically modified microbe strains that are stable over many generations, software was developed to automate the selection of gene knockouts that would result in coupling cellular growth with production of a desired chemical. This software, called OptQuick, provides advantages over contemporary software for the same purpose. OptQuick can run considerably faster and uses a free optimization solver, GLPK. Knockout strategies generated by OptQuick were compared to case studies of similar strategies produced by contemporary programs. In these comparisons, OptQuick found many of the same gene targets for knockout. 3) To provide an inexpensive and non-invasive alternative for bladder cancer screening, Raman-based urinalysis was performed on clinical urine samples using RametrixTM software. RametrixTM has been previously developed and employed to other urinalysis applications, but this study was the first instance of applying this new technology to bladder cancer screening. Using a pool of 17 bladder cancer positive urine samples and 39 clinical samples exhibiting a range of healthy or other genitourinary disease phenotypes, RametrixTM was able to detect bladder cancer with a sensitivity of 94% and a specificity of 54%. 4) Methods for urine sample preservation were tested with regard to their effect on subsequent analysis with RametrixTM. Specifically, sterile filtration was tested as a potential method for extending the duration at which samples may be kept at room temperature prior to Raman analysis. Sterile filtration was shown to alter the chemical profile initially, but did not prevent further shifts in chemical profile over time. In spite of this, both unfiltered and filtered urine samples alike could be used for screening for chronic kidney disease or bladder cancer even after being stored for 2 weeks at room temperature, making sterile filtration largely unnecessary. / Doctor of Philosophy / Biological systems display remarkable complexity that is not properly accounted for in conventional, reductionistic models. Thus, there is a growing trend in biological studies to use computational analysis on large databases of information such as genomes containing thousands of genes or chemical profiles containing thousands of metabolites in a single cell. 
In this research, several new tools were developed to assist with gathering and processing large biological datasets. In total, two entirely new tools were created and a third, relatively new method was evaluated by applying it to questions of medical importance. The first two tools are for bioengineering applications. Bioengineers often want to understand the complex chemical network of a cell's metabolism and, ultimately, alter that network so as to force the cell to make more of a desired chemical like a biofuel or medicine. The first tool discussed in this dissertation offers a way to measure the concentration of key chemicals within a cell. Unlike previous methods for measuring these concentrations, however, this method limits its search to a specific compartment within the cell, which is important to many bioengineering strategies. The second technology discussed in this paper uses computer simulations of the cell's entire metabolism to determine what genetic alterations might lead to better production of a chemical of interest. The third tool involves analyzing the chemical makeup of urine samples to screen for diseases such as bladder cancer. Two studies were conducted with this third tool. The first study shows that Raman spectroscopy can distinguish between bladder cancer and related diseases. The second study addresses whether sterilizing the urine samples through filtration is necessary to preserve the samples for analysis. It was found that filtration was neither beneficial nor necessary.
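RametrixTM itself is prior, separately published software, so the sketch below is only a generic, hedged illustration of the common PCA-plus-discriminant-analysis pattern for classifying spectra and reporting sensitivity and specificity; the data are synthetic and the pipeline is not the RametrixTM algorithm.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)

# Synthetic "spectra": 56 samples x 600 wavenumber bins, with a small class-dependent band shift.
X = rng.standard_normal((56, 600))
y = np.array([1] * 17 + [0] * 39)   # 1 = cancer-positive, 0 = other/healthy
X[y == 1, 100:120] += 0.8           # fake disease-associated band

# Dimension reduction followed by a linear discriminant classifier, evaluated by cross-validation.
model = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
pred = cross_val_predict(model, X, y, cv=5)

tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
print(f"sensitivity = {tp / (tp + fn):.2f}, specificity = {tn / (tn + fp):.2f}")
```

Sensitivity and specificity, as reported in the abstract (94% and 54%), are computed exactly this way from the confusion matrix of predicted versus true classes.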
69

On Grouped Observation Level Interaction and a Big Data Monte Carlo Sampling Algorithm

Hu, Xinran 26 January 2015
Big Data is transforming the way we live. From medical care to social networks, data is playing a central role in various applications. As the volume and dimensionality of datasets keep growing, designing effective data analytics algorithms emerges as an important research topic in statistics. In this dissertation, I will summarize our research on two data analytics algorithms: a visual analytics algorithm named Grouped Observation Level Interaction with Multidimensional Scaling and a big data Monte Carlo sampling algorithm named Batched Permutation Sampler. These two algorithms are designed to enhance the capability of generating meaningful insights and utilizing massive datasets, respectively. / Ph. D.
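As a hedged, simplified illustration of the observation-level-interaction idea behind the first algorithm (not the dissertation's implementation), the sketch below re-weights feature dimensions after an analyst groups some observations and recomputes a 2-D MDS layout from the weighted distances; the weighting rule is an invented stand-in.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 8))   # 30 observations, 8 features


def layout(weights: np.ndarray) -> np.ndarray:
    """Project observations to 2-D from a weighted Euclidean distance matrix."""
    d = squareform(pdist(X * np.sqrt(weights)))
    return MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(d)


# Initial layout with uniform feature weights.
coords = layout(np.ones(X.shape[1]))

# After the analyst groups some observations together, up-weight the features that
# best separate that group from the rest (a crude stand-in for the real update rule).
group = np.array([0, 1, 2, 3])
separation = np.abs(X[group].mean(axis=0) - np.delete(X, group, axis=0).mean(axis=0))
new_weights = separation / separation.sum() * X.shape[1]
coords_updated = layout(new_weights)
```

The interaction loop is the key point: the analyst's grouping gesture is translated into a new distance metric, and the projection is recomputed to reflect that intent.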
70

Remote High Performance Visualization of Big Data for Immersive Science

Abidi, Faiz Abbas 15 June 2017
Remote visualization has emerged as a necessary tool in the analysis of big data. High-performance computing clusters can provide several benefits in scaling to larger data sizes, from parallel file systems to larger RAM profiles to parallel computation among many CPUs and GPUs. For scalable data visualization, remote visualization tools and infrastructure are critical, since only pixels and interaction events are sent over the network instead of the data. In this paper, we present our pipeline using VirtualGL, TurboVNC, and ParaView to render over 40 million points using remote HPC clusters and project over 26 million pixels in a CAVE-style system. We benchmark the system by varying the video stream compression parameters supported by TurboVNC and establish some best practices for typical usage scenarios. This work will help research scientists and academicians in scaling their big data visualizations for real-time interaction. / Master of Science
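ParaView in such a pipeline can be driven from Python; as a hedged sketch only (hostname, port, and file path are placeholders, and this is not the authors' benchmarked configuration), a client script might connect to a remote pvserver as below, with VirtualGL and TurboVNC handling GPU rendering and pixel streaming outside of Python.

```python
# Hypothetical ParaView Python client run inside the remote desktop session.
from paraview.simple import Connect, OpenDataFile, Show, Render

# Attach to a pvserver already launched on the visualization cluster (placeholder host/port).
Connect("viz-cluster.example.edu", 11111)

# Load a large point dataset that lives on the cluster's parallel file system (placeholder path).
data = OpenDataFile("/scratch/pointcloud.vtu")
Show(data)
Render()
```

Because rendering happens server-side, only the resulting pixels travel to the viewer, which is the core idea the pipeline above benchmarks.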
