1 |
Big Data Analytics: A Literature Review Perspective
Al-Shiakhli, Sarah. January 2019
Big data is currently a buzzword in both academia and industry, with the term being used to describe a broad domain of concepts, ranging from extracting data from outside sources, storing and managing it, to processing such data with analytical techniques and tools.
This thesis work thus aims to provide a review of current big data analytics concepts in an attempt to highlight big data analytics’ importance to decision making.
Due to the rapid increase in interest in big data and its importance to academia, industry, and society, solutions to handling data and extracting knowledge from datasets need to be developed and provided with some urgency to allow decision makers to gain valuable insights from the varied and rapidly changing data they now have access to. Many companies are using big data analytics to analyse the massive quantities of data they have, with the results influencing their decision making. Many studies have shown the benefits of using big data in various sectors, and in this thesis work, various big data analytical techniques and tools are discussed to allow analysis of the application of big data analytics in several different domains.
|
2 |
Performance Evaluation of Hadoop based Big Data Applications with HiBench Benchmarking tool on IaaS Cloud Platforms
Muthiah, Karthika, Ms. 01 January 2017
Cloud computing is a computing paradigm where large numbers of devices are connected through networks that provide a dynamically scalable infrastructure for applications, data and storage. Currently, many businesses, from small companies to large enterprises and industries, are changing their operations to utilize cloud services because cloud platforms can increase a company’s growth through process efficiency and reduced information technology spending [Coles16]. Companies are relying on cloud platforms such as Amazon Web Services, Google Compute Engine, and Microsoft Azure for their business development.
Due to the emergence of new technologies, devices, and communications, the amount of data produced is growing rapidly every day. Big data is a collection of large datasets, typically hundreds of gigabytes, terabytes or petabytes in size. Storing and analyzing this huge volume of data is a great challenge for companies and new businesses, and it is a primary focus of this paper.
This research was conducted on Amazon’s Elastic Compute Cloud (EC2) and Microsoft Azure platforms using the HiBench Hadoop Big Data Benchmark suite [HiBench16]. Processing huge volumes of data is a tedious task that is normally handled through traditional database servers. In contrast, Hadoop is a powerful framework used to handle applications with big data requirements efficiently by using the MapReduce algorithm to run them on systems with many commodity hardware nodes. Hadoop’s distributed file system facilitates rapid storage and data transfer rates of big data among the nodes and remains operational even when a node in the cluster has failed. HiBench is a big data benchmarking tool used for evaluating the performance of big data applications whose data are handled and controlled by a Hadoop cluster. A Hadoop cluster environment was enabled and evaluated on two cloud platforms. A quantitative comparison was performed on Amazon EC2 and Microsoft Azure along with a study of their pricing models. Measures are suggested for future studies and research.
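As a rough illustration of the MapReduce pattern mentioned in this abstract, the following is a minimal single-process Python sketch of a word count. It is not taken from the thesis and does not use Hadoop or HiBench; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example. In a real cluster, the map and reduce steps would run in parallel on different nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially in one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data on Hadoop", "Big data benchmarks"]))
```

Each step depends only on its own input, which is what lets a framework distribute the work across many commodity nodes and rerun a step when a node fails.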
|
3 |
INTEGRATING CONNECTED VEHICLE DATA FOR OPERATIONAL DECISION MAKING
Rahul Suryakant Sakhare (9320111). 26 April 2023
<p>Advancements in technology have propelled the availability of enriched and more frequent information about traffic conditions as well as the external factors that impact traffic, such as weather and emergency response. Most newer vehicles are equipped with sensors that transmit their data back to the original equipment manufacturer (OEM) at near real-time fidelity. A growing number of such connected vehicles (CV) and the advent of third-party data collectors from various OEMs have made big data for traffic commercially available. Agencies maintaining and managing surface transportation are presented with opportunities to leverage such big data for efficiency gains. The focus of this dissertation is enhancing the use of CV data and applications derived from fusing it with other datasets to extract meaningful information that will aid agencies in efficient, data-driven decision making to improve network-wide mobility and safety performance.</p>
<p>One of the primary concerns of CV data for agencies is data sampling, particularly during low-volume overnight hours. An evaluation of over 3 billion CV records in May 2022 in Indiana has shown an overall CV penetration rate of 6.3% on interstates and 5.3% on non-interstate roadways. Fusion of CV traffic speeds with precipitation intensity from NOAA’s High-Resolution Rapid-Refresh (HRRR) data over 42 unique rainy days has shown a reduction in the average traffic speed of approximately 8.4% during conditions classified as very heavy rain compared to no rain.</p>
<p>Both the aggregate and disaggregate analyses performed during this study enable agencies and automobile manufacturers to effectively answer the often-asked question of what rain intensity it takes to begin impacting traffic speeds. Proactive measures such as providing advance warnings that improve the situational awareness of motorists and enhance roadway safety should be considered during very heavy rain periods, wind events, and low daylight conditions.</p>
<p>Scalable methodologies that can be used to systematically analyze hard braking and speed data were also developed. This study demonstrated both quantitatively and qualitatively how CV data provides an opportunity for near real-time assessment of work zone operations using metrics such as congestion, location-based speed profiles and hard braking. The availability of data across different states and the ease of scalability make the methodology implementable on a state or national basis for tracking any highway work zone with little to no infrastructure investment. These techniques provide a nationwide opportunity to assess the current guidelines and to give feedback for updating the design procedures, improving the consistency and safety of construction work zones on a national level.</p>
<p>CV data was also used to evaluate the impact of queue warning trucks sending digital alerts. Based on 370 hours of queueing with queue trucks present and 58 hours of queueing without them, hard-braking events were found to decrease by approximately 80% when queue warning trucks were used to alert motorists of impending queues, thus improving work zone safety.</p>
<p>The ubiquity of CV data providers creates emerging opportunities to identify and measure traffic shock waves and their forming or recovery speeds anywhere across a roadway network. A methodology for identifying different shock waves was presented. Across the various case studies, typical backward forming shock wave speeds ranged from 1.75 to 11.76 mph, whereas backward recovery shock wave speeds ranged from 5.78 to 16.54 mph. The significance of this is illustrated with a case study of a secondary crash, which suggested that accelerating the clearance by 9 minutes could have prevented the secondary crash that occurred at the back of the queue. This capability of identifying and measuring shock wave speeds can be utilized by various stakeholders for traffic management decision-making that takes a holistic view of both the on-scene risk and the risk at the back of the queue. Near real-time estimation of shock waves using CV data can inform travel time prediction models and serve as an input to navigation systems to identify alternate route choice opportunities ahead of a driver’s time of arrival.</p>
<p>The overall contribution of this thesis is developing scalable methodologies and evaluation techniques to extract valuable information from CV data that aids agencies in operational decision making.</p>
|
4 |
Development of a continuous condition monitoring system based on probabilistic modelling of partial discharge data for polymeric insulation cables
Ahmed, Zeeshan. 09 August 2019
Partial discharge (PD) measurements have been widely accepted as an efficient online insulation condition assessment method in high voltage equipment. Two sets of experimental PD measuring setups were established with the aim of studying the variations in the partial discharge characteristics over the course of insulation degradation, in terms of the physical phenomena taking place in PD sources, up to the point of failure. Probabilistic lifetime modeling techniques based on classification, regression and multivariate time series analysis were applied to a system of PD response variables, i.e. average charge, pulse repetition rate, average charge current, and largest repetitive discharge magnitude over the data acquisition period. Experimental lifelong PD data obtained from samples subjected to accelerated degradation was used to study the dynamic trends and relationships among these response variables. Distinguishable data clusters detected by the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm allow for the examination of state-of-the-art modeling techniques over PD data. The response behavior of the trained models allows the different stages of insulation degradation to be distinguished. An alternative approach utilizing multivariate time series analysis was performed in parallel with the classification and regression models for the purpose of forecasting PD activity (PD response variables corresponding to insulation degradation). True observed data and forecasted data mean values lie within the 95th percentile confidence interval responses for a definite horizon period, which demonstrates the soundness and accuracy of the models.
A life-predicting model based on the cointegrated relations between the multiple response variables, with trained model responses correlated with experimentally evaluated time-to-breakdown values and well-known physical discharge mechanisms, can be used to set an emergent alarming trigger and as a step towards establishing long-term continuous monitoring of partial discharge activity. Furthermore, this dissertation also proposes an effective PD monitoring system based on the wavelet and deflation compression techniques required for optimal data acquisition, as well as an algorithm for high-scale, big data reduction that minimizes PD data size and retains only the useful PD information. This historically recorded useful information can thus be used not only for post-fault diagnostics, but also for improving the performance of modelling algorithms and for accurate threshold detection.
|