  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Data analytics on Yelp data set

Tata, Maitreyi January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / William H. Hsu / In this report, I describe a query-driven system that helps decide which restaurant to invest in, or which area of a given city is a good place to open a new restaurant. The analysis is performed on existing businesses in every state, based on factors such as the average star rating, the total number of reviews associated with a specific restaurant, and the restaurant's price range. The results give a picture of the successful restaurants in a city, which helps you decide where to invest and what to keep in mind when starting a new business. The main scope of the project is Analytics and Data Visualization.
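The report's ranking factors (average star rating, review count) lend themselves to a simple aggregation. A minimal stdlib-Python sketch of a review-weighted per-city summary follows; the field names are assumptions in the style of the Yelp dataset's business records, not its exact schema:

```python
from collections import defaultdict

# Hypothetical records in the style of the Yelp dataset's business file
# (field names are assumptions, not the exact Yelp schema).
businesses = [
    {"city": "Phoenix", "stars": 4.5, "review_count": 320, "price_range": 2},
    {"city": "Phoenix", "stars": 3.0, "review_count": 45,  "price_range": 1},
    {"city": "Madison", "stars": 4.0, "review_count": 210, "price_range": 3},
    {"city": "Madison", "stars": 4.5, "review_count": 15,  "price_range": 2},
]

def city_summary(records):
    """Review-weighted average star rating per city: a rough
    'where to invest' signal that favors well-reviewed businesses."""
    totals = defaultdict(lambda: [0.0, 0])  # city -> [weighted star sum, review sum]
    for b in records:
        totals[b["city"]][0] += b["stars"] * b["review_count"]
        totals[b["city"]][1] += b["review_count"]
    return {city: round(s / n, 2) for city, (s, n) in totals.items()}
```

Weighting by review count keeps a 5-star restaurant with three reviews from outranking a 4.5-star restaurant with hundreds.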
2

Data Analytics Methods in Wind Turbine Design and Operations

Lee, Giwhyun 16 December 2013 (has links)
This dissertation develops sophisticated data analytic methods to analyze structural loads on, and power generation of, wind turbines. Wind turbines, which convert the kinetic energy in wind into electrical power, are operated within stochastic environments. To account for the influence of environmental factors, we employ a conditional approach by modeling the expectation or distribution of response of interest, be it the structural load or power output, conditional on a set of environmental factors. Because of the different nature associated with the two types of responses, our methods also come in different forms, conducted through two studies. The first study presents a Bayesian parametric model for the purpose of estimating the extreme load on a wind turbine. The extreme load is the highest stress level that the turbine structure would experience during its service lifetime. A wind turbine should be designed to resist such a high load to avoid catastrophic structural failures. To assess the extreme load, turbine structural responses are evaluated by conducting field measurement campaigns or performing aeroelastic simulation studies. In general, data obtained in either case are not sufficient to represent various loading responses under all possible weather conditions. An appropriate extrapolation is necessary to characterize the structural loads in a turbine’s service life. This study devises a Bayesian spline method for this extrapolation purpose and applies the method to three sets of load response data to estimate the corresponding extreme loads at the roots of the turbine blades. In the second study, we propose an additive multivariate kernel method as a new power curve model, which is able to incorporate a variety of environmental factors in addition to merely the wind speed. In the wind industry, a power curve refers to the functional relationship between the power output generated by a wind turbine and the wind speed at the time of power generation. 
Power curves are used in practice for a number of important tasks including predicting wind power production and assessing a turbine's energy production efficiency. Nevertheless, actual wind power data indicate that the power output is affected by more than just wind speed. Several other environmental factors, such as wind direction, air density, humidity, turbulence intensity, and wind shear, have potential impact. Yet, in industry practice, as well as in the literature, current power curve models primarily consider wind speed and, with comparatively less frequency, wind speed and direction. Our model provides, conditional on a given environmental condition, both the point estimation and density estimation of the power output. It is able to capture the nonlinear relationships between environmental factors and wind power output, as well as the high-order interaction effects among some of the environmental factors. To illustrate the application of the new power curve model, we conduct case studies that demonstrate how the new method can help with quantifying the benefit of vortex generator installation, advising pitch control adjustment, and facilitating the diagnosis of faults.
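The power-curve idea, estimating expected power conditional on several environmental inputs rather than wind speed alone, can be illustrated with a generic multivariate kernel smoother. This is a Nadaraya-Watson sketch, not the authors' additive estimator; the covariates, bandwidths, and sample data are made up:

```python
import math

def gaussian(u):
    """Gaussian kernel weight for a scaled distance u."""
    return math.exp(-0.5 * u * u)

# Toy observations: ((wind speed m/s, air density kg/m^3), power kW).
# Illustrative numbers only, not measured turbine data.
data = [((8.0, 1.20), 500.0), ((10.0, 1.20), 900.0), ((12.0, 1.20), 1400.0)]

def kernel_power_estimate(query, observations, bandwidths):
    """Nadaraya-Watson estimate of expected power conditional on a vector
    of environmental covariates. A product kernel weights each observation
    by its closeness to the query point in every covariate."""
    num = den = 0.0
    for x, power in observations:
        w = 1.0
        for xi, qi, h in zip(x, query, bandwidths):
            w *= gaussian((xi - qi) / h)
        num += w * power
        den += w
    return num / den
```

Observations close to the query condition dominate the weighted average, so the estimate adapts to the local environmental regime rather than assuming one global curve.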
3

The Use of Data Analytics in Internal Audit to Improve Decision-Making: An Investigation of Data Visualizations and Data Source

Seymore, Megan 08 1900 (has links)
The purpose of this dissertation was to examine how managers' judgments from an internal auditor's recommendation are influenced by some aspects of newer data sources and the related visualizations. This study specifically examined how managers' judgments from an internal auditor's recommendation are influenced by the (1) supportiveness of non-financial data with the internal auditor's recommendation and (2) evaluability of visual representations for non-financial data used to communicate the recommendation. This was investigated in a setting where financial data does not support the internal auditor's recommendation. To test my hypotheses, I conducted an experiment using an inventory write-down task to examine the likelihood that a manager agrees with an internal auditor's inventory write-down recommendation. This task was selected because it requires making a prediction, and both financial and newer non-financial data sources are relevant to this judgment. The study was conducted with MBA students who proxy for managers in organizations. Evaluability of visual representations was operationalized as the (1) proximity of financial and non-financial graphs, and (2) type of non-financial graph as requiring a length judgment or not. This dissertation contributes to accounting literature and the internal auditing profession. First, I contribute to recent experimental literature on data analytics by providing evidence that newer non-financial data sources will be integrated into managers' judgments even when financial data is inconsistent. However, I also identified that whether managers appropriately agree with an internal auditor's recommendation depends on the evaluability of the visualizations for non-financial data. Second, I expand on the literature that examines managers' agreement with recommendations from internal auditors by examining an unexplored yet relevant context of using newer non-financial data sources and communicating these results.
Specifically, I identified how the evaluability of visual representations for non-financial data interacts with the supportiveness of non-financial data with the internal auditor's recommendation to create differences in managers' agreement with the recommendation. I also identified confidence in the internal auditor's recommendation as an explanatory variable in some situations. My findings also have practical value for the internal auditing profession to understand the importance of appropriate visualizations in audit reporting.
4

Privacy Preserving Network Security Data Analytics

DeYoung, Mark E. 24 April 2018 (has links)
The problem of revealing accurate statistics about a population while maintaining privacy of individuals is extensively studied in several related disciplines. Statisticians, information security experts, and computational theory researchers, to name a few, have produced extensive bodies of work regarding privacy preservation. Still the need to improve our ability to control the dissemination of potentially private information is driven home by an incessant rhythm of data breaches, data leaks, and privacy exposure. History has shown that both public and private sector organizations are not immune to loss of control over data due to lax handling, incidental leakage, or adversarial breaches. Prudent organizations should consider the sensitive nature of network security data and network operations performance data recorded as logged events. These logged events often contain data elements that are directly correlated with sensitive information about people and their activities -- often at the same level of detail as sensor data. Privacy preserving data publication has the potential to support reproducibility and exploration of new analytic techniques for network security. Providing sanitized data sets de-couples privacy protection efforts from analytic research. De-coupling privacy protections from analytical capabilities enables specialists to tease out the information and knowledge hidden in high dimensional data, while, at the same time, providing some degree of assurance that people's private information is not exposed unnecessarily. In this research we propose methods that support a risk based approach to privacy preserving data publication for network security data. Our main research objective is the design and implementation of technical methods to support the appropriate release of network security data so it can be utilized to develop new analytic methods in an ethical manner. 
Our intent is to produce a database which holds network security data representative of a contextualized network and people's interaction with the network mid-points and end-points without the problems of identifiability. / Ph. D. / Network security data is produced when people interact with devices (e.g., computers, printers, mobile telephones) and networks (e.g., a campus wireless network). The network security data can contain identifiers, like user names, that strongly correlate with real world people. In this work we develop methods to protect network security data from privacy-invasive misuse by the 'honest-but-curious' authorized data users and unauthorized malicious attackers. Our main research objective is the design and implementation of technical methods to support the appropriate release of network security data so it can be utilized to develop new analytic methods in an ethical manner. Our intent is to produce a data set which holds network security data representative of people's interaction with a contextualized network without the problems of identifiability.
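One building block of such a sanitized release, replacing direct identifiers like user names with keyed-hash pseudonyms, can be sketched as follows. This is an illustrative step only, not the dissertation's full risk-based method, and pseudonymization alone does not remove all re-identification risk; the key value and digest truncation are assumptions:

```python
import hashlib
import hmac

# Assumption for illustration: a secret key held by the data publisher,
# rotated between releases so pseudonyms are unlinkable across releases.
SECRET_KEY = b"rotate-me-per-release"

def pseudonymize(value, key=SECRET_KEY):
    """Replace a direct identifier with a keyed-hash pseudonym.
    Pseudonyms are consistent within one release, so analysts can still
    link events belonging to the same (unnamed) user."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]

def sanitize_event(event):
    """Return a copy of a logged event with the user identifier replaced."""
    out = dict(event)
    out["user"] = pseudonymize(event["user"])
    return out
```

A keyed HMAC (rather than a plain hash) matters here: unsalted hashes of user names are trivially reversed by hashing a dictionary of candidate names.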
5

Exploiting Application Characteristics for Efficient System Support of Data-Parallel Machine Learning

Cui, Henggang 01 May 2017 (has links)
Large scale machine learning has many characteristics that can be exploited in system designs to improve its efficiency. This dissertation demonstrates that the characteristics of ML computations can be exploited in the design and implementation of parameter server systems, improving efficiency by an order of magnitude or more. We support this thesis statement with three case study systems: IterStore, GeePS, and MLtuner. IterStore is an optimized parameter server system design that exploits the repeated data access pattern characteristic of ML computations. The designed optimizations allow IterStore to reduce the total run time of our ML benchmarks by up to 50×. GeePS is a parameter server that is specialized for deep learning on distributed GPUs. By exploiting the layer-by-layer data access and computation pattern of deep learning, GeePS provides almost linear scalability from single-machine baselines (13× more training throughput with 16 machines), and also supports neural networks that do not fit in GPU memory. MLtuner is a system for automatically tuning the training tunables of ML tasks. It exploits the characteristic that the best tunable settings can often be decided quickly with just a short trial time. By making use of optimization-guided online trial-and-error, MLtuner can robustly find and re-tune tunable settings for a variety of machine learning applications, including image classification, video classification, and matrix factorization, and is over an order of magnitude faster than traditional hyperparameter tuning approaches.
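MLtuner's core idea, deciding among tunable settings with short trials instead of full training runs, can be illustrated on a toy problem. The quadratic loss, trial length, and candidate set below are stand-ins for illustration, not MLtuner's actual mechanics:

```python
def short_trial(lr, steps=20):
    """Run a brief gradient-descent trial on the toy loss f(w) = w^2
    and report the final loss. Stands in for a short training trial
    on a real model."""
    w = 1.0
    for _ in range(steps):
        w -= lr * 2 * w  # gradient of w^2 is 2w
    return w * w

def pick_tunable(candidates):
    """Choose the learning rate whose short trial ends with the lowest
    loss: settings that diverge or barely move are rejected quickly,
    without paying for a full training run on each candidate."""
    return min(candidates, key=short_trial)
```

On this toy loss, an over-large rate (1.5) diverges and a tiny rate (0.01) barely decreases the loss in 20 steps, so the short trial correctly selects the middle setting.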
6

Business model innovation to explore data analytics value; A case study of Caterpillar and Ericsson.

Kritikos, Konstantinos, Barreiros, Jacinto January 2016 (has links)
The aim of the thesis is to identify a roadmap for well-established companies towards business model innovation to explore data analytics value. The business model innovation currently taking place at Caterpillar and Ericsson in order to explore data analytics value is presented to answer the question: “How do established companies explore data analytics to innovate their business models?” Initially, the problem discussion, formulation and purpose are given. Then, the relevant theory is presented covering the importance of data analytics, IT infrastructure challenges due to the increased volume of data created, data analytics methods currently being used, smart connected products and the Internet of Things. The meaning of business model innovation is given, followed by a well-structured business model process which includes the business model canvas for representation purposes. The business areas affected by data analytics value and the barriers of business model innovation are given as well. After that, the theory addressing business model innovation to explore data analytics value is presented and the main industries which are currently on this journey along with the required initial steps and the business models that can come out of this process are identified. The challenges and risks if the option of not following this route is chosen are also shown. The method section follows to explain the case study design, data collection method and way of analysis. The results cover all the information gathered from numerous sources including on-line available information, papers, interviews, videos, end of year reviews and most importantly current Caterpillar and Ericsson mid-level management employee answers to a questionnaire created and distributed by the authors. The business model canvas tool is used to aid the reader understanding Caterpillar’s and Ericsson’s business model innovation. Each company’s business model is given before and after data analytics adoption. 
Finally, the analysis of the results and the link with the theory is given in order to answer the thesis question.
7

A Model-driven Visual Analytic Framework for Local Pattern Analysis

Zhao, Kaiyu 09 February 2016 (has links)
The ultimate goal of any visual analytic task is to make sense of the data and gain insights. Unfortunately, the process of discovering useful information is becoming more challenging nowadays due to the growing data scale. In particular, human cognitive capabilities remain constant whereas the scale and complexity of data do not. Meanwhile, visual analytics relies heavily on a human analyst in the loop, which imposes challenges on the traditional human-driven workflow. It is almost impossible to show every detail to the user while diving into local regions of the data to explain phenomena hidden there. For example, while exploring data subsets, it is always important to determine which partitions of the data contain more important information. Also, determining the subset of features is vital before doing further analysis. Furthermore, modeling locally on these subsets of data can yield great findings but also introduce bias. In this work, a model-driven visual analytic framework is proposed to help identify interesting local patterns from the above three aspects. This dissertation tackles these subproblems in three topics: model-driven data exploration, model-driven feature analysis, and local model diagnosis. First, model-driven data exploration focuses on the problem of modeling subsets of data to identify the co-movement of time-series data within certain subsets of time partitions, which is an important application in a number of domains such as medical science, finance, business and engineering. Second, model-driven feature analysis aims to discover important subsets of interesting features while analyzing local feature similarities. Within the financial risk dataset collected by a domain expert, we discover that the feature correlations among different data partitions (i.e., small and large companies) are very different.
Third, local model diagnosis provides a tool to identify interesting local regression models at local regions of the data space which makes it possible for the analysts to model the whole data space with a set of local models while knowing the strength and weakness of them. The three tools provide an integrated solution for identifying interesting patterns within local subsets of data.
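The local-modeling idea, fitting one model per data partition and comparing the fitted coefficients, can be sketched with closed-form least squares. The partition names and data points are illustrative, not from the financial risk dataset mentioned above:

```python
def fit_line(points):
    """Ordinary least squares for y = a + b*x, closed form, stdlib only."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

def local_models(partitions):
    """Fit one regression per data partition. Diverging coefficients
    flag partitions (e.g. small vs. large companies) where the same
    features relate to the response very differently."""
    return {name: fit_line(pts) for name, pts in partitions.items()}

# Illustrative partitions: one with a strong linear trend, one flat.
partitions = {"small": [(1, 2), (2, 4), (3, 6)], "large": [(1, 1), (2, 1), (3, 1)]}
```

Comparing the per-partition slopes makes the "strength and weakness" of each local model explicit, instead of hiding both behaviors inside one global fit.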
8

Using Data Analytics in Agriculture to Make Better Management Decisions

Liebe, Douglas Michael 19 May 2020 (has links)
The goal of this body of work is to explore various aspects of data analytics (DA) and its applications in agriculture. In our research, we produce decisions with mathematical models, create models, evaluate existing models, and review how certain models are best applied. The increasing granularity of decisions being made on farm, like individualized feeding, sub-plot level crop management, and plant and animal disease prevention, creates complex systems requiring DA to identify variance and patterns in the data collected. Precision agriculture requires DA to make decisions about how to feasibly improve efficiency or performance in the system. Our research demonstrates ways to provide recommendations and make decisions in such systems. Our first research goal was to clarify research on the topic of endophyte-infected tall fescue by relating different infection-measuring techniques and quantifying the effect of infection level on grazing cattle growth. Cattle graze endophyte-infected tall fescue in many parts of the U.S., and this feedstuff is thought to limit growth performance in those cattle. Our results suggest ergovaline concentration makes up close to 80% of the effect of measured total ergot alkaloids, and cattle average daily gain decreased 33 g/d for each 100 ppb increase in ergovaline concentration. By comparing decreased weight gain to the costs of reseeding a pasture, producers can make decisions related to the management of infected pastures. The next research goal was to evaluate experimental and feed factors that affect measurements associated with ruminant protein digestion. Measurements explored were the 0-h washout, potentially degradable, and undegradable protein fractions, protein degradation rate, and digestibility of rumen undegradable protein.
Our research found that the aforementioned measurements were significantly affected by feedstuff characteristics like neutral detergent fiber content and crude protein content, and also measurement variables like bag pore size, incubation time, bag area, and sample size to bag area ratio. Our findings suggest that current methods to measure and predict protein digestion lack robustness and are therefore not reliable to make feeding decisions or build research models. The first two research projects involved creating models to help researchers and farmers make better decisions. Next, we aimed to produce a summary of existing DA frameworks and propose future areas for model building in agriculture. Machine learning models were discussed along with potential applications in animal agriculture. Additionally, we discuss the importance of model evaluation when producing applicable models. We propose that the future of DA in agriculture comes with increasing decision making done without human input and better integration of DA insights into farmer decision-making. After detailing how mathematical models and machine learning could be used to further research, models were used to predict cases of clinical mastitis (CM) in dairy cows. Machine learning models took daily inputs relating to activity and production to produce probabilities of CM. By considering the economic costs of treatment and non-treatment in CM cases, we provide insight into the lack of applicable models being produced, and why smarter data collection, representative datasets, and validation that reflects how the model will be used are needed. The overall goal of this body of work was to advance our understanding of agriculture and the complex decisions involved through the use of DA. Each project sheds light on model building, model evaluation, or model applicability. 
By relating modeling techniques in other fields to agriculture, this research aims to improve translation of these techniques in future research. As data collection in agriculture becomes even more commonplace, the need for good modeling practices will increase. / Doctor of Philosophy / Data analytics (DA) has become more popular with the increasing data collection capabilities using technologies like sensors, improvement in data storage techniques, and expanding literature on algorithms that can be used in prediction and summarization. This body of work explores many aspects of agricultural DA and its applications on-farm. The field of precision agriculture has risen from an influx of data and new possibilities for using these data. Even small farms are now able to collect data using technologies like sensor-equipped tractors and drones which are relatively inexpensive. Our research shows how using mathematical models combined with these data can help researchers produce more applicable tools and, in turn, help producers make more targeted decisions. We examine cases where models improve the understanding of a system, specifically, the effect of endophyte infection in tall fescue pastures, the effect of measurement on protein digestibility for ration formulation, and methods to predict sparse diseases using big data. Although DA is widely applied, specific agricultural research on topics such as model types, model performance, and model utility needs to be done. This research presented herein expands on these topics in detail, using DA and mathematical models to make predictions and understand systems while utilizing applicable DA frameworks for future research.
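The reported effect of a 33 g/d drop in average daily gain per 100 ppb ergovaline supports a back-of-the-envelope reseeding decision of the kind described above. The effect size comes from the abstract; the prices, grazing days, and herd size below are assumptions for illustration, not values from the study:

```python
def lost_gain_value(ergovaline_ppb, days, price_per_kg):
    """Per-head value of weight gain lost to ergovaline, using the
    reported effect: average daily gain falls 33 g/d (0.033 kg/d)
    per 100 ppb ergovaline."""
    lost_kg = 0.033 * (ergovaline_ppb / 100.0) * days
    return lost_kg * price_per_kg

def worth_reseeding(ergovaline_ppb, days, price_per_kg, reseed_cost, head):
    """Reseed when the herd-level value of lost gain over the grazing
    season exceeds the one-time reseeding cost (simplified: ignores
    multi-year pasture life and discounting)."""
    return lost_gain_value(ergovaline_ppb, days, price_per_kg) * head > reseed_cost
```

For example, at an assumed 400 ppb, a 150-day season, and $4/kg liveweight, each head forgoes about 19.8 kg of gain (roughly $79), so a 100-head herd justifies a reseeding cost of several thousand dollars.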
9

Data Mining Academic Emails to Model Employee Behaviors and Analyze Organizational Structure

Straub, Kayla Marie 06 June 2016 (has links)
Email correspondence has become the predominant method of communication for businesses. If not for the inherent privacy concerns, this electronically searchable data could be used to better understand how employees interact. After the Enron dataset was made available, researchers were able to provide great insight into employee behaviors based on the available data, despite the many challenges with that dataset. The work in this thesis applies a suite of methods to an appropriately anonymized academic email dataset created from volunteers' email metadata. This new dataset, from an internal email server, is first used to validate feature extraction and machine learning algorithms in order to generate insight into the interactions within the center. Based solely on email metadata, a random forest approach models behavior patterns and predicts employee job titles with 96% accuracy. This result represents classifier performance not only on participants in the study but also on other members of the center who were connected to participants through email. Furthermore, the data revealed relationships not present in the center's formal operating structure. The culmination of this work is an organic organizational chart, which contains a fuller understanding of the center's internal structure than can be found in the official organizational chart. / Master of Science
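The metadata-only feature extraction that such a classifier consumes can be sketched as follows. The specific features (send volume, fan-out, reply ratio) are illustrative choices, not necessarily those used in the thesis; a random forest or any other classifier could be trained on the resulting vectors:

```python
from collections import defaultdict

def metadata_features(messages):
    """Per-sender features computed from email metadata alone, with no
    access to message bodies. Each message is a dict like
    {"from": sender, "to": [recipients], "is_reply": bool}."""
    raw = defaultdict(lambda: {"sent": 0, "recipients": set(), "replies": 0})
    for m in messages:
        f = raw[m["from"]]
        f["sent"] += 1
        f["recipients"].update(m["to"])
        f["replies"] += m["is_reply"]  # bool counts as 0/1
    return {
        person: {
            "sent": f["sent"],
            "distinct_contacts": len(f["recipients"]),
            "reply_ratio": f["replies"] / f["sent"],
        }
        for person, f in raw.items()
    }
```

Features like these plausibly separate roles: managers tend toward high fan-out and many initiated threads, while support staff show high reply ratios, which is the kind of signal a random forest can exploit.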
10

What does Big Data has in-store for organisations: An Executive Management Perspective

Hussain, Zahid I., Asad, M., Alketbi, R. January 2017 (has links)
With a cornucopia of literature on Big Data and Data Analytics, the topic has become a recent buzzword. The literature is full of hymns of praise for big data and its potential applications. However, some of the latest published material exposes the challenges involved in implementing a Big Data (BD) approach, where the uncertainty surrounding its applications is rendering it ineffective. The paper looks at the mind-sets and perspectives of executives and their plans for using Big Data for decision making. Our data collection involved interviewing senior executives from a number of world-class organisations to determine their understanding of big data, its limitations, and its applications. The information gathered is used to analyse how well executives understand big data and how ready organisations are to use it effectively for decision making. The aim is to provide a realistic outlook on the usefulness of this technology and help organisations make suitable and realistic decisions on its investment. Professionals and academics are becoming increasingly interested in the field of big data (BD) and data analytics. Companies invest heavily in acquiring data and analysing it. More recently the focus has switched towards data available through the internet, which appears to have brought about new data collection opportunities. As the smartphone market developed further, data sources extended to include those from mobile and sensor networks. Consequently, organisations started using the data and analysing it. Thus, the field of business intelligence emerged, which deals with gathering data and analysing it to gain insights and use them to make decisions (Chen, et al., 2012). BD seems to have immense potential to provide powerful information to businesses. Accenture (2015) claims that organisations are extremely satisfied with their BD projects concerned with enhancing their customer reach.
Davenport (2006) has presented applications in which companies are using the power of data analytics to consistently predict behaviours and develop applications that enable them to unearth important yet difficult to see customer preferences, and evolve rapidly to generate revenues.
