11 |
Big Data-analyser och beslutsfattande i svenska myndigheter [Big Data analyses and decision-making in Swedish government agencies]
Victoria, Åkestrand, My, Wisen January 2017
A great deal of data can be collected about people, and the amount of data that can be collected is growing. More and more organizations are taking the step into Big Data use, and Swedish government agencies are among them. Analysing Big Data can generate better decision support, but there are problems in how the collected data should be analysed and used in the decision process. The results of the study show that Swedish government agencies cannot use existing decision models for decisions grounded in a Big Data analysis. The results also show that Swedish government agencies do not follow fixed steps in the decision process; instead, it is mostly a matter of identifying the content of the Big Data analysis in order to make a decision. Because the decision is grounded in what the Big Data analysis indicates, the surrounding activities, such as data collection, data quality assurance, data analysis and data visualization, become increasingly essential.
|
12 |
Critical analysis of Big Data challenges and analytical methods
Sivarajah, Uthayasankar, Kamal, M.M., Irani, Zahir, Weerakkody, Vishanth J.P. 08 October 2016
Yes / Big Data (BD), with their potential to ascertain valued insights for an enhanced decision-making process, have recently attracted substantial interest from both academics and practitioners. Big Data Analytics (BDA) is increasingly becoming a trending practice that many organizations are adopting with the purpose of constructing valuable information from BD. The analytics process, including the deployment and use of BDA tools, is seen by organizations as a tool to improve operational efficiency, though it also has strategic potential to drive new revenue streams and gain competitive advantages over business rivals. However, there are different types of analytic applications to consider. Therefore, prior to hasty use and buying costly BD tools, there is a need for organizations to first understand the BDA landscape. Given the significant nature of BD and BDA, this paper presents a state-of-the-art review that offers a holistic view of the BD challenges and BDA methods theorized/proposed/employed by organizations, to help others understand this landscape with the objective of making robust investment decisions. In doing so, it systematically analyses and synthesizes the extant research published in the BD and BDA area. More specifically, the authors seek to answer the following two principal questions: Q1 – What are the different types of BD challenges theorized/proposed/confronted by organizations? and Q2 – What are the different types of BDA methods theorized/proposed/employed to overcome BD challenges? This systematic literature review (SLR) is carried out through observing and understanding the past trends and extant patterns/themes in the BDA research area, evaluating contributions, summarizing knowledge, thereby identifying limitations, implications and potential further research avenues to support the academic community in exploring research themes/patterns. Thus, to trace the implementation of BD strategies, a profiling method is employed to analyze articles (published in English-speaking peer-reviewed journals between 1996 and 2015) extracted from the Scopus database. The analysis presented in this paper has identified relevant BD research studies that have contributed both conceptually and empirically to the expansion and accrual of intellectual wealth to the BDA in technology and organizational resource management discipline.
|
13 |
A note on exploration of IoT generated big data using semantics
Ranjan, R., Thakker, Dhaval, Haller, A., Buyya, R. 27 July 2017
Yes / Welcome to this special issue of the Future Generation Computer Systems (FGCS) journal. The special issue compiles seven technical contributions that significantly advance the state-of-the-art in exploration of Internet of Things (IoT) generated big data using semantic web techniques and technologies.
|
14 |
Visualization of multivariate process data for fault detection and diagnosis
Wang, Ray Chen 02 October 2014
This report introduces the concept of three-dimensional (3D) radial plots for the visualization of multivariate large scale datasets in plant operations. A key concept of this representation of data is the introduction of time as the third dimension in a two dimensional radial plot, which allows for the display of time series data in any number of process variables. This report shows the ability of 3D radial plots to conduct systemic fault detection and classification in chemical processes through the use of confidence ellipses, which capture the desired operating region of process variables during a defined period of steady-state operation. Principal component analysis (PCA) is incorporated into the method to reduce multivariate interactions and the dimensionality of the data. The method is applied to two case studies with systemic faults present (compressor surge and column flooding) as well as data obtained from the Tennessee Eastman simulator, which contained localized faults. Fault classification using the interior angles of the radial plots is also demonstrated in the paper. / text
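The confidence-ellipse idea mentioned in this abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the report's method: it uses two process variables directly (no PCA step and no radial plot), fits the ellipse from steady-state samples via their mean and covariance, and flags a point as a possible fault when its squared Mahalanobis distance exceeds the 95% chi-square quantile for two degrees of freedom. The names `fit_ellipse` and `is_fault` and the sample values are hypothetical.

```python
from statistics import mean

# 95% quantile of the chi-square distribution with 2 degrees of freedom.
CHI2_95_2DOF = 5.991


def fit_ellipse(points):
    """Fit a confidence ellipse to steady-state (x, y) samples.

    Returns the sample mean and the three distinct entries of the
    inverse of the 2x2 sample covariance matrix.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular; need more varied samples")
    # Inverse of [[sxx, sxy], [sxy, syy]] is (1/det) * [[syy, -sxy], [-sxy, sxx]].
    return mx, my, (syy / det, -sxy / det, sxx / det)


def is_fault(point, ellipse, threshold=CHI2_95_2DOF):
    """Return True when the point lies outside the confidence ellipse."""
    mx, my, (a, b, c) = ellipse
    dx, dy = point[0] - mx, point[1] - my
    # Squared Mahalanobis distance of the point from the steady-state mean.
    d2 = a * dx * dx + 2 * b * dx * dy + c * dy * dy
    return d2 > threshold


# Hypothetical steady-state readings of two process variables.
steady_state = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (1.1, 2.2), (0.8, 1.9), (1.0, 2.0)]
ellipse = fit_ellipse(steady_state)
print(is_fault((1.0, 2.0), ellipse), is_fault((5.0, -3.0), ellipse))
```

In a setting like the one the abstract describes, the two inputs would more plausibly be principal-component scores than raw variables, so that correlated measurements are reduced before the ellipse is fitted.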
|
15 |
Impact analysis of characteristics in product development : Change in product property with respect to component generations
Lindström, Frej, Andersson, Daniel January 2017
Scania has developed a unique modular product system which is an important success factor, creating flexibility, and lies at the heart of their business model. R&D use product and vehicle product properties to describe the product key factors. These product properties are both used during the development of new features and products, and also utilized by the project office to estimate the total contribution of a project. Scania want to develop a new method to understand and be able to track and compare the projects' effect over time and also predict future vehicle improvements. In this thesis, we investigate how to quantify the impact on vehicle product properties and predict component improvements, based on data sources that have not been utilized for these purposes before. The impact objective is ultimately to increase the understanding of the development process of heavy vehicles and the aim for this project was to provide statistical methods that can be used for investigative and predictive purposes. First, with analysis of variance we statistically verified and quantified differences in a product property between comparable vehicle populations with respect to component generations. Then, Random Forest and Artificial Neural Networks were implemented to predict future effect on product property with respect to component improvements. We could see a difference of approximately 10 % between the comparable components of interest, which was more than the expected difference. The expectations are based on performance measurements from a test environment. The implemented Random Forest model was not able to predict future effect based on these performance measures. Artificial Neural Networks was able to capture structures from the test environment and its predictive performance and reliability was, under the given circumstances, relatively good.
|
16 |
Recurring Query Processing on Big Data
Lei, Chuan 18 August 2015
Advances in hardware, software, and networks have enabled applications, from business enterprises and scientific and engineering disciplines to social networks, to generate data at unprecedented volume, variety, velocity, and veracity. Innovation in these domains is thus now limited by their ability to analyze and discover knowledge from the collected data in a timely and scalable fashion. To facilitate such large-scale big data analytics, the MapReduce computing paradigm and its open-source implementation Hadoop are among the most popular and widely used technologies. Hadoop's success as a competitor to traditional parallel database systems lies in its simplicity, ease of use, flexibility, automatic fault tolerance, superior scalability, and cost effectiveness, the last due to its use of inexpensive commodity hardware that can scale to petabytes of data over thousands of machines. Recurring queries, repeatedly executed for long periods of time on rapidly evolving high-volume data, have become a bedrock component in most of these analytic applications. Efficient execution and optimization techniques must be designed to assure the responsiveness and scalability of these recurring queries. In this dissertation, we thoroughly investigate topics in the area of recurring query processing on big data.
First, we propose a novel scalable infrastructure called Redoop that treats recurring queries over big evolving data as first-class citizens during query processing. This is in contrast to state-of-the-art MapReduce/Hadoop systems, which face significant challenges with recurring queries, including redundant computations, significant latencies, and huge application development efforts. Redoop offers innovative window-aware optimization techniques for recurring query execution, including adaptive window-aware data partitioning, window-aware task scheduling, and inter-window caching mechanisms. Redoop also retains the fault tolerance of MapReduce via automatic cache recovery and task re-execution support.
Second, we address the crucial need to accommodate hundreds or even thousands of recurring analytics queries that periodically execute over frequently updated data sets, e.g., latest stock transactions, new log files, or recent news feeds. For many applications, such recurring queries come with user-specified service-level agreements (SLAs), commonly expressed as the maximum allowed latency for producing results before their merits decay. On top of Redoop, we built a scalable multi-query sharing engine tailored for recurring workloads in the MapReduce infrastructure, called Helix. Helix deploys new sliced window-alignment techniques to create sharing opportunities among recurring queries without introducing additional I/O overheads or unnecessary data scans. Furthermore, Helix introduces a cost/benefit model for creating a sharing plan among the recurring queries, and a scheduling strategy for executing them to maximize the SLA satisfaction.
Third, recurring analytics queries tend to be expensive, especially when query processing consumes data sets in the hundreds of terabytes or more. Time-sensitive recurring queries, such as fraud detection, often come with tight response time constraints in the form of query deadlines. Data sampling is a popular technique for computing approximate results with an acceptable error bound while reducing high-demand resource consumption and thus improving query turnaround times. In this dissertation, we propose the first fast approximate query engine for recurring workloads in the MapReduce infrastructure, called Faro. Faro introduces two key innovations: (1) a deadline-aware sampling strategy that builds samples from the original data with reduced sample sizes compared to uniform sampling, and (2) adaptive resource allocation strategies that maximally improve the approximate results while still meeting the response time requirements specified in recurring queries.
In our comprehensive experimental study of each part of this dissertation, we demonstrate the superiority of the proposed strategies over state-of-the-art techniques in scalability, effectiveness, as well as robustness.
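The inter-window caching idea described for Redoop can be illustrated with a small sketch. It is not Redoop's implementation or API, only a single-process illustration under stated assumptions: the data arrives in fixed "panes" (for example one file per time slice), a recurring query sums over a sliding window of consecutive panes, and partial sums of panes shared by overlapping windows are cached instead of recomputed. The class `PaneCache`, its methods, and the sample data are hypothetical.

```python
class PaneCache:
    """Cache per-pane partial sums so overlapping windows reuse earlier work."""

    def __init__(self, load_pane):
        # load_pane(pane_id) returns the raw numeric records of one pane.
        self._load_pane = load_pane
        self._partials = {}
        self.loads = 0  # number of panes actually read and aggregated

    def _partial(self, pane_id):
        if pane_id not in self._partials:
            self.loads += 1
            self._partials[pane_id] = sum(self._load_pane(pane_id))
        return self._partials[pane_id]

    def window_sum(self, first_pane, size):
        """Sum over panes first_pane .. first_pane + size - 1."""
        # Evict panes that have slid out of the window; they are never needed again
        # as long as windows only move forward.
        for pane_id in [p for p in self._partials if p < first_pane]:
            del self._partials[pane_id]
        return sum(self._partial(p) for p in range(first_pane, first_pane + size))


# Hypothetical pane contents, keyed by pane id.
panes = {0: [1, 2], 1: [3], 2: [4, 5], 3: [6]}
cache = PaneCache(panes.__getitem__)
first = cache.window_sum(0, 3)   # reads panes 0, 1 and 2
second = cache.window_sum(1, 3)  # reuses panes 1 and 2, reads only pane 3
print(first, second, cache.loads)
```

The second execution reads one new pane rather than three, which is the kind of redundant computation the abstract says recurring queries otherwise incur. A distributed system would additionally need to recover lost cache entries, as the abstract notes.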
|
17 |
ProGENitor : an application to guide your career
Hauptli, Erich Jurg 20 January 2015
This report introduces ProGENitor, a system to empower individuals with career advice based on vast amounts of data. Specifically, it develops a machine learning algorithm that shows users how to efficiently reach specific career goals based upon the histories of other users. A reference implementation of this algorithm is presented, along with experimental results that show that it provides quality actionable intelligence to users. / text
|
18 |
An examination of privacy in the socio-technological context of Big Data and the socio-cultural context of China
Fu, Tao 01 August 2015
Privacy has been an academic concern, ethical issue and legislative conundrum. No other factors have shaped the understanding of privacy as much as the development of technologies – be it the invention of press machines, telephones or cameras. With the diffusion of mobile Internet, social media, the Internet of Things and the penetration of devices such as smartphones, the global positioning system, surveillance cameras, sensors and radio frequency identification tags, Big Data, designed to economically extract value from a huge amount and variety of data, has been accumulating exponentially since 2012. Data-driven businesses collect, combine, use, share and analyze consumers’ personal information for business revenues. Consumers’ shopping habits, viewing habits, browsing history and many other online behaviors have been commodified. Never before in history has privacy been as threatened by the latest communication technologies as it is today. This dissertation aims to study some of the rising issues of technology and businesses that relate to privacy in China, a rising economic power of the East. China is a country with Confucian heritage and governed under decades of Communist leadership. Its philosophical traditions and social fabric have shaped the perception of privacy for more than 2,000 years. “Private” was not taken as negative, but being committed to the public or the greater good was an expected virtue in ancient China. The country also has a long tradition of peer surveillance, whether under the baojia system or the later Urban and Rural Residents’ Committees. But after China adopted the reform and open-up policy in 1978, consumerism has inspired the new Chinese middle class to pursue more private space as a lifestyle. Alibaba, Baidu and Tencent are globally top-ranking Chinese Internet companies with huge numbers of users, tractions and revenues, whose businesses depend heavily on consumers’ personal data.
As a response to the increase of consumer data and the potential intrusion of privacy by Internet and information service providers (IISPs), the Ministry of Industry and Information Technology, a regulator of China’s Internet industry, enacted laws to regulate the collection and use of personal information by the IISPs. Drawing upon the literature and privacy theories of Westin, Altman and Nissenbaum and the cultural theory of Hofstede, this study investigated the compliance of Chinese businesses’ privacy policies with relevant Chinese laws and the information provided in the privacy policies regarding the collection, use and disclosure of Internet users’ personal information; Chinese consumers’ privacy attitudes and actions, including the awareness, concerns, control, trust and trade-offs related to privacy; the differences among Chinese Fundamentalists, Pragmatists and Unconcerned using the Core Privacy Orientation Index; and the conceptualization of privacy in present China. A triangulation of quantitative and qualitative methods such as case study, content analysis, online survey and semantic network analysis was employed to answer research questions and test hypotheses. This study found that Chinese IISPs represented by Alibaba, Baidu and Tencent comply well with Chinese laws. Tencent provides the most information about the collection, use and disclosure of consumers’ personal information. Chinese consumers know little about Big Data technologies in terms of collecting their personal information. They have the most concerns about other individuals and the least about the government when their personal information is accessed without their knowledge. When their personal information is collected by online businesses, Chinese consumers have more concerns about their online chats, images and emails and fewer concerns about searches performed, websites browsed, shopping and viewing habits.
Fewer than one-third of the Chinese surveyed take pro-active measures to manage online privacy settings. Chinese consumers make more efforts to avoid being tracked by people who might criticize, harass, or target them, by advertisers, and by hackers or criminals. They rarely make themselves invisible to the government, law enforcement persons or people they are familiar with, such as people from their past, family members and romantic partners. Chinese consumers are more trusting of the laws and regulations issued by the government than they are of online businesses to protect personal data. Chinese consumers only trade privacy for benefits occasionally, but when they see more benefits from privacy trade-offs, they have fewer concerns. To Chinese consumers, privacy means personal information, including but not limited to family, home address, phone number, Chinese ID number, and passwords to bank accounts and other online accounts, the leaking or disclosure of which, without the owners’ consent, to people they do not want to know it will result in a sense of insecurity.
|
19 |
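Utmaningar i upphandlingsprocessen av Big datasystem : En kvalitativ studie om vilka utmaningar organisationer upplever vid upphandlingar av Big data-system [Challenges in the procurement process for Big Data systems: a qualitative study of the challenges organizations experience when procuring Big Data systems]
Eriksson, Marcus, Pujol Gibson, Ricard January 2017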
Organizations today have access to massive amounts of data that cannot be handled by traditional Business Intelligence tools. Big data is characterized by large volume, high velocity and great variety. To handle data with these characteristics and extract business value from it, organizations need to acquire a Big Data system. Many organizations are aware that investing in Big Data can be profitable, but the path there is unclear. The purpose of this study is to investigate the challenges organizations face when acquiring a Big Data system and where in the acquisition process these challenges arise. The empirical data was collected by interviewing three large Swedish companies and authorities that have acquired a Big Data system. The Critical Incident Technique was used to identify the challenges the organizations experienced in the process of acquiring a Big Data system. The findings show that organizations experience challenges in understanding the need for a Big Data system, creating the project team, choosing the project method, defining the system requirements and managing sensitive and personal data.
|
20 |
Identification of Patterns of Fatal Injuries in Humans through Big Data
Silva, Jesus, Romero, Ligia, Pineda, Omar Bonerge, Herazo-Beltran, Yaneth, Zilberman, Jack January 2020
External cause injuries are defined as intentional or unintentional harm or injury to a person, which may be caused by trauma, poisoning, assault, accidents, etc., and may be fatal (fatal injury) or not lead to death (non-fatal injury). External injuries have been considered a global health problem for two decades. This work aims to determine criminal patterns by applying data mining techniques to a sample of patients from Mumbai city in India.
|