581

Privacy-awareness in the era of Big Data and machine learning / Integritetsmedvetenhet i eran av Big Data och maskininlärning

Vu, Xuan-Son January 2019 (has links)
Social Network Sites (SNS) such as Facebook and Twitter play a great role in our lives. On the one hand, they help connect people who would otherwise not be connected. Many recent breakthroughs in AI, such as facial recognition [49], were achieved thanks to the amount of data available on the Internet via SNS (hereafter Big Data). On the other hand, due to privacy concerns, many people have tried to avoid SNS to protect their privacy. Much as the Internet protocol was not designed with security in mind, Machine Learning (ML), the core of AI, was not designed with privacy in mind. For instance, Support Vector Machines (SVMs) solve a quadratic optimization problem by deciding which instances of the training dataset are support vectors; this means that the data of the people involved in the training process is also published within the SVM models. Thus, privacy guarantees must be applied to the worst-case outliers while data utility is preserved. For these reasons, this thesis studies: (1) how to construct data federation infrastructure with privacy guarantees in the Big Data era; (2) how to protect privacy while learning ML models with a good trade-off between data utility and privacy. Regarding (1), we proposed different frameworks empowered by privacy-aware algorithms that satisfy the definition of differential privacy, the state-of-the-art formal privacy guarantee. Regarding (2), we proposed different neural network architectures to capture the sensitivities of user data, from which the algorithm itself decides how much it should learn from user data to protect privacy while achieving good performance on a downstream task. The current outcomes of the thesis are: (1) a privacy-guaranteed data federation infrastructure for analysis of sensitive data; (2) privacy-guaranteed algorithms for data sharing; (3) privacy-aware analysis of social network data. The research methods used in this thesis include experiments on real-life social network datasets to evaluate aspects of the proposed approaches. The insights and outcomes of this thesis can be used by both academia and industry to guarantee privacy in the analysis and sharing of personal data. They also have the potential to facilitate further research in privacy-aware representation learning and related evaluation methods.
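
As a concrete illustration of the differential-privacy guarantee this abstract refers to, below is a minimal sketch of the Laplace mechanism, the canonical way to satisfy ε-differential privacy for a numeric query. The query, its sensitivity, and the ε value are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release a noisy statistic satisfying epsilon-differential privacy.

    Noise is drawn from Laplace(0, sensitivity / epsilon), making the
    released value provably insensitive to any single person's record.
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Illustrative use: a counting query over user records has sensitivity 1,
# since adding or removing one person changes the count by at most 1.
records = [23, 35, 41, 29, 52]          # hypothetical user data
noisy_count = laplace_mechanism(len(records), sensitivity=1.0, epsilon=0.5)
print(f"Noisy count: {noisy_count:.2f}")
```

The trade-off the thesis describes between data utility and privacy shows up directly here: a smaller ε gives stronger privacy but a noisier, less useful answer.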
582

Fast Data Analysis Methods For Social Media Data

Nhlabano, Valentine Velaphi 07 August 2018 (has links)
The advent of Web 2.0 technologies, which support the creation and publishing of various social media content in a collaborative and participatory way by all users in the form of user-generated content and social networks, has led to the creation of vast amounts of structured, semi-structured and unstructured data. The sudden rise of social media has led to its wide adoption by organisations of various sizes worldwide in order to take advantage of this new way of communicating and engaging with their stakeholders in ways that were unimaginable before. Data generated from social media is highly unstructured, which makes it challenging for most organisations, which are normally set up to handle and analyse structured data from business transactions. The research reported in this dissertation was carried out to investigate fast and efficient methods for retrieving, storing and analysing unstructured data from social media in order to make crucial and informed business decisions on time. Sentiment analysis was conducted on Twitter data, called tweets. Twitter, one of the most widely adopted social network services, provides an API (Application Programming Interface) for researchers and software developers to connect to and collect public data sets from the Twitter database. A Twitter application was created and used to collect streams of real-time public data via a Twitter source provided by Apache Flume and to store this data efficiently in the Hadoop Distributed File System (HDFS). Apache Flume is a distributed, reliable, and available system used to efficiently collect, aggregate and move large amounts of log data from many different sources to a centralized data store such as HDFS. Apache Hadoop is an open-source software library that runs on low-cost commodity hardware and has the ability to store, manage and analyse large amounts of both structured and unstructured data quickly, reliably, and flexibly at low cost. A lexicon-based sentiment analysis approach was taken, with the AFINN-111 lexicon used for scoring. The Twitter data was analysed from HDFS using a Java MapReduce implementation. MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. The results demonstrate that it is fast, efficient and economical to use this approach to analyse unstructured data from social media in real time. / Dissertation (MSc)--University of Pretoria, 2019. / National Research Foundation (NRF) - Scarce skills / Computer Science / MSc / Unrestricted
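
The dissertation's scoring step used a Java MapReduce job over HDFS; as a rough sketch of the lexicon-based idea itself, here is a minimal Python version, assuming an AFINN-111 file with one word<TAB>score pair per line. The file path is a placeholder, and multi-word AFINN entries are ignored for brevity.

```python
def load_afinn(path: str) -> dict:
    """Load the AFINN-111 lexicon: one 'word<TAB>score' pair per line."""
    scores = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, score = line.rsplit("\t", 1)
            scores[word] = int(score)
    return scores

def tweet_sentiment(tweet: str, afinn: dict) -> int:
    """Score a tweet as the sum of the AFINN scores of its words."""
    words = (w.strip(".,!?:;\"'") for w in tweet.lower().split())
    return sum(afinn.get(w, 0) for w in words)

# Illustrative use (the file name is an assumption):
# afinn = load_afinn("AFINN-111.txt")
# print(tweet_sentiment("This service is wonderful but a bit slow", afinn))
```

In the MapReduce setting, the map phase would emit a score per tweet much like tweet_sentiment does, and the reduce phase would aggregate scores per topic or time window.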
583

P-SGLD : Stochastic Gradient Langevin Dynamics with control variates

Bruzzone, Andrea January 2017 (has links)
Year after year, the amount of data we continuously generate is increasing. When this trend began, the main challenge was to find a way to store the huge quantity of information. Nowadays, with the increasing availability of storage facilities, this problem is solved, but it leaves us with a new issue: finding tools that allow us to learn from these large data sets. In this thesis, a framework for Bayesian learning with the ability to scale to large data sets is studied. We present the Stochastic Gradient Langevin Dynamics (SGLD) framework and show that in some cases its approximation of the posterior distribution is quite poor. One reason for this is that SGLD estimates the gradient of the log-likelihood with high variance due to naïve subsampling. Our approach combines accurate proxies for the gradient of the log-likelihood with SGLD. We show that it produces better results in terms of convergence to the correct posterior distribution than standard SGLD, since accurate proxies dramatically reduce the variance of the gradient estimator. Moreover, we demonstrate that this approach is more efficient than the standard Markov Chain Monte Carlo (MCMC) method and that it exceeds other variance-reduction techniques proposed in the literature, such as the SAGA-LD algorithm. That algorithm also uses control variates to improve SGLD, which makes the comparison with our approach straightforward. We apply the method to the logistic regression model.
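
A minimal sketch of the control-variate idea this abstract describes: a fixed reference point (for example a posterior mode found by optimisation) supplies a low-variance proxy for the log-likelihood gradient inside the SGLD update. The function names, batching scheme, and the omission of the log-prior gradient are simplifying assumptions; the thesis's exact P-SGLD formulation may differ.

```python
import numpy as np

def sgld_cv(grad_i, theta_hat, full_grad_hat, theta0, N, step, n_iters,
            batch=32, seed=0):
    """SGLD with a control-variate gradient estimator (sketch).

    grad_i(theta, idx) -> (len(idx), d) array of per-datum log-likelihood
    gradients. theta_hat is a fixed reference point and full_grad_hat the
    full-data gradient there, computed once up front. The estimator
        g = full_grad_hat + (N/|S|) * sum_{i in S}[grad_i(theta) - grad_i(theta_hat)]
    is unbiased, and its variance is small whenever theta stays close to
    theta_hat. The log-prior gradient is omitted here for brevity.
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    samples = []
    for _ in range(n_iters):
        idx = rng.integers(0, N, size=batch)        # minibatch indices
        diff = grad_i(theta, idx) - grad_i(theta_hat, idx)
        g = full_grad_hat + (N / batch) * diff.sum(axis=0)
        noise = rng.normal(0.0, np.sqrt(step), size=theta.shape)
        theta = theta + 0.5 * step * g + noise      # Langevin step, never rejected
        samples.append(theta.copy())
    return np.array(samples)
```

Because the correction term vanishes at theta = theta_hat, the estimator's variance collapses near the mode, which is what drives the improved posterior approximation described in the abstract.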
584

A picture is worth a thousand words, or? : Individuals' use of visual dashboards

Nilsson, Elin, Nyborg, Mikael January 2020 (has links)
Purpose – The increasing amounts of data have become an important factor for organizations. A visual dashboard is a BI tool that can be used to communicate insights from big data. One way for individuals in organizations to gain insights from timely and large data sets is through visualizations displayed in visual dashboards, but studies show that most dashboards fall short of their potential. Therefore, the aim of this study is to examine how individuals make use of visual dashboards. Design/Methodology – To obtain this understanding a literature review was performed, followed by a study conducted in two phases. Firstly, a multiple-case study of four organizations was performed, which included interviews and the think-aloud technique. Secondly, the findings from the multiple-case study were tested through interviews with experts in the BI area. Findings – The findings indicate that low democratization, scarce effects, and simplicity are reasons why the use of visual dashboards is not fully exploited. Low attention and understanding, combined with a lack of timely information, mean that data-driven actions are not taken. The phase of predictive analysis has not yet been reached; rather, organizations are still using the visual dashboard for descriptive analysis, which in turn hinders the possibility of effects. For these reasons, the use of visual dashboards does not meet the often-described purpose of enabling better and faster decisions, and organizations have yet to take steps in that direction. Research limitations – The sampling of industries in the multiple-case study could affect variables such as the number of KPIs.
585

En kvalitativ granskning av rollen Data Scientist och deras arbetsuppgifter i förhållande till kännetecken för Big Data / A qualitative review of the role of Data Scientists and their work tasks in relation to the characteristics of Big Data

Otterheim, Oskar January 2020 (has links)
This thesis was inspired by the inconsistent application of different characteristics when defining Big Data, and by the question of how the wide-ranging work of the concept's expert role relates to those very characteristics. The purpose of the study is thus to give a clearer picture of what the diffuse role of Data Scientist entails and to highlight what the fundamental tasks of the role involve, using these characteristics as guidelines. Information about the role was collected through semi-structured interviews with Data Scientists in organizations of varying types and sizes. The analysis provides vivid descriptions of what the work of the participating respondents looks like and establishes how their tasks relate to different characteristics of Big Data. The results depict how the respondents' work relates to the definition of Big Data, and how the work differs depending on the type and size of the organization in which the Data Scientist operates. The results also show that the work of Data Scientists can collectively be related to the characteristics Value, Visualization and Validity, which answers the study's underlying research question. The results and the study itself are reflected upon in the discussion section, where discoveries made during the work are described, both about Big Data as a concept and about the role of Data Scientist, leading among other things to suggestions for further studies that could result in categorizations of the role.
586

“We Traded Our Privacy for Comfortability” : A Study About How Big Data is Used and Abused by Major International Companies

Hansson, Madelene, Manfredsson, Adam January 2020 (has links)
Due to digitalization, e-commerce and an online presence are things most of us take for granted. Companies are moving more towards an internet-based arena of sales rather than traditional commerce in physical stores. This development has led firms to market themselves through various online channels such as social media. Big data is the digital DNA we leave behind on every part of the internet we use. Big data has become an international commodity that can be sold, stored and used. The authors of this thesis have investigated the way international firms extract and use big data to construct customized marketing for their customers. This thesis has also examined the ethical perspective of how this commodity is handled and used, and people's perceptions regarding the matter. This is interesting to investigate since very little research has previously been conducted combining big data usage with ethics. To accomplish the aim of this thesis, significant theory has been reviewed and accounted for. In addition, a qualitative study was conducted, in which two large companies that work closely with big data were investigated through a case study. The authors also conducted six semi-structured interviews with people between 20 and 30 years of age. The outcome of this thesis shows the importance of implementing ethics within the concept and usage of big data, and it provides insight into the mind of the consumer that has been lacking in previous research on this subject.
587

Bodies of Data: The Social Production of Predictive Analytics

Madisson Whitman 26 June 2020
Bodies of Data challenges the promise of big data in knowing and organizing people by explicating how data are made and theorizing mismatches between actors, data, and institutions. Situated at a large public university in the United States that hosts approximately 30,000 undergraduate students, this research ethnographically traces the development and deployment of an app for student success that draws from traditional (demographic information, enrollment history, grade distributions) and non-traditional (WiFi network usage, card swipes, learning management systems) student data to anticipate the likelihood of graduation in a four-year period. The app, which offers an interface for students based on nudging, is the product of collaborations between actors who specialize in educational technology. As these actors manage the app, they must also interpret data against the students who generate those data, many of whom do not neatly mirror their data counterparts. The central question animating this research asks how the designers of the app create order, whether through material bodies that are knowable to data collection or reorganized demographic groupings, as they render students into data.

To address this question and investigate practices of making data, I conducted 12 months of ethnographic fieldwork, using participant observation and interviewing with university administrators, data scientists, app developers, and undergraduate students. Through a theoretical approach informed by anthropology, science and technology studies, critical data studies, and feminist theory, I analyze how data and the institution make each other through the modeling of student bodies and reshaping of subjectivity. I leverage technical glitches (slippages between students and their data) and failure at large at the institution as analytics to both expose otherwise hidden processes of ordering and productively read failure as an opportunity for imagining what data could do. Predictive projects that derive from big data are increasingly common in higher education as institutions look to data to understand populations. Bodies of Data empirically provides evidence regarding how data are made through sociotechnical processes, in which data are not for understanding but for ordering. As universities look to big data to inform decision-making, the findings of this research contradict assumptions that data provide neutral and objective ways of knowing students.
588

Method for Collecting Relevant Topics from Twitter supported by Big Data

Silva, Jesús, Senior Naveda, Alexa, Gamboa Suarez, Ramiro, Hernández Palma, Hugo, Niebles Núñez, William 07 January 2020
There is a fast increase in information and data generation in virtual environments due to microblogging sites such as Twitter, a social network that produces an average of 8,000 tweets per second and up to 550 million tweets per day. This is why this and many other social networks are overloaded with content, making it difficult for users to identify information topics among the large number of tweets related to different issues. Because of this uncertainty, which harms the users who created the content, this study proposes a method for inferring the most representative topics occurring within a one-day period through the selection of user profiles of experts in sports and politics. A topic's relevance is calculated from the number of times it is mentioned by the experts in their timelines. The experiment used a dataset extracted from Twitter containing 10,750 tweets related to sports and 8,758 tweets related to politics. All tweets were obtained from the timelines of users selected by the researchers, who were considered experts in their respective subjects due to the content of their tweets. The results show that the effective selection of users, together with the relevance index implemented for the topics, can help to more easily find important topics in both sports and politics.
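
As a rough illustration of the relevance index described above, the following counts how often each candidate topic is mentioned across expert timelines. The data structures, handle names, and substring matching are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter

def topic_relevance(expert_timelines: dict, topics: list) -> Counter:
    """Score each topic by the number of expert tweets that mention it.

    expert_timelines maps an expert's handle to the list of tweets in
    their one-day timeline; topics is the candidate topic list.
    """
    relevance = Counter()
    for tweets in expert_timelines.values():
        for tweet in tweets:
            text = tweet.lower()
            for topic in topics:
                if topic.lower() in text:
                    relevance[topic] += 1
    return relevance

# Illustrative use with hypothetical handles and tweets:
timelines = {
    "@sports_expert": ["Great match in the Champions League tonight"],
    "@politics_expert": ["New election poll results are out today"],
}
print(topic_relevance(timelines, ["champions league", "election"]).most_common())
```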
589

Defining, analyzing and determining power losses - due to icing on wind turbine blades

Canovas Lotthagen, Zandra January 2020 (has links)
The wind power industry is one of the fastest-growing renewable energy industries in the world. Since more energy can be extracted from wind when the air density is higher, many of the investments in the wind power industry are made in cold climates. But with cold climates come harsh weather conditions such as icing. Icing on wind turbine rotor blades causes the aerodynamic properties of the blade to shift, and with further ice accretion the wind power plant can come to a standstill, causing a loss of power until the ice has melted. How big these losses are depends greatly on site-specific variables such as elevation, temperature, and precipitation. The literature claims these ice-related losses can correspond to 10-35% of the annual expected energy output. Some studies have attempted to standardize an ice-loss determination method for the industry, yet no standardized way of calculating these losses exists. It was therefore of interest for this thesis to investigate the different methods in use. Using historical Supervisory Control and Data Acquisition (SCADA) data for two sites located in Sweden, a robust ice-loss identification code was created. Nearly 32 million data points were analyzed, with the data provided by Siemens Gamesa, one of the biggest companies in the wind power industry. A sensitivity analysis showed that a reference dataset spanning May to September over four years could be used to clearly identify ice losses. To find the ice losses, three scenarios with different temperature thresholds were tested: scenario 1 considers all data points below 0 degrees, scenario 2 all points at 3 degrees and below, and scenario 3 all points at 5 degrees and below. Scenario 3, which filtered the raw data so that only data points with a temperature below five degrees were used, was found to be the optimal way to identify the ice losses. For the two sites investigated, ice losses were found to lower the annual energy output by 5-10%. Further, the correlation between temperature, precipitation, and ice losses was investigated; low temperature combined with high precipitation was found to be strongly correlated with ice losses.
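
A minimal sketch of the kind of ice-loss computation the thesis describes: build a reference power curve from the assumed ice-free May-September data, then sum the production shortfall for cold data points (here using the scenario 3 threshold of 5 degrees). The column names, binning scheme, and use of the median are illustrative assumptions, not the thesis's exact method.

```python
import numpy as np
import pandas as pd

def estimate_ice_losses(scada: pd.DataFrame, temp_threshold: float = 5.0) -> float:
    """Estimate ice-related production losses from SCADA data.

    Assumes columns 'timestamp' (datetime), 'wind_speed' (m/s),
    'power' (kW) and 'temperature' (deg C).
    """
    bins = np.arange(0.0, 30.5, 0.5)               # 0.5 m/s wind-speed bins
    month = scada["timestamp"].dt.month

    # Reference power per wind-speed bin, from assumed ice-free months (May-Sep).
    ref = scada[month.between(5, 9)].copy()
    ref["bin"] = np.digitize(ref["wind_speed"], bins)
    ref_curve = ref.groupby("bin")["power"].median()

    # Candidate icing data: every point below the temperature threshold.
    cold = scada[scada["temperature"] < temp_threshold].copy()
    cold["bin"] = np.digitize(cold["wind_speed"], bins)
    expected = cold["bin"].map(ref_curve)

    shortfall = (expected - cold["power"]).clip(lower=0)
    return float(shortfall.sum())      # total shortfall in power units, summed over samples
```

Comparing the summed shortfall against the reference production over the same period yields a percentage loss figure of the kind reported in the thesis (5-10% annually for the two sites).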
590

The Implementation of social CRM : Key features and significant challenges associated with the practical implementation of social Customer Relationship Management

Kansbod, Julia January 2022 (has links)
The rise of social media has challenged the traditional notion of CRM and introduced a new paradigm, known as social CRM. While there are many benefits and opportunities associated with the integration of social data in CRM systems, a majority of companies are failing their social CRM implementation. Since social CRM is still considered to be a young phenomenon, knowledge regarding its implementation and functionalities is limited. The purpose of this study is to contribute to the current state of knowledge regarding the factors which influence the practical implementation of social CRM. In order to capture state-of-the-art knowledge on this topic, a literature review was conducted. In addition, interviews with CRM experts working within five Swedish companies were included in order to gain additional insights from practice. Findings indicate that the key features needed for social CRM implementation revolve around the real-time monitoring, collection, processing, storing and analyzing of social data. Advanced technical tools, such as Big Data Technology, are deemed required in order to handle large volumes of data and properly transform it into valuable knowledge. The most significant challenges identified heavily revolve around limited knowledge as well as various technical and organizational limitations. Additionally, findings indicate that a multitude of uncertainties of practitioners revolve around data legislations and privacy concerns. Hence, while social CRM can entail a multitude of benefits, there are a significant number of challenges which seem to stand in the way of unlocking the full potential of social CRM. In order for social CRM implementation to be made more accessible for organizations in the future, there is a need for more knowledge and clarity regarding factors such as technical solutions, organizational changes and legislations.
