Global ETD Search

521	Evaluating Presto as an SQL on Hadoop solution : A Case at Truecaller Ahmed, Sahir January 2016 Truecaller is a mobile application with over 200 million unique users worldwide. Every day, Truecaller stores over 1 billion rows of data, which it analyses to improve its product. The data is stored in Hadoop, a framework for storing and analysing large amounts of data on a distributed file system. To analyse these large amounts of data, the analytics team needs a new solution for more lightweight, ad hoc analysis. This thesis evaluates the performance of the query engine Presto to determine whether it meets the requirements for helping the data analytics team at Truecaller work more efficiently. Using a design-science methodology, the thesis presents Presto's pros and cons. Presto is recommended for use alongside the team's current tools in specific lightweight use cases, by users who are familiar with the data sets the analytics team works with. Other solutions are also recommended for evaluation before a final decision is taken. Keywords: Hadoop, Big Data, Presto, Hive, SQL on Hadoop / Validated; 20160819 (global_studentproject_submitter) Social Behaviour Law Hadoop Big Data Presto SQL on Hadoop
522	La vie privée à l'épreuve de la relation de soin / Privacy put to the test in care relationship Nieto, Adrien 20 November 2017 The existence of legal mechanisms protecting privacy under general law is irrefutable. Those that the patient can invoke within the care relationship, however, remain unclear. The specific nature of this relationship, and the physical and moral intrusions on privacy that occur within it (being looked at, being touched, nudity and the exchange of private information), justify a special framework and specific protections. These protections already exist but need to be rethought to address the issues raised by the evolution and transformation of the care relationship. The emergence of new actors in healthcare, with their own aspirations, undoubtedly changes the objective and consequences of this relationship. Health data, an underestimated component of privacy, must be regulated now that it no longer flows only from the patient to the healthcare professional and vice versa, given how significant the associated economic and political stakes are. The "value" of privacy must be refocused at a time when consumption, the instantaneous exchange of information and publicity seem to have taken precedence over it. Privacy Care relationship Health Management of patient Big Data Modesty
523	A knowledge based approach of toxicity prediction for drug formulation : modelling drug vehicle relationships using soft computing techniques Mistry, Pritesh January 2015 This multidisciplinary thesis is concerned with predicting drug formulations that reduce drug toxicity. Both scientific and computational approaches are used to make original contributions to the field of predictive toxicology. The first part of the thesis provides a detailed scientific discussion of all aspects of drug formulation and toxicity, focusing on the principal mechanisms of drug toxicity and on how drug toxicity is studied and reported in the literature. It also reviews the technologies currently available for formulating drugs to reduce toxicity, together with examples of published studies that have used these technologies for that purpose. The thesis then gives an overview of the computational approaches currently employed in in silico predictive toxicology, focusing on the machine learning approaches used to build predictive QSAR classification models, with examples from the literature. Two methodologies have been developed as the main work of this thesis. The first uses directed bipartite graphs and Venn diagrams to visualise and extract, from large un-curated datasets, drug-vehicle relationships that show changes in patterns of toxicity. Using the methodology proposed in chapter 4, these relationships can be rapidly extracted and visualised. The second methodology involves mining large datasets to extract drug-vehicle toxicity data. It uses an area-under-the-curve principle to make pairwise comparisons of vehicles, which are classified according to the toxicity protection they offer; predictive classification models based on random forests and decision trees are then built from these classifications. The results of this methodology are reported in chapter 6.
524	Business analytics in traditional industries – tackling the new age of data and analytics. Fors, Anton, Ohlson, Emelie January 2016 Decision-making is no longer based on human preferences and expertise alone. The era of big data brings new business analytics challenges for organizations that want a competitive advantage. Previous research has largely examined why this era is crucial to organizations, but not how they can adapt to it. This case study offers a glimpse of how a traditional organization with an old mindset can catch up with new technological advances. The purpose of the study is to understand how a traditional company in Sweden is affected by analytics and whether analytics is valuable to it. To build our theoretical framework, we drew on peer-reviewed material as well as technology and science blogs by key experts in the field. This material examines the most essential elements of business analytics and data management. The theoretical framework guided our work when formulating and refining the research question and the interview questions. The results clearly show that the case company is on the right track with new development and projects, but many milestones remain before these are fulfilled. Issues within the company must be resolved, and the organization's culture needs to change toward more data-driven decision-making. The study gives clear insight into the challenges that organizations must face and overcome before making radical changes. Business analytics big data decision-making business intelligence data Computer and Information Sciences
525	The role of Big Data in the evolution of Platform based Ecosystems : A case study of an emerging platform-based ecosystem in the software engineering industry Kostis, Angelos January 2016 Platform-based ecosystems are becoming dominant models in the software engineering industry. "Big data" has recently gained increased attention from both academics and practitioners, and it is believed to affect every sector and industry. While an abundance of research addresses big data and platform-based ecosystems, the two are typically treated as separate spheres. This thesis investigates the role of big data in the evolution of platform-based ecosystems in the software engineering industry; more specifically, it examines the impact of big data on the evolution of software ecosystems. A case study was conducted of a platform owner and pioneer in the software engineering industry. The study identifies challenges and opportunities triggered by the advent of big data in the context of platform-based ecosystems, and thereby provides considerable insight into the impact of big data on contemporary platform providers and on the evolution of platform-centric ecosystems. The findings show that software ecosystems are affected by big data in a positive manner, but several challenges emerge and must be tackled. The thesis also suggests that both academics and practitioners should examine this relationship further and identify how the advent of big data affects the evolution of platform-based ecosystems. platform-based ecosystem software ecosystem big data Information Systems
526	Social media addiction : The paradox of visibility & vulnerability Kempa, Ewelina January 2015 We currently post a large amount of personal information about ourselves on social media sites. Often, however, users of these services are poorly aware of the terms and conditions they agree to. Many techniques are available to protect users' privacy, yet few organizations make the effort to put them in place. Making a profit is what matters to companies, and information about users is highly valued. It is the lack of regulation of data collection that allows organizations to disregard their users' privacy. The data that can be collected is vast: everything we do online, every search, click, purchase and view, is stored, and the information is often sold on to third parties. Using information about users, companies can make a profit, for example by making predictions about what users are interested in buying. It is nevertheless very difficult to create long-lasting regulations, as the web constantly changes and grows. A qualitative study was conducted to observe to what extent social media addiction and its consequences are being discussed and researched, and interviews with social media users were also conducted. The analysis of the findings makes clear that many users would in fact like more privacy online, yet feel they have to accept the terms and conditions anyway. Many users also state that they would happily read the terms and conditions if they were written differently. Online privacy Internet addiction Big Data Regulations Social media Computer and Information Sciences
527	Social Big Data and Privacy Awareness Sang, Lin January 2015 With the rapid development of Big Data, data from online social networks have become a major part of it. Big Data has made social networks data-oriented rather than social-oriented. With this in mind, this dissertation presents a qualitative study of how data-oriented social networks affect their users' privacy management today. An overview of Big Data and privacy issues on social networks is presented as background. We adapted communication privacy management theory as a framework for analysing how individuals manage their privacy on social networks. We study social networks as a whole, and we selected Facebook as a case study to illustrate the connection between social networks, Big Data and privacy issues. The data supporting the results of this dissertation were collected through face-to-face, in-depth interviews. We found that people divided social networks into different levels of openness, according to their privacy concerns, in order to avoid privacy invasions and violations. They reduced their sharing and shifted it from open social networks to more closed ones. However, the risk of privacy problems actually increased, because people neglected to understand how data are processed on social networks. They focused on managing their everyday sharing but too easily allowed other applications to access their personal data on the social network (such as their Facebook profile). Big data Online social networks Communication privacy management theory Information Systems
528	The Challenges of Personal Data Markets and Privacy Spiekermann-Hoff, Sarah, Böhme, Rainer, Acquisti, Alessandro, Hui, Kai-Lung January 2015 Personal data is increasingly conceived of as a tradable asset. Markets for personal information are emerging, and new ways of valuing individuals' data are being proposed. At the same time, legal obligations regarding the protection of personal data, and individuals' concerns about its privacy, persist. This article outlines some of the economic, technical, social and ethical issues associated with personal data markets, focusing on the privacy challenges they raise. JEL K20, K40, L14, L20, M30
529	Fast Computation on Processing Data Warehousing Queries on GPU Devices Cyrus, Sam 29 June 2016 Current database management systems use Graphics Processing Units (GPUs) as dedicated accelerators that process each query individually, which leaves the GPU underutilized. When a single-query data warehousing workload was run on an open-source GPU query engine, utilization of the main GPU resources was found to be below 25%, and this low utilization in turn leads to low system throughput. To address this problem, the thesis proposes transferring all of the required data into GPU global memory and keeping it there until all queries have been executed as one batch. This minimizes PCIe transfer time from CPU to GPU and reduces overall query processing time. When running multiple queries, execution time improved by up to 40% compared with dedicated processing. Database performance High performance computing Big data Parallel queries on GPU Data warehouse queries Computer Sciences
530	Matrices efficientes pour le traitement du signal et l'apprentissage automatique / Efficient matrices for signal processing and machine learning Le Magoarou, Luc 24 November 2016 Matrices, as the natural representation of linear mappings in finite dimension, play a central role in signal and image processing and in machine learning. Applying a full-rank matrix to a vector a priori requires a number of arithmetic operations on the order of the number of non-zero entries in the matrix. However, some matrices can be applied much faster, and this property underlies the success of certain linear transforms, such as the Fourier transform or the wavelet transform. What is this property? Is it easy to verify? Can arbitrary matrices be approximated by matrices that have this property? Can matrices that have this property be estimated? This thesis investigates these questions through applications such as learning dictionaries with efficient implementations, accelerating the iterations of algorithms that solve inverse problems for source localization, and fast Fourier analysis on graphs. Fast transform High dimension Matrix factorization Machine learning Signal processing Big data 006.3
