Global ETD Search

1	On the efficient distributed evaluation of SPARQL queries / Sur l'évaluation efficace de requêtes SPARQL distribuées Graux, Damien 15 December 2016 The Semantic Web, standardized by the World Wide Web Consortium, aims at providing a common framework that allows data to be shared and analyzed across applications. To that end, it introduced the Resource Description Framework (RDF) as a common base for data, together with its query language SPARQL. Because of the increasing amounts of RDF data available, dataset distribution across clusters is poised to become a standard storage method. As a consequence, efficient distributed SPARQL evaluators are needed. To tackle these needs, we first benchmark several state-of-the-art distributed SPARQL evaluators while adapting the considered set of metrics to a distributed context, for example by including network traffic. Then, an analysis driven by typical use cases leads us to define new development areas in the field of distributed SPARQL evaluation. On the basis of these fresh perspectives, we design several efficient distributed SPARQL evaluators, one for each of these use cases, whose performance is validated against the already benchmarked evaluators. For instance, our distributed SPARQL evaluator named SPARQLGX offers efficient time performance while being resilient to the loss of nodes. BigData Passage à l'échelle Données BigData Scalability Data set 004
2	Improving MapReduce Performance on Clusters / Amélioration des performances de MapReduce sur grappe de calcul Gault, Sylvain 23 March 2015 Nowadays, more and more scientific fields rely on data mining to produce new results. The raw data are produced at an ever-increasing rate by instruments such as DNA sequencers in biology, the Large Hadron Collider (LHC) in physics, which produced 25 petabytes per year as of 2012, or the Large Synoptic Survey Telescope (LSST), which should produce 30 petabytes of data per night. High-resolution scanners in medical imaging and social network analysis also produce huge amounts of data. This data deluge raises several challenges in terms of storage and computer processing. In 2004, Google proposed using the MapReduce model to distribute computation across many machines. This thesis focuses mainly on improving the performance of a MapReduce environment. To make it easy to replace the software parts needed to improve performance, a modular and adaptable design of the MapReduce environment is necessary. This is why a component-based approach is studied for designing such a programming environment. To study the performance of a MapReduce application, the platform, the application, and their performance must be modeled. These models should be precise enough for the algorithms using them to produce meaningful results, yet simple enough to be analyzed. The state of the art of existing models is surveyed and a new model adapted to the optimization needs is defined. To optimize a MapReduce environment, the first approach studied is a global optimization, which reduces computation time by up to 47%. The second approach focuses on the shuffle phase of MapReduce, in which every node may send data to every other node. Several algorithms are defined and studied for the case where the network is the bottleneck for data transfers. These algorithms are tested on the Grid'5000 experimental platform and usually show behavior close to the lower bound, while the naive approach is far from it. MapReduce BigData Ordonnancement Optimisation MapReduce BigData Scheduling Optimization
3	Modélisation intégratrice du traitement BigData / Integrative modeling of Big Data processing Hashem, Hadi 19 September 2016 Nowadays, multiple actors of digital technology produce very large amounts of data. Sensors, social media, and e-commerce all generate information that grows in real time along Gartner's 3 Vs: Volume, Velocity, and Variety. To exploit this data efficiently and sustainably, it is important to respect the dynamics of its chronological evolution by means of two approaches: first, polymorphism, a dynamic model able to support type changes at any moment without processing failures; second, support for data volatility, by means of an intelligent model that takes into account only the key data that are interpretable at a given moment, instead of processing the entire volume of current and historical data. The primary goal of this study is to establish, based on these approaches, an integrative vision of the data life cycle in 3 steps: (1) data synthesis, by selecting key-values of the micro-data acquired by the different operators at the source; (2) data fusion, by sorting the selected key-values and duplicating them in a de-normalized form in order to process the data faster; and (3) data transformation into a specific map-of-maps-of-maps format, via Hadoop in the standard MapReduce process, in order to obtain a graph defined in the application layer. In addition, this study is supported by a software prototype that implements the modeling operators described above, resulting in a modeling toolbox comparable to a CASE tool that allows the assisted setup of one or more BigData processing chains. Modélisation intégratrice BigData Raisonnement à base de cas Integrative modeling BigData Case-Based reasoning
4	The effect of quality metrics on the user watching behaviour in media content broadcast Setterquist, Erik January 2016 Understanding the effects of quality metrics on user behavior is important for the increasing number of content providers that want to maintain a competitive edge. The two data sets used were gathered from a provider of live streaming and a provider of video-on-demand streaming. The important quality and non-quality features are determined using both correlation metrics and relative importance as determined by machine learning methods. A model that can predict and simulate user behavior is developed and tested. A time series model, a machine learning model, and a combination of both are compared. Results indicate that both quality features and non-quality features are important in understanding user behavior, and that the importance of quality features decreases over time. For short prediction times, the model using quality features performs slightly better than the model not using quality features. BigData Analytics Machine Learning Time Series Analysis QoE Media Broadcast
5	Zpracování a vizualizace senzorových dat ve vojenském prostředí / Processing and Visualization of Military Sensor Data Boychuk, Maksym January 2016 This thesis deals with the creation, visualization, and processing of data in a military environment. The task is to design and implement a system that enables the creation, visualization, and processing of ESM data. The result of this work is the ESMBD application, which supports both a classical approach, namely a relational database, and BigData technologies for data storage and manipulation. The data processing speed of the classical approach (a Postgres database) and of the BigData technologies (Cassandra databases and Hadoop) is also compared.
6	Etik möter big data : En intervjustudie om kommunikatörers etiska uppfattningar kring big data / Ethics meets big data: An interview study of communicators' ethical perceptions of big data Arctaedius, Emelie, Lundholm, Josefin January 2023 Problem statement and purpose: Big data has made it easier for communicators to reach the right target audience. At the same time, given the divided identity of the professional role, it can be seen as an ethical dilemma that communicators use collected information about people to try to influence them with various messages. The purpose of the study is to examine what ethical perceptions and reflections communicators in a government organization have regarding big data. Method and material: Focus groups were chosen as the method for the study. Eleven communicators from a government organization participated, divided into three groups. The interview questions were derived from the study's research questions and theories. The interviews were transcribed to facilitate the analysis, from which three themes were identified, each linked to one research question. Main results: According to the communicators, big data can be described as a gray area from an ethical perspective. Communicators targeting harmful messages with the help of big data is seen as ethically wrong, whereas using big data is considered acceptable if the messages contribute to the public good. The communicators feel that digital and social media collect information in a questionable and unclear manner, and they experience it as a dilemma that they are not the ones who control and collect the data. Another conclusion of the study is that the communicators perceive a lack of clarity about what responsibility they themselves have regarding big data, and that many of them had not reflected much on it before. Etik Bigdata Kommunikatörsrollen Fokusgrupper Beredskapsteorin Media and Communications Medie- och kommunikationsvetenskap
7	BIG DATA : From hype to reality Danesh, Sabri January 2014 Big data is all of a sudden everywhere. It is too big to ignore! It has been six decades since the computer revolution, four decades since the development of the microchip, and two decades of the modern Internet! More than a decade after the 90s “.com” fizz, can Big Data be the next Big Bang? Big data reveals part of our daily lives. It has the potential to solve virtually any problem for a better urbanized globe. Big Data sources are also very interesting from an official statistics point of view. The purpose of this paper is to explore the conceptions of big data and the opportunities and challenges associated with using big data, especially in official statistics. “A petabyte is the equivalent of 1,000 terabytes, or a quadrillion bytes. One terabyte is a thousand gigabytes. One gigabyte is made up of a thousand megabytes. There are a thousand thousand—i.e., a million—petabytes in a zettabyte” (Shaw 2014). And this is to be continued… Big Data Volume Velocity Variety Veracity Cukier Mayer-Schönberger SCB Statistics bigdata
8	Análisis de archivos Logs semi-estructurados de ambientes Web usando tecnologías Big-Data / Analysis of semi-structured log files from web environments using Big Data technologies Villalobos Luengo, César Alexis January 2016 Master's in Information Technology / Currently, the volume of data that companies generate is much larger than what they can actually process; hence, a large universe of information implicit in this data is lost. This thesis project implemented Big Data technologies capable of extracting information from the large, previously unused volumes of data existing in the organization, so as to turn them into business value. The company chosen for this project is dedicated to the electronic payment of social security contributions over the internet. Its function is to be the channel through which the contributions of the country's workers are collected. Each of these contributions is reported, accounted for, and published to the corresponding social security institutions (Mutuales, Cajas de Compensación, AFPs, etc.). To carry out its function, the organization has implemented over its 15 years a large, high-performance infrastructure oriented to web services. This service architecture currently generates a large number of log files that record the events of the various applications and web portals. The log files are characteristically large while lacking a rigorously defined structure. As a result, the organization does not process this data efficiently, since its current relational database technologies do not allow it. Consequently, this thesis project sought to design, develop, implement, and validate methods capable of efficiently processing these log files in order to answer business questions that deliver value to the company. The Big Data technology used was Cloudera, which falls within the framework the organization requires, for example that it have support in the country and that it fit within the year's budget. Likewise, Cloudera is a leader in the market for open-source Big Data solutions, which provides assurance and confidence of working with a quality tool. The methods developed within this technology are based on the MapReduce processing framework running on the HDFS distributed file system. This thesis project showed that the implemented methods are able to scale horizontally as processing nodes are added to the architecture, so that the organization can be confident that in the future, when the log files have a greater volume or a higher generation rate, the architecture will continue to deliver the same or better processing performance; everything will depend on the number of nodes that are added. Minería de datos Negocios - Procesamiento de datos Procesamiento electrónico de datos Cloudera BigData MapReduce
9	Analýza pohybů hráčů basketbalu v utkání se zaměření na pozici pivota / Analysis of basketball players movement during the game with special focus on the center position Shaya, Jimmy January 2019 Title: An analysis of a basketball player's movement during the game with special focus on the centre position Objectives: This work aims to analyse and then quantify the movement load on the pivot position and to compare the differences in the number of movements between ČEZ Basketball Nymburk and Real Madrid Baloncesto in their final game. Methods: This empirical study is based on quantitative research using the methods of analysis and comparison. To obtain the data, I analysed the most frequent movements carried out by players in the game. I collected the number of movements per second for four positions. The observed group consists of centres. Movements were described theoretically, biomechanically, technically, or by reference to the rules. I then used comparison to determine the differences, which were presented in charts and qualified in detail. Results: The movement analysis confirmed my expectations. Comparing the pivot of an elite foreign team with that of the best Czech team, I noticed significant percentage differences. Real Madrid Baloncesto players scored higher in position-specific moves such as post-ups, rebounding, boxing out, contacts, and screens. Unlike the players of ČEZ Basketball Nymburk, who achieved significantly higher values in less significant movements...
10	Gestión de la innovación abierta y los derechos de propiedad intelectual / Management of open innovation and intellectual property rights Mendoza Sánchez, Jhenner Emiliano, SANCHEZ MONTEROLA, LESLLY PAOLA EUMELIA 11 December 2019 Professor Henry Chesbrough originated "Open Innovation" (OI) at the beginning of this millennium. He states that "Open innovation is a paradigm that starts from the assumption that companies can and should use external ideas, as well as internal and external ways of accessing the market, in order to develop their business" (Chesbrough, 2011, p. 126). The foundations of OI and intellectual property rights (IPR) play a fundamental role in different areas. Bican, Guderian & Ringbeck (2017) state that there is a deactivating effect of innovation, above all in developing countries, because there is a gap in promoting R&D&I from the state as a promoter together with universities. In addition, "Companies must organize their innovation processes to be more open to external ideas and knowledge" (Chesbrough, 2011). In Peru and other Latin American countries, policies aimed at developing open innovation are still lacking. According to ECLAC (2018), the main reason for the disconnection between citizens and the state is the inability of public institutions to meet the growing and changing demands of society. In addition, there are other socio-economic challenges and a need to rethink institutions so that they respond better to society's demands. In this paper, we study the possible success factors of OI and IPR management, the influence of ICTs, and the generation of a hyper-collaborative ecosystem, in order to create value and promote greater well-being in the population. GESTIÓN Innovación abierta Derechos de la propiedad intelectual Blockchain Bigdata Inteligencia artificial Management Open innovation Intellectual property rights
