471
Utilizing Crowd Sourced Analytics for Building Smarter Mobile Infrastructure and Achieving Better Quality of Experience
Yarish, David, 04 January 2016
There is great power in knowledge. Having insight into and predicting network events can be both informative and profitable. This thesis aims to assess how crowd-sourced network data collected on smartphones can be used to improve the quality of experience for users of the network and give network operators insight into how the network's infrastructure can be improved.
Over the course of a year, data were collected and processed to show where networks have been performing well and where they are under-performing. The results aim to show that there is value in collecting this data, and that it cannot be adequately obtained without a device-side presence. The various graphs and histograms demonstrate that the quantities of measurements and the speeds recorded vary by both location and time of day. It is these variations that cannot be determined via traditional network-side measurements. During the course of this experiment, it was observed that certain times of day have much greater numbers of people using the network, and the number of users on the network is likely correlated with the speeds observed at those times. Places of gathering such as malls and public areas had a higher user density, especially around noon, a typical time for people to take a break from the work day. Knowing exactly where and when an Access Point (AP) is utilized is important information when trying to identify how users are utilizing the network.
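To make the device-side argument concrete, the following sketch aggregates crowd-sourced throughput samples by access point and hour of day, the kind of time-and-location breakdown the abstract says network-side measurements cannot provide. The column names and sample values are hypothetical, not the thesis's actual schema or pipeline.

```python
# Minimal sketch: group device-side throughput samples by AP and hour of day.
# Columns (ap_id, timestamp, downlink_mbps) are illustrative assumptions.
import pandas as pd

samples = pd.DataFrame({
    "ap_id": ["mall-01", "mall-01", "office-07", "office-07"],
    "timestamp": pd.to_datetime([
        "2015-06-01 12:05", "2015-06-01 12:40",
        "2015-06-01 09:10", "2015-06-01 12:15",
    ]),
    "downlink_mbps": [3.2, 2.8, 18.5, 9.1],
})

samples["hour"] = samples["timestamp"].dt.hour
by_ap_hour = (samples
              .groupby(["ap_id", "hour"])["downlink_mbps"]
              .agg(n_measurements="count", median_speed="median"))
print(by_ap_hour)  # e.g. mall APs show high counts and low speeds around noon
```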
472
Inteligência competitiva e modelos de séries temporais para previsão de consumo: o estudo de uma empresa do setor metalúrgico [Competitive intelligence and time series models for consumption forecasting: a study of a company in the metallurgical sector]
Espíndola, André Mauro Santos de, 30 August 2013
The world is undergoing a continuous and accelerating process of transformation that involves every area of knowledge. It is possible to state that the speed of this process is directly related to the pace of change in technology. These changes have made relations increasingly globalized, modified commercial transactions and forced companies to rethink the ways they compete. In this context, knowledge, arising from the volume of data and information, takes on the role of a new input, often more important than labor, capital and land. These changes and the importance of information lead companies to seek a new positioning, trying to identify in the external environment signals that may indicate future events. The great challenge for companies lies in obtaining data, extracting information and transforming it into knowledge useful for decision-making. In this setting, this study aimed to identify which consumption forecasting model to use for analyzing information in the Competitive Intelligence process of a company in the metallurgical sector located in the state of Rio Grande do Sul. The study drew on the themes of Big Data, Data Mining, Demand Forecasting and Competitive Intelligence in order to answer the following question: Which steel consumption forecasting model can be used to analyze information in the Competitive Intelligence process? Internal and external company data were analyzed in search of correlations between the company's steel consumption and economic variables, which were later used to identify the consumption forecasting model. Two models were identified: a univariate model without intervention, built with the Box-Jenkins methodology, and a second forecasting model based on a Transfer Function. Both models described the historical steel consumption series well, but the univariate model showed better forecasting performance. / The world has been in a continuous and rapid process of transformation which involves all the
areas of knowledge. It is possible to assert that the speed of this process has a direct
relationship with the fast changes in the technological area. These changes have influenced
the global relationships even more, modifying the commercial trades and making companies
rethink their competitive actions. In this field, knowledge takes on a new role giving more
importance to the amount of data and information to the detriment of land, labor and capital.
These changes and the importance given to information make companies establish new
positions in order to identify signs that anticipate events. Obtaining, extracting and
transforming information into useful knowledge to help in the final decision is a challenge.
Thus the purpose of this study is to determine a consumption forecasting model for analyzing
information in the process of competitive intelligence in a metallurgy company located in the state of Rio
Grande do Sul. To develop the study the themes Big Data, Data Mining, Demand Prediction
and Competitive Intelligence were used, aiming to answer the question: Which steel
consumption forecasting model can be used to analyze information in the process of
competitive intelligence? For the study, internal and external data were analyzed to identify
the relation between the company's steel consumption and economic variables, which were
afterwards used in the consumption forecast. Two models were identified: one of them
univariate, without intervention, built through the Box-Jenkins methodology. The second
model had a transfer function. Both of them demonstrated good capability in describing
historical series of steel consumption; however, the univariate model demonstrated better
results in forecasting capability.
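As an illustration of the univariate Box-Jenkins approach that performed best in the study, the sketch below fits an ARIMA model to a monthly consumption series and forecasts six months ahead. The simulated series and the (1, 1, 1) order are placeholders, not the dissertation's data or fitted model.

```python
# Minimal Box-Jenkins sketch: fit a univariate ARIMA model to a monthly
# series and forecast ahead. All values here are simulated placeholders.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
idx = pd.date_range("2008-01", periods=60, freq="MS")
consumption = pd.Series(1000 + np.cumsum(rng.normal(5, 30, 60)), index=idx)

model = ARIMA(consumption, order=(1, 1, 1)).fit()
print(model.forecast(steps=6))  # six-month-ahead consumption forecast
```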
473
Genomic data analyses for population history and population health
Bycroft, Clare, January 2017
Many of the patterns of genetic variation we observe today have arisen via the complex dynamics of interactions and isolation of historic human populations. In this thesis, we focus on two important features of the genetics of populations that can be used to learn about human history: population structure and admixture.

The Iberian peninsula has a complex demographic history, as well as rich linguistic and cultural diversity. However, previous studies using small genomic regions (such as the Y-chromosome and mtDNA) as well as genome-wide data have so far detected limited genetic structure in Iberia. Larger datasets and powerful new statistical methods that exploit information in the correlation structure of nearby genetic markers have made it possible to detect and characterise genetic differentiation at fine geographic scales. We performed the largest and most comprehensive study of Spanish population structure to date by analysing genotyping array data for ~1,400 Spanish individuals genotyped at ~700,000 polymorphic loci. We show that at broad scales, the major axis of genetic differentiation in Spain runs from west to east, while there is remarkable genetic similarity in the north-south direction. Our analysis also reveals striking patterns of geographically-localised and subtle population structure within Spain at scales down to tens of kilometres. We developed and applied new approaches to show how this structure has arisen from a complex and regionally-varying mix of genetic isolation and recent gene-flow within and from outside of Iberia.

To further explore the genetic impact of historical migrations and invasions of Iberia, we assembled a data set of 2,920 individuals (~300,000 markers) from Iberia and the surrounding regions of north Africa, Europe, and sub-Saharan Africa. Our admixture analysis implies that north African-like DNA in Iberia was mainly introduced in the earlier half (860-1120 CE) of the period of Muslim rule in Iberia, and we estimate that the closest modern-day equivalents to the initial migrants are located in Western Sahara. We also find that north African-like DNA in Iberia shows striking regional variation, with near-zero contributions in the Basque regions, low amounts (~3%) in the north east of Iberia, and as high as ~11% in Galicia and Portugal.

The UK Biobank project is a large prospective cohort study of ~500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. A rich variety of phenotypic and health-related information is available on each participant, making the resource unprecedented in its size and scope. Understanding the role that genetics plays in phenotypic variation, and its potential interactions with other factors, provides a critical route to a better understanding of human biology and population health. As such, a key component of the UK Biobank resource has been the collection of genome-wide genetic data (~805,000 markers) on every participant using purpose-designed genotyping arrays. These data are the focus of the second part of this thesis. In particular, we designed and implemented a quality control (QC) pipeline in support of the current and future use of this multi-purpose resource. Genotype data on this scale offer novel opportunities for assessing quality issues, although the wide range of ancestral backgrounds in the cohort also creates particular challenges.

We also conducted a set of analyses that reveal properties of the genetic data, including population structure and familial relatedness, that can be important for downstream analyses. We find that cryptic relatedness is common among UK Biobank participants (~30% have at least one relative who is a first cousin or closer), and a full range of human population structure is present in this cohort: from worldwide ancestral diversity to subtle population structure at sub-national geographic scales. Finally, we performed a genome-wide association scan on a well-studied and highly polygenic phenotype: standing height. This provided a further test of the effectiveness of our QC, as well as highlighting the potential of the resource to uncover novel regions of association.
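One standard ingredient of analyses like these is projecting individuals onto principal components of a standardized genotype matrix to reveal population structure. The sketch below shows the idea on a tiny simulated matrix; it illustrates the general technique only, not the thesis's QC pipeline, and all data are random.

```python
# Minimal sketch: PCA on a standardized genotype matrix (0/1/2 allele counts).
# A 200 x 1,000 random matrix stands in for real cohort-scale data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(200, 1000)).astype(float)

# Centre and scale each marker by its allele frequency, then project.
freqs = genotypes.mean(axis=0) / 2.0
standardized = (genotypes - 2 * freqs) / np.sqrt(2 * freqs * (1 - freqs))

pcs = PCA(n_components=4).fit_transform(standardized)
print(pcs.shape)  # (200, 4): per-individual coordinates used to visualise structure
```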
474
The circulation of flesh: regional food producing/consuming systems in Southern England, 1500 BC-AD 1086
Stansbie, Daniel, January 2016
It has become an axiom of British archaeology that the results of developer-funded fieldwork are under-utilised in research, and several projects carried out at British universities have attempted to redress this perceived imbalance. These projects, including those on British and Continental prehistory carried out by Richard Bradley, the Roman Rural Settlement project, the Fields of Britannia project, John Blair's work on early medieval England and the EngLaId project, of which this thesis forms a component, have all demonstrated beyond doubt the transformative effect of the data produced by developer-funded work on our understanding. However, to date no project has sought to utilise artefact and ecofact data produced by developer-funded work on a similar scale. This thesis is partly an attempt to fill this gap by using ceramic, animal bone and charred plant data from digital archives generated by developer-funded archaeology to address a series of questions about food production/consumption over the later prehistoric and early historic periods in Southern England. These three datasets have very varied characteristics, and their integration in a single database was therefore one of the major challenges of the thesis. However, this also provided the opportunity to ask new questions and to address old questions with new data. The thesis argues that regional ecosystems had a long-term influence on processes of food production/consumption, which displayed considerable continuities across the boundaries of traditional archaeological periods. Landscape, settlement, ceramic, animal bone and charred plant data from three regional case studies, encompassing the Upper Thames Valley, the Middle and Lower Thames Valley and the route of HS1 in Kent, were investigated using a FileMaker database and QGIS mapping. It is argued that, while there were long-term continuities in the use of plants and animals, the social relationships expressed in fields, settlements and ceramics followed a cyclical pattern.
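To make the integration challenge concrete, the sketch below places ceramic, animal-bone and charred-plant records into one relational store keyed by site and phase, so that cross-dataset questions become joins. The table and column names are illustrative assumptions, not the thesis's actual FileMaker schema.

```python
# Minimal sketch of a unified schema for three finds datasets, using SQLite.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE site  (site_id TEXT PRIMARY KEY, region TEXT, easting REAL, northing REAL);
CREATE TABLE phase (phase_id INTEGER PRIMARY KEY, site_id TEXT REFERENCES site,
                    period TEXT, start_year INTEGER, end_year INTEGER);
-- one finds table per dataset, all hanging off a shared phase
CREATE TABLE ceramic       (phase_id INTEGER REFERENCES phase, fabric TEXT, sherd_count INTEGER);
CREATE TABLE animal_bone   (phase_id INTEGER REFERENCES phase, taxon TEXT, nisp INTEGER);
CREATE TABLE charred_plant (phase_id INTEGER REFERENCES phase, taxon TEXT, item_count INTEGER);
""")

# Cross-dataset questions then become joins, e.g. animal taxa per period:
rows = con.execute("""
SELECT p.period, b.taxon, SUM(b.nisp)
FROM animal_bone b JOIN phase p USING (phase_id)
GROUP BY p.period, b.taxon
""").fetchall()
print(rows)
```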
475
The impact of privacy concerns in the context of Big Data: a cross-cultural quantitative study of France and Bangladesh
Lizot, Edouard; Islam, S M Abidul, January 2018
Background: Big Data analytics takes place in almost every sector of the new business world. Nowadays, banks are also adopting Big Data to handle the huge volume of data generated every day. Big Data helps banks provide fast, personalised service in a cost-efficient way. On the other hand, Big Data raises privacy issues, as it deals with large amounts of data that can be decoded by third parties. This is also the case in online banking, which involves personal and financial information. Privacy concerns also vary among different cultures. Purpose: The purpose of this cross-cultural study is to investigate online privacy concerns in the context of Big Data. Methodology: A quantitative approach was followed, and data were collected through an online survey to understand the relations between variables. Conclusion: The findings indicate that the relationship between privacy concern and its antecedents differs between France and Bangladesh, though for both countries the desire for transparency showed a significant positive relationship with online privacy concerns. Additionally, for both countries, a high privacy concern will not lead to lower consumer trust and consumer engagement in online banking. The findings involving moderator variables were not significant at all.
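As background on the method, a moderation effect of culture can be tested by regressing privacy concern on an antecedent, a country indicator and their interaction term. The sketch below uses simulated data and illustrative variable names; the authors' actual survey items and model are not reproduced.

```python
# Minimal moderation-test sketch: the interaction term tests whether the
# country moderates the antecedent's effect. All data here are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "transparency_desire": rng.normal(size=200),
    "country": rng.choice(["France", "Bangladesh"], size=200),
})
df["privacy_concern"] = 0.5 * df["transparency_desire"] + rng.normal(size=200)

model = smf.ols("privacy_concern ~ transparency_desire * C(country)", data=df).fit()
print(model.params)  # a significant interaction would indicate moderation
```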
476
Automatic assessment of OLAP exploration quality / Evaluation automatique de la qualité des explorations OLAP
Djedaini, Mahfoud, 06 December 2017
Before the arrival of Big Data, the amount of data held in databases was relatively small and therefore rather simple to analyse. In that context, the main challenge in the field was to optimise data storage and, above all, the response time of Database Management Systems (DBMS). Many benchmarks, notably those of the TPC consortium, were established to allow the various existing systems to be evaluated under similar conditions. The arrival of Big Data completely changed the situation, however, with more and more data generated day by day. Alongside the increase in available memory, new storage methods based on distributed systems have emerged, such as the HDFS file system used notably in Hadoop, to cover Big Data storage and processing needs. The growth in data volume therefore makes analysis much more difficult. In this context, the point is not so much to measure the speed of data retrieval as to produce coherent sequences of queries that quickly identify the areas of interest in the data, so that those areas can be analysed in greater depth and information supporting informed decision-making can be extracted. / In a Big Data context, traditional data analysis is becoming more and more tedious. Many approaches have been designed and developed to support analysts in their exploration tasks. However, there is no automatic, unified method for evaluating the quality of support of these different approaches. Current benchmarks focus mainly on the evaluation of systems in terms of temporal, energy or financial performance. In this thesis, we propose a model, based on supervised machine learning methods, to evaluate the quality of an OLAP exploration. We use this model to build an evaluation benchmark of exploration support systems, the general principle of which is to allow these systems to generate explorations and then to evaluate them through the explorations they produce.
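The benchmark idea summarised above can be pictured as follows: represent each OLAP exploration (a sequence of queries) as a feature vector and train a supervised model to score exploration quality. The features, labels and choice of classifier below are invented placeholders, not the thesis's actual design.

```python
# Minimal sketch: score OLAP explorations with a supervised classifier.
# Feature columns (length, #cubes, #drill-downs, mean result size) and the
# "focused vs unfocused" label are purely illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.random((300, 4))                 # one feature vector per exploration
y = (X[:, 2] > 0.5).astype(int)          # stand-in quality label

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
new_exploration = rng.random((1, 4))
print(clf.predict_proba(new_exploration))  # quality score for a new exploration
```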
477
Optimisation de la gestion des ressources sur une plate-forme informatique du type Big Data basée sur le logiciel Hadoop / Optimisation of the resources management on "big data" platforms using the Hadoop software
Jlassi, Aymen, 11 December 2017
The company Cyres-group seeks to improve the response time of its Hadoop clusters and the way resources are exploited in its data center. The ideas underlying the reduction of response time are to ensure that (i) submitted jobs finish as early as possible and (ii) the waiting time of each user of the system is reduced. We identify two lines of improvement: 1. we work on optimizing job scheduling on a Hadoop platform, considering the problem of scheduling a set of MapReduce jobs on a homogeneous platform; 2. we evaluate and propose tools able (i) to provide more flexibility in managing resources in the data center and (ii) to ensure the integration of Hadoop into Cloud infrastructures with minimal loss of performance. In a first study, we carry out a literature review. At the end of this step, we observe that the mathematical models proposed in the literature for the scheduling problem do not capture all the characteristics of a Hadoop platform. We therefore propose a more realistic model that takes into account the most important aspects, such as resource management, precedence between jobs, data transfer management and network management. We start from a deliberately simplified formulation and take the minimization of the completion time of the last job (Cmax) as the criterion to optimize. We compute a lower bound by solving the mathematical model with the CPLEX solver, and we propose and evaluate a heuristic (LocFirst). We then extend our model and take as the objective function the sum of the two criteria identified in the first step: minimizing the weighted sum of job completion times (∑ wjCj) and minimizing the makespan (Cmax). We seek to minimize the weighted average of the two criteria, compute a lower bound and propose two resolution heuristics. / The company Cyres-group is working to improve the response time of its Hadoop clusters and to optimise how resources are exploited in its data center: the goals are to finish submitted jobs as soon as possible and to reduce the waiting time of each user of the system. Firstly, we work on the scheduling problem in the Hadoop system, which we treat as the problem of scheduling a set of jobs on a homogeneous platform. Secondly, we propose tools that are able to provide more flexibility during resource management in the data center and to ensure the integration of Hadoop into Cloud infrastructures without unacceptable loss of performance. The review of the literature shows that existing works use simple mathematical models that do not reflect the real problem and ignore the main characteristics of the Hadoop software. Hence, we propose a new model that takes into account the most important aspects: resource management, precedence relations among tasks, and data management and transfer. We begin with a simplified model that takes the minimisation of Cmax as the objective function; we solve it with the mathematical solver CPLEX to compute a lower bound, and we propose the heuristic LocFirst, which aims to minimise Cmax. In the third part, we consider a more realistic modelling of the scheduling problem and aim to minimise the weighted sum of two objectives: the weighted flow time (∑ wjCj) and the makespan (Cmax). We compute a lower bound and propose two heuristics to solve the problem.
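The two criteria can be made concrete with a small example. The sketch below list-schedules jobs on m identical machines in weighted-shortest-processing-time (WSPT) order and reports both the makespan Cmax and the weighted flow time ∑ wjCj. It is a textbook heuristic shown for illustration only; the thesis's LocFirst heuristic and its Hadoop-specific constraints (precedence, data transfer, network) are not reproduced here.

```python
# Minimal sketch: WSPT list scheduling on m identical machines, reporting
# both criteria named in the abstract: Cmax and sum(w_j * C_j).
import heapq

def wspt_schedule(jobs, m):
    """jobs: list of (processing_time, weight); m: number of machines."""
    machines = [0.0] * m              # next free time per machine
    heapq.heapify(machines)
    cmax, weighted_flow = 0.0, 0.0
    for p, w in sorted(jobs, key=lambda jw: jw[0] / jw[1]):  # WSPT order
        start = heapq.heappop(machines)      # earliest-free machine
        finish = start + p
        heapq.heappush(machines, finish)
        cmax = max(cmax, finish)
        weighted_flow += w * finish
    return cmax, weighted_flow

print(wspt_schedule([(4, 1), (2, 3), (6, 2), (1, 1)], m=2))  # (7.0, 27.0)
```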
478
Produtividade de café arábica estimada a partir de dados de modelos de circulação global [Arabica coffee yield estimated from global circulation model data]
Valeriano, Taynara Tuany Borges, January 2017
Advisor: Glauco de Souza Rolim / Committee: Márcio José de Santana / Committee: Lucieta Guerreiro Martorano / Resumo: Brazil is the largest producer of the market's second most valuable commodity, coffee. Knowledge of effective yield estimation techniques is of great importance to the coffee market, enabling better planning and making the activity more sustainable. An effective way to estimate yield is through agrometeorological modeling, which quantifies the influence of climate on agricultural crops. However, climate data are required, and these mostly come from surface stations. One way to innovate this technique is to use another source of climate data, such as gridded data (GD), which result from the combination of several sources, including ocean probes and remote sensing, among others. The present work aimed to estimate the yield of Coffea arabica using the model proposed by Santos and Camargo (2006) (SC) with meteorological data from the ECMWF and NASA GDs for coffee regions of Minas Gerais and São Paulo. First, air temperature and precipitation data obtained from ECMWF and NASA were compared with surface weather station data to verify the accuracy of the GDs. Second, a calibration of the Santos and Camargo (2006) yield estimation model was proposed for use with the GDs. For temperature, the ECMWF and NASA data were precise and accurate, with minimum RMSE values of 0.37 and 0.50 °... (Complete abstract: click electronic access below) / Abstract: Brazil is the largest producer of the second most valuable commodity on the market, coffee. Knowledge of effective yield estimation techniques is important for the coffee market, enabling better planning and making the activity more sustainable. An effective way of estimating yield is through agrometeorological modeling, which quantifies the influence of climate on agricultural crops. However, weather data are needed, and these mostly come from surface stations. One way to innovate this technique is to use another source of climate data, such as gridded data (GD), which are the result of a combination of diverse sources such as oceanic probes and remote sensing, among others. This work aimed to apply the yield estimation model proposed by Santos and Camargo (2006) with meteorological data from the ECMWF and NASA GDs for coffee regions of Minas Gerais and São Paulo. First, temperature and precipitation data obtained through the ECMWF and NASA were compared and verified against data from surface meteorological stations. Second, a calibration of the Santos and Camargo (2006) model was proposed for use with the GDs (ECMWF and NASA). For temperature, ECMWF and NASA data were precise and accurate, with minimum RMSE values of 0.37 and 0.50 °C, and Willmott's d values of 0.86 and 0.53, respectively. For precipitation, the minimum values of precision and accuracy were lower, with RMSE equal to 2.15 and 5.33, and Willmott's d equal to 0.79 and 0.59, respectively. When the GD data were applied in the SC model for coffee yield estimation, the ECMWF was superior to NASA. The calibration of the SC model coefficients for ECMWF and NASA data was efficient, since there was a decrease in the mean absolute percentage error (MAPE), systematic root mean square error (RMSEs) and root mean squ... (Complete abstract: click electronic access below) / Master's
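The two agreement statistics used above to compare gridded data with surface stations are easy to state. A minimal sketch, with invented sample values, of root mean square error (RMSE) and Willmott's index of agreement d:

```python
# RMSE and Willmott's index of agreement d, computed from paired
# observed (station) and estimated (gridded) values. Sample data invented.
import numpy as np

def rmse(obs, est):
    return float(np.sqrt(np.mean((np.asarray(est) - np.asarray(obs)) ** 2)))

def willmott_d(obs, est):
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    num = np.sum((est - obs) ** 2)
    den = np.sum((np.abs(est - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return float(1 - num / den)   # 1 = perfect agreement

station = np.array([21.5, 23.1, 19.8, 24.0])   # station temperatures, deg C
gridded = np.array([21.9, 22.6, 20.3, 23.5])   # co-located gridded estimates
print(rmse(station, gridded), willmott_d(station, gridded))
```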
479
The use of geospatial tools to support, monitor and evaluate post-disaster recovery
Brown, Daniel, January 2018
The aim of this research is to test the feasibility of using remote sensing-based information products and services to support the planning, monitoring and evaluation of recovery after disaster. The thesis begins by outlining the process of post-disaster recovery, what it entails and who is involved. The data and information needs at different stages of the disaster cycle are introduced and the importance of monitoring and evaluating post-disaster recovery is discussed.

The literature review introduces the high-spatial-resolution remote sensing market and the technology, focusing on current sensors' capabilities. This is followed by a review of previous attempts to measure post-disaster recovery by practitioners and academics. At the end of the chapter a list of recovery indicators, suitable for remote sensing analysis, is presented and assessed through a user needs survey.

In chapter 3, the six recovery categories and thirteen indicators identified in the literature review form a framework for the retrospective analysis of recovery in Thailand and Pakistan. A selection of results is presented to demonstrate the usefulness of remote sensing as a recovery monitoring tool. To assess its reliability, the results from the satellite image analysis are triangulated against narratives and datasets acquired on the ground.

The next two chapters describe work done whilst providing real-time support to two humanitarian agencies operating in Port-au-Prince one-and-a-half years after the 2010 Haiti earthquake. Chapter 4 describes how geospatial tools were used to support a British Red Cross integrated reconstruction project for 500 households living in an informal settlement: as a rapid assessment tool, and to support cadastral and enumeration mapping and the community participatory process. While previous chapters focus on the manual analysis of satellite imagery, chapter 5 reports how semi-automatic analyses of satellite imagery were used to support UN-Habitat by monitoring a planned camp and large-scale instances of spontaneous settlement.

The conclusion to the thesis summarises the key lessons learnt from the retrospective analysis of recovery in Thailand and Pakistan and the real-time application in Haiti. Recommendations are then made on how to effectively use remote sensing in support of post-disaster recovery, focussing on what to measure, when and how. Recognising that a mixed-method approach can best monitor recovery, recommendations are also made on how to integrate remote sensing with existing tools.
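As a schematic of the semi-automatic image analysis described for chapter 5, the sketch below differences two co-registered image bands and thresholds the result to flag changed pixels (for example, new shelters appearing in a settlement). Real workflows add radiometric correction, segmentation and manual review; the arrays here are simulated stand-ins for satellite imagery, not the thesis's method.

```python
# Minimal change-detection sketch: difference two co-registered bands and
# threshold to flag changed pixels. All imagery here is simulated.
import numpy as np

before = np.random.default_rng(3).random((100, 100))  # pre-event image band
after = before.copy()
after[40:45, 60:70] += 0.6                            # simulated new structures

diff = np.abs(after - before)
changed = diff > 0.3                                  # simple fixed threshold
print(changed.sum(), "pixels flagged as changed")
```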
480
Everything Counts in Large Amounts: Protection of big data under the Database Directive
Zeitlin, Martin, January 2018
No description available.