471

The circulation of flesh : regional food producing/consuming systems in Southern England, 1500BC-AD1086

Stansbie, Daniel January 2016 (has links)
It has become an axiom of British archaeology that the results of developer-funded fieldwork are under-utilised in research, and several projects carried out at British universities have attempted to redress this perceived imbalance. These projects, including those on British and Continental prehistory carried out by Richard Bradley, the Roman Rural Settlement project, the Fields of Britannia project, John Blair's work on early medieval England and the EngLaId project, of which this thesis forms a component, have all demonstrated beyond doubt the transformative effect of the data produced by developer-funded work on our understanding. However, to date no project has sought to utilise artefact and ecofact data produced by developer-funded work on a similar scale. This thesis is partly an attempt to fill this gap, by using ceramic, animal bone and charred plant data from digital archives generated by developer-funded archaeology to address a series of questions about food production/consumption over the later prehistoric and early historic periods in Southern England. These three datasets have very varied characteristics and their integration in a single database was, therefore, one of the major challenges of the thesis. However, this also provided the opportunity to ask new questions and to address old questions with new data. The thesis argues that regional ecosystems had a long-term influence on processes of food production/consumption, which displayed considerable continuities across the boundaries of traditional archaeological periods. Landscape, settlement, ceramic, animal bone and charred plant data from three regional case studies, encompassing the Upper Thames Valley, the Middle and Lower Thames Valley and the route of HS1 in Kent, were investigated using a FileMaker database and QGIS mapping. It is argued that, while there were long-term continuities in the use of plants and animals, the social relationships expressed in fields, settlements and ceramics followed a cyclical pattern.
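The integration step described above lends itself to a brief illustration. The sketch below is a minimal, hypothetical Python (pandas) example of harmonising three differently shaped record sets on shared site and period keys; the field names, site labels and counts are invented, and this is not a reconstruction of the FileMaker database actually used in the thesis.

import pandas as pd

# Hypothetical extracts from three developer-funded archive datasets.
ceramics = pd.DataFrame({
    "site": ["Site A", "Site A", "Site B"],
    "period": ["MBA", "LIA", "ERB"],
    "sherd_count": [112, 340, 87],
})
animal_bone = pd.DataFrame({
    "site": ["Site A", "Site B"],
    "period": ["LIA", "ERB"],
    "nisp_cattle": [210, 55],
    "nisp_sheep": [180, 92],
})
charred_plants = pd.DataFrame({
    "site": ["Site A", "Site B"],
    "period": ["LIA", "ERB"],
    "cereal_grain_items": [540, 123],
})

# Outer-join on the shared site/period keys so that a gap in any one
# dataset does not silently drop evidence from the others.
combined = (
    ceramics
    .merge(animal_bone, on=["site", "period"], how="outer")
    .merge(charred_plants, on=["site", "period"], how="outer")
)
print(combined)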
472

The impact of Privacy concerns in the context of Big Data : A cross-cultural quantitative study of France and Bangladesh.

Lizot, Edouard, Islam, S M Abidul January 2018 (has links)
Background: Big Data analytics now takes place in almost every sector of the business world. Banks are also adopting Big Data to handle the huge volumes of data generated every day; it helps them provide fast, personalised service in a cost-efficient way. On the other hand, Big Data raises privacy issues, since it involves a lot of data that can be decoded by third parties. This is also the case in online banking, which involves personal and financial information, and privacy concerns vary across cultures. Purpose: The purpose of this cross-cultural study is to investigate online privacy concerns in the context of Big Data. Methodology: A quantitative approach was followed and data were collected through an online survey to understand the relations between variables. Conclusion: The findings indicate that the relationships between privacy concern and its antecedents differ between France and Bangladesh, although for both countries the desire for transparency showed a significant positive relationship with online privacy concern. Additionally, for both countries, a high level of privacy concern does not lead to lower consumer trust or lower consumer engagement in online banking. The findings involving moderator variables were not significant at all.
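As a rough illustration of the kind of analysis such a survey design implies (an antecedent predicting privacy concern, with country as a potential moderator), here is a minimal ordinary-least-squares sketch on synthetic data. The variable names, the single antecedent and the statsmodels-based approach are assumptions for illustration only, not the instrument or model actually used in the study.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200

# Synthetic survey-style data: one antecedent, a country indicator,
# and a privacy-concern score on a Likert-like scale.
df = pd.DataFrame({
    "transparency_desire": rng.normal(4.0, 1.0, n),
    "country": rng.choice(["France", "Bangladesh"], n),
})
df["privacy_concern"] = (
    2.0
    + 0.6 * df["transparency_desire"]
    + 0.3 * (df["country"] == "France")
    + rng.normal(0, 0.8, n)
)

# The interaction term tests whether country moderates the relationship
# between the antecedent and privacy concern.
model = smf.ols(
    "privacy_concern ~ transparency_desire * C(country)", data=df
).fit()
print(model.summary())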
473

Automatic assessment of OLAP exploration quality / Evaluation automatique de la qualité des explorations OLAP

Djedaini, Mahfoud 06 December 2017 (has links)
Before the arrival of Big Data, the amount of data held in databases was relatively small and therefore fairly simple to analyse. In that context, the main challenge in the field was to optimise data storage, but also and above all the response time of Database Management Systems (DBMS). Many benchmarks, notably those of the TPC consortium, were created to allow the different existing systems to be evaluated under similar conditions. The arrival of Big Data, however, completely changed the situation, with more and more data generated every day. Alongside the increase in available memory, new storage methods emerged based on distributed systems, such as the HDFS file system used notably in Hadoop to cover storage needs and Big Data processing. The growth in data volume therefore makes analysis much more difficult. In this context, the issue is not so much to measure data retrieval speed as to produce coherent sequences of queries that quickly identify areas of interest in the data, so that these areas can be analysed in greater depth and information extracted for informed decision-making. / In a Big Data context, traditional data analysis is becoming more and more tedious. Many approaches have been designed and developed to support analysts in their exploration tasks. However, there is no automatic, unified method for evaluating the quality of support provided by these different approaches. Current benchmarks focus mainly on the evaluation of systems in terms of temporal, energy or financial performance. In this thesis, we propose a model, based on supervised machine learning methods, to evaluate the quality of an OLAP exploration. We use this model to build an evaluation benchmark of exploration support systems, the general principle of which is to allow these systems to generate explorations and then to evaluate them through the explorations they produce.
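The idea of scoring an OLAP exploration with a supervised model can be illustrated loosely. The sketch below assumes a handful of hypothetical session-level features and synthetic quality scores and fits a generic regressor; it is not the benchmark or the learning model developed in the thesis, only a picture of the general approach.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 300

# Hypothetical features summarising an exploration (one row per session):
# number of queries, distinct cube regions visited, share of drill-downs,
# share of repeated queries.
X = np.column_stack([
    rng.integers(3, 40, n),
    rng.integers(1, 15, n),
    rng.uniform(0, 1, n),
    rng.uniform(0, 1, n),
])
# Stand-in quality scores, e.g. as an expert might grade the explorations.
y = 0.3 * X[:, 1] + 2.0 * X[:, 2] - 1.5 * X[:, 3] + rng.normal(0, 0.5, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("held-out R^2:", round(model.score(X_test, y_test), 3))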
474

Optimisation de la gestion des ressources sur une plate-forme informatique du type Big Data basée sur le logiciel Hadoop / Optimisation of resource management on Big Data platforms using the Hadoop software

Jlassi, Aymen 11 December 2017 (has links)
The company Cyres-Group seeks to improve the response time of its Hadoop clusters and the way resources are used in its data centre. The ideas underlying the reduction of response time are to ensure that (i) submitted jobs finish as early as possible and (ii) the waiting time of each user of the system is reduced. We identify two lines of improvement. First, we optimise job scheduling on a Hadoop platform, considering the problem of scheduling a set of MapReduce jobs on a homogeneous platform. Second, we evaluate and propose tools able (i) to provide more flexibility in resource management within the data centre and (ii) to ensure the integration of Hadoop into Cloud infrastructures with minimal loss of performance. In a first step, we review the literature and conclude that the mathematical models proposed for the scheduling problem do not capture all the characteristics of a Hadoop platform. We therefore propose a more realistic model that takes into account the most important aspects, such as resource management, precedence between jobs, data-transfer management and network management. We begin with a simplistic formulation and take the minimisation of the completion time of the last job (Cmax) as the criterion to optimise. We compute a lower bound by solving the mathematical model with the CPLEX solver, and we propose and evaluate a heuristic, LocFirst, that aims to minimise the Cmax. We then extend the model and consider, as the objective function, the combination of the two criteria identified in the first step: the minimisation of the weighted sum of job completion times (∑ wjCj) and the minimisation of the makespan (Cmax). We seek to minimise the weighted combination of the two criteria, compute a lower bound and propose two heuristics to solve the problem.
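The two objectives named above, the makespan (Cmax) and the weighted flow time (∑ wjCj), can be made concrete with a generic greedy heuristic. The sketch below is not the thesis's LocFirst heuristic or its CPLEX model; it is a plain weighted-shortest-processing-time list scheduler on identical machines, with made-up job data, shown only to illustrate how the two criteria are computed from a schedule.

import heapq

def greedy_schedule(jobs, m):
    """Greedy list scheduling on m identical machines.

    jobs: list of (processing_time, weight) tuples. Jobs are ordered by
    the WSPT rule (weight / processing time), then each job is placed on
    the machine that becomes free first. Returns (Cmax, sum of w_j * C_j).
    """
    order = sorted(jobs, key=lambda j: -(j[1] / j[0]))
    machines = [0.0] * m          # current finish time of each machine
    heapq.heapify(machines)
    cmax, weighted_completion = 0.0, 0.0
    for p, w in order:
        start = heapq.heappop(machines)
        finish = start + p
        heapq.heappush(machines, finish)
        cmax = max(cmax, finish)
        weighted_completion += w * finish
    return cmax, weighted_completion

# Toy instance: (processing time, weight) for each MapReduce-style job.
jobs = [(5, 3), (2, 1), (8, 4), (3, 2), (6, 1)]
print(greedy_schedule(jobs, m=2))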
475

Produtividade de café arábica estimada a partir de dados de modelos de circulação global / Arabica coffee yield estimated from global circulation model data

Valeriano, Taynara Tuany Borges. January 2017 (has links)
Advisor: Glauco de Souza Rolim / Committee member: Márcio José de Santana / Committee member: Lucieta Guerreiro Martorano / Abstract: Brazil is the largest producer of the second most valuable commodity on the market, coffee. Knowledge of effective yield-estimation techniques is of great importance for the coffee market, enabling better planning and making the activity more sustainable. An effective way of estimating yield is through agrometeorological modelling, which quantifies the influence of climate on agricultural crops. However, weather data are needed, and these mostly come from surface stations. One way to innovate this technique is to use another source of climate data, such as gridded data (GD), which result from combining diverse sources such as oceanic probes and remote sensing, among others. This work aimed to estimate the yield of Coffea arabica with the model proposed by Santos and Camargo (2006) (SC), using meteorological data from the ECMWF and NASA gridded datasets for coffee-growing regions of Minas Gerais and São Paulo. First, air temperature and precipitation data obtained from the ECMWF and NASA were compared with data from surface meteorological stations in order to verify the accuracy of the GDs. Second, a calibration of the Santos and Camargo (2006) yield-estimation model was proposed for use with the GDs. For temperature, the ECMWF and NASA data were precise and accurate, with minimum RMSE values of 0.37 and 0.50 °C, and Willmott's d values of 0.86 and 0.53, respectively. For precipitation, precision and accuracy were lower, with RMSE equal to 2.15 and 5.33, and Willmott's d equal to 0.79 and 0.59, respectively. When the GD data were applied in the SC model for coffee yield estimation, the ECMWF was superior to NASA. The calibration of the SC model coefficients for the ECMWF and NASA data was efficient, since there was a decrease in the mean absolute percentage error (MAPE), the root mean square error (RMSEs) and the root mean squ... 
(Complete abstract: click electronic access below) / Master's
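The two agreement statistics quoted in the abstract, the RMSE and Willmott's index of agreement d, have standard definitions, so a small worked example may help. The station and gridded temperature values below are invented; only the formulas themselves are standard.

import numpy as np

def rmse(pred, obs):
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def willmott_d(pred, obs):
    """Willmott's index of agreement d, ranging from 0 (no agreement) to 1."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    obar = obs.mean()
    num = np.sum((pred - obs) ** 2)
    den = np.sum((np.abs(pred - obar) + np.abs(obs - obar)) ** 2)
    return float(1.0 - num / den)

# Made-up monthly air temperatures (degrees C): station observations vs a
# gridded-data estimate for the same site.
station = [21.4, 22.1, 23.0, 21.8, 20.5, 19.9]
gridded = [21.0, 22.5, 22.6, 22.1, 20.9, 19.5]
print("RMSE:", round(rmse(gridded, station), 2))
print("Willmott d:", round(willmott_d(gridded, station), 2))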
476

The use of geospatial tools to support, monitor and evaluate post-disaster recovery

Brown, Daniel January 2018 (has links)
The aim of this research is to test the feasibility of using remote sensing-based information products and services to support the planning, monitoring and evaluation of recovery after disaster. The thesis begins by outlining the process of post-disaster recovery, what it entails and who is involved. The data and information needs at different stages of the disaster cycle are introduced and the importance of monitoring and evaluating post-disaster recovery is discussed. The literature review introduces the high-spatial-resolution remote sensing market and the technology focusing on current sensors’ capabilities. This is followed by a review of previous attempts to measure post-disaster recovery by practitioners and academics. At the end of the chapter a list of recovery indicators, suitable for remote sensing analysis, are presented and assessed through a user needs survey. In chapter 3, the six recovery categories and thirteen indicators identified in the literature review form a framework for the retrospective analysis of recovery in Thailand and Pakistan. A selection of results is presented to demonstrate the usefulness of remote sensing as a recovery monitoring tool. To assess its reliability, the results from the satellite image analysis are triangulated against narratives and datasets acquired on the ground. The next two chapters describe work done whilst providing real-time support to two humanitarian agencies operating in Port-au-Prince one-and-a-half years after the 2010 Haiti earthquake. Chapter 4 describes how geospatial tools were used to support a British Red Cross integrated reconstruction project for 500 households living in an informal settlement. The chapter describes how geospatial tools were used as a rapid assessment tool, and to support cadastral and enumeration mapping and the community participatory process. While previous chapters focus on the manual analysis of satellite imagery, chapter 5 reports how semi-automatic analyses of satellite imagery were used to support UN-Habitat by monitoring a planned camp and large-scale instances of spontaneous settlement. The conclusion to the thesis summarises the key lessons learnt from the retrospective analysis of recovery in Thailand and Pakistan and the real-time application in Haiti. Recommendations are then made on how to effectively use remote sensing in support of post-disaster recovery focussing on what to measure, when and how. Recognising that a mixed-method approach can best monitor recovery, recommendations are also made on how to integrate remote sensing with existing tools.
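As a loose illustration of the semi-automatic monitoring idea, counting change between two image dates, here is a toy pixel-level sketch on synthetic binary built-up masks. The masks and the whole workflow are assumptions for illustration; the thesis's actual image-analysis pipeline is not reproduced here.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical binary "built-up" masks extracted from two satellite images
# of the same area (1 = structure present, 0 = absent).
before = rng.integers(0, 2, size=(100, 100))
after = before.copy()
after[20:40, 30:60] = 1   # pretend reconstruction happened in one block

newly_built = (after == 1) & (before == 0)
removed = (after == 0) & (before == 1)
print("new structures (pixels):", int(newly_built.sum()))
print("cleared structures (pixels):", int(removed.sum()))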
477

Everything Counts in Large Amounts : Protection of big data under the Database Directive

Zeitlin, Martin January 2018 (has links)
No description available.
478

Análisis exploratorio del rol del CFO y el Big Data en Chile / An exploratory analysis of the role of the CFO and Big Data in Chile

Alfaro Carrasco, Matías 08 1900 (has links)
Seminar submitted for the degree of Ingeniero Comercial (Commercial Engineer), Major in Administration / This study seeks to identify the opportunities and challenges that Big Data implies for the financial industry and for the role of chief financial officers (CFOs) and their teams in Chilean companies. Today's executives are adjusting to a changing role within firms, one that can be supported by Big Data tools and by a corporate agenda that backs them. In this context, the role of the CFO should be studied from a new perspective, linking it to a more commercial function while maintaining technical expertise, all within a growing ecosystem of abundant information. Recognising challenges and opportunities, both for the financial industry and for CFOs, can be key to achieving a change in the corporate agenda that supports Big Data and the new role of the CFO. The document represents a starting point towards a "discovery platform" that enables improvements in risk models, the development of data-driven value propositions or business models, and the reduction of operating costs by allowing processes to be automated, as a result of implementing a corporate agenda that supports Big Data and allows executives to find a way to carry out this change. Some of the benefits of implementing a Big Data strategy for CFOs could be faster decision-making with new points of view; better support, management and mitigation of risks in the company; and improvement of the existing business model by selecting key performance indicators connected to effective execution of the strategy. The exploratory results highlight the positive perception that the participants, most of them from large companies and holding university and postgraduate degrees, attribute to the role of Big Data in supporting the work of the CFO, providing the initial impetus for changes in companies' corporate agendas, value propositions and strategies, in order to make the most of the value Big Data can bring to their organisations.
479

Modélisation et apprentissage de dépendances à l’aide de copules dans les modèles probabilistes latents / Modeling and learning dependencies with copulas in latent topic models

Amoualian, Hesam 12 December 2017 (has links)
This thesis focuses on a class of Bayesian hierarchical models, known as topic models, used to model large collections of documents, and in particular on scaling them to the case where documents arrive as streams. Although the main goal of probabilistic topic modeling is to find word topics, an equally interesting objective is to examine topic evolutions and transitions. To accomplish this task, we propose in Chapter 3 three new models for modeling topic and word-topic dependencies between consecutive documents in document streams. The first model is a direct extension of the Latent Dirichlet Allocation (LDA) model and makes use of a Dirichlet distribution to balance the influence of the LDA prior parameters with respect to the topic and word-topic distributions of the previous document. The second extension makes use of copulas, which constitute a generic tool for modeling dependencies between random variables. We rely here on Archimedean copulas, and more precisely on the Franck copula, as it is symmetric and associative and is thus appropriate for exchangeable random variables. Lastly, the third model is a non-parametric extension of the second through the integration of copulas in the stick-breaking construction of Hierarchical Dirichlet Processes (HDP). Our experiments, conducted on five standard collections that have been used in several studies on topic modeling, show that our proposals outperform previous ones, such as dynamic topic models, temporal LDA and the Evolving Hierarchical Processes, both in terms of perplexity and for tracking similar topics in document streams. Compared to previous proposals, our models have extra flexibility and can adapt to situations ranging from strong dependence between documents to complete independence. On the other hand, the "exchangeability" assumption in topic models like LDA often results in inferring inconsistent topics for the words of text spans like noun phrases, which are usually expected to be topically coherent. In Chapter 4, we propose copulaLDA (copLDA), which extends LDA by integrating part of the text structure into the model and relaxes the conditional independence assumption between the word-specific latent topics given the per-document topic distributions. To this end, we assume that the words of text spans like noun phrases are topically bound, and we model this dependence with copulas. We demonstrate empirically the effectiveness of copLDA on both intrinsic and extrinsic evaluation tasks on several publicly available corpora. To complete the previous model (copLDA), Chapter 5 presents an LDA-based model that generates topically coherent segments within documents by jointly segmenting documents and assigning topics to their words. The coherence between topics is ensured through a copula, binding the topics associated with the words of a segment. In addition, this model relies on both document- and segment-specific topic distributions so as to capture fine-grained differences in topic assignments. We show that the proposed model naturally encompasses other state-of-the-art LDA-based models designed for similar tasks.
Furthermore, our experiments, conducted on six different publicly available datasets, show the effectiveness of our model in terms of perplexity, Normalized Pointwise Mutual Information, which captures the coherence between the generated topics, and the Micro F1 measure for text classification.
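The role of the copula, binding otherwise independent topic draws, can be sketched in a few lines. The example below samples dependent uniforms from a Frank copula by conditional inversion and pushes them through a toy three-topic categorical prior; the topic probabilities, the value of theta and the coupling of exactly two draws are illustrative assumptions, not the models proposed in the thesis.

import numpy as np

def sample_frank_copula(n, theta, rng=None):
    """Draw n pairs (u, v) from a Frank copula via conditional inversion.

    theta > 0 gives positive dependence; values near 0 approach independence.
    """
    if rng is None:
        rng = np.random.default_rng()
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    a = np.exp(-theta * u)
    c = np.exp(-theta) - 1.0
    v = -np.log1p(w * c / (a * (1.0 - w) + w)) / theta
    return u, v

# Couple the topic choices of two consecutive documents: each uniform is
# pushed through the inverse CDF of a toy 3-topic categorical prior.
rng = np.random.default_rng(0)
u, v = sample_frank_copula(5, theta=8.0, rng=rng)
topic_probs = np.array([0.5, 0.3, 0.2])
cdf = np.cumsum(topic_probs)
topics_doc1 = np.searchsorted(cdf, u)
topics_doc2 = np.searchsorted(cdf, v)
print(list(zip(topics_doc1.tolist(), topics_doc2.tolist())))

With a large theta the two topic assignments tend to agree; as theta approaches zero they become effectively independent, which mirrors the flexibility described above.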
480

Signal Detection of Adverse Drug Reaction using the Adverse Event Reporting System: Literature Review and Novel Methods

Pham, Minh H. 29 March 2018 (has links)
One of the objectives of the U.S. Food and Drug Administration is to protect public health through post-marketing drug safety surveillance, also known as pharmacovigilance. An inexpensive and efficient way to monitor post-marketing drug safety is to apply data mining algorithms to electronic health records to discover associations between drugs and adverse events. The purpose of this study is two-fold. First, we review the methods and algorithms proposed in the literature for associating drugs and drug interactions with an adverse event, and discuss their advantages and drawbacks. Second, we attempt to adapt some novel methods that have been used in comparable problems, such as genome-wide association studies and market-basket problems. Most of the common methods for the drug-adverse event problem have a univariate structure and are therefore prone to false positives when certain drugs are commonly co-prescribed. We therefore study the applicability of multivariate methods from the literature, such as Logistic Regression and the Regression-adjusted Gamma-Poisson Shrinkage model, for the association studies. We also adapted Random Forest and Monte Carlo Logic Regression from genome-wide association studies to our problem because of their ability to detect inherent interactions. We have built a computer program for the Regression-adjusted Gamma-Poisson Shrinkage model, which was proposed by DuMouchel in 2013 but has not been made available in any public software package. A comparison study between popular methods and the proposed new methods is presented.
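The univariate disproportionality measures referred to above as the common baseline, as opposed to the regression-adjusted shrinkage model, are easy to state concretely. The sketch below computes the proportional reporting ratio (PRR) and the reporting odds ratio (ROR) from a 2x2 drug-event table with invented counts; it is a textbook-style illustration, not the RGPS implementation described in the thesis.

import math

def disproportionality(a, b, c, d):
    """Classic univariate signal scores from a 2x2 drug-event table.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports without the drug, with the event
    d: reports with neither
    """
    prr = (a / (a + b)) / (c / (c + d))   # proportional reporting ratio
    ror = (a * d) / (b * c)               # reporting odds ratio
    # Approximate 95% confidence interval for the ROR.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(ror) - 1.96 * se),
          math.exp(math.log(ror) + 1.96 * se))
    return prr, ror, ci

# Made-up report counts for one drug-event pair.
print(disproportionality(a=40, b=960, c=200, d=49000))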
