491

Análise de surtos de doenças transmitidas pelo mosquito Aedes aegypti utilizando Big-Data Analytics e mensagens do Twitter / Analysis of outbreaks of diseases transmitted by the Aedes aegypti mosquito using big data analytics and Twitter messages

Carlos, Marcelo Aparecido January 2017 (has links)
Advisor: Prof. Dr. Filipe Ieda Fazanaro / Master's dissertation - Universidade Federal do ABC, Programa de Pós-Graduação em Engenharia da Informação, 2017. / The use of big data combined with text mining techniques has been growing every year in several areas of science, especially in health, including precision medicine and electronic medical records. This work starts from the hypothesis that big data concepts can be used to analyse large amounts of data about dengue, chikungunya and Zika virus in order to monitor and anticipate information about possible outbreaks of these diseases. However, the analysis of large volumes of data, inherent to big data studies, poses challenges, particularly the lack of scalability of the algorithms and the complexity of managing the many different types and structures of the data involved. The main objective of this work is to present an implementation of text mining techniques, especially over texts from social networks such as Twitter, combined with big data analytics and machine learning, to monitor the incidence of dengue, chikungunya and Zika virus, all transmitted by the Aedes aegypti mosquito. The results indicate that the implementation, based on combining the K-Means and SVM machine learning algorithms, performed satisfactorily on the sample used when compared with the records of the Ministry of Health, indicating its potential for this purpose. The main advantage of big data analytics lies in the possibility of employing unstructured data obtained from social networks, e-commerce sites and other sources; in this sense, data that once seemed of little importance acquire great potential and relevance.
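
As a rough illustration of the pipeline this abstract describes, the hedged sketch below chains K-Means clustering with an SVM classifier over TF-IDF vectors of tweets, using scikit-learn. It is not the author's code: the sample tweets, labels, cluster count and kernel choice are all hypothetical.

```python
# Hypothetical sketch of the K-Means + SVM pipeline described above,
# using scikit-learn; the sample tweets and labels are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.svm import SVC

tweets = [
    "muitos casos de dengue no meu bairro essa semana",
    "febre alta e dor no corpo, sera zika?",
    "campanha contra o Aedes aegypti na cidade",
    "nada a ver, so falando do jogo de ontem",
]
labels = [1, 1, 1, 0]  # 1 = disease-related, 0 = unrelated (hand-labelled sample)

# Vectorise the raw text.
X = TfidfVectorizer().fit_transform(tweets)

# Unsupervised step: group tweets into candidate topics.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised step: train an SVM to recognise outbreak-related messages.
clf = SVC(kernel="linear").fit(X, labels)
print(clusters, clf.predict(X))
```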
492

SQLToKeyNoSQL

Schreiner, Geomar André January 2016 (has links)
Master's dissertation - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2016. / Made available in DSpace on 2016-09-20T04:42:01Z (GMT). No. of bitstreams: 1 339451.pdf: 2281831 bytes, checksum: 129da9c13a181f4ca28c9822b6e994ca (MD5) Previous issue date: 2016 / Abstract: Many applications today produce and manipulate large volumes of data, the so-called Big Data. Traditional databases, in particular Relational Databases (RDB), are not suitable for Big Data management. To deal with this problem, new data models have been proposed that emphasize scalability and availability, most of them belonging to a new category of data managers called NoSQL databases. NoSQL databases, however, are generally not compatible with the SQL standard, and developers used to RDBs must learn new data models and access interfaces to build Big Data applications. Approaches have been proposed to support interoperability between RDBs and NoSQL databases, but few of them can target more than one NoSQL database, most being restricted to a single NoSQL data model and, sometimes, to a specific product. In this context, this work presents SQLToKeyNoSQL, a layer able to translate, in a transparent way, RDB schemata as well as SQL instructions into equivalent schemata and access methods for key-oriented NoSQL databases, i.e., those based on the document-oriented, key-value and column-oriented data models. A hierarchical canonical model is proposed that abstracts the key-oriented NoSQL data models and serves as an intermediate model to which the relational model is mapped. A subset of SQL instructions is translated into an intermediate set of access methods based on the REST API, which are in turn translated into the access language of each NoSQL database. The layer also supports join queries, which NoSQL databases do not provide. An experimental evaluation demonstrates that the proposed solution is promising: the overhead introduced by the layer is not prohibitive, and it is competitive with existing tools.
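
To make the translation chain concrete, here is a minimal sketch, under assumed names, of the kind of rewriting SQLToKeyNoSQL automates: a narrow class of SQL SELECT statements is mapped to a REST-like intermediate method (GET plus a path), which a driver could then map onto a concrete key-value, document or column store. The function, its supported grammar and the intermediate representation are illustrative, not the layer's real API.

```python
# Minimal sketch of SQL-to-intermediate-method translation; the
# method names and structure are hypothetical, not SQLToKeyNoSQL's API.
import re

def translate_select(sql: str):
    """Translate "SELECT cols FROM table WHERE key = 'value'" into an
    intermediate GET call; only this narrow pattern is handled here."""
    m = re.match(
        r"SELECT\s+(.+)\s+FROM\s+(\w+)(?:\s+WHERE\s+(\w+)\s*=\s*'(.+)')?",
        sql, re.IGNORECASE)
    if not m:
        raise ValueError("unsupported statement in this sketch")
    cols, table, key, value = m.groups()
    return {
        "method": "GET",
        "path": f"/{table}" + (f"/{value}" if value else ""),
        "columns": [c.strip() for c in cols.split(",")],
        "filter_key": key,
    }

print(translate_select("SELECT name, age FROM users WHERE id = '42'"))
# -> {'method': 'GET', 'path': '/users/42', 'columns': ['name', 'age'], 'filter_key': 'id'}
```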
493

Decoding the regulatory role and epiclonal dynamics of DNA methylation in 1482 breast tumours

Batra, Rajbir Nath January 2018 (has links)
Breast cancer is a clinically and molecularly heterogeneous disease displaying distinct therapeutic responses. Although recent studies have explored the genomic and transcriptomic landscapes of breast cancer, the epigenetic architecture has received less attention. To address this, an optimised Reduced Representation Bisulfite Sequencing protocol was performed on 1482 primary breast tumours (and 237 matched adjacent normal tissues). This constitutes the largest breast cancer methylome yet, and this thesis describes the bioinformatics and statistical analysis of the study. Noticeable epigenetic drift (both gain and loss of homogeneous DNA methylation patterns) was observed in breast tumours when compared to normal tissues, with markedly higher differences in late-replicating genomic regions. The extent of epigenetic drift was also found to be highly heterogeneous between the breast tumours and was sharply correlated with the tumour's mitotic index, indicating that epigenetic drift is largely a consequence of the accumulation of passive cell-division-related errors. A novel algorithm called DMARC (Directed Methylation Altered Regions in Cancer) was developed that utilised the tumour-specific drift rates to discriminate between methylation alterations attained as a consequence of stochastic cell division errors (background) and those reflecting a more instructive biological process (directed). Directed methylation alterations were significantly enriched for gene expression changes in breast cancer compared to background alterations. Characterising these methylation aberrations together with gene expression led to the identification of breast cancer subtype-specific epigenetic genes with consequences for transcription and prognosis. Cancer genes may be deregulated by multiple mechanisms. By integrating with existing copy number and gene expression profiles for these tumours, DNA methylation alterations were revealed as the predominant mechanism correlated with differentially expressed genes in breast cancer. The crucial role of DNA methylation as a mechanism for targeting the silencing of specific genes within copy number amplifications is also explored, which led to the identification of a putative tumour suppressor gene, THSZ2. Finally, the first genome-wide assessment of epigenomic evolution in breast cancer is conducted. Both the level of intratumoural heterogeneity and the extent of epiallelic burden were found to be prognostic, revealing an extraordinary distinction in the role of epiclonal dynamics in different breast cancer subtypes. Collectively, the results presented in this thesis shed light on the somatic DNA methylation basis of inter-patient as well as intra-tumour heterogeneity in breast cancer. This complements our genetic knowledge of the disease and will help move us towards tailoring treatments to the patient's molecular profile.
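
The core statistical idea behind DMARC, as described above, can be sketched in a few lines: treat the tumour-specific drift rate as the expected background change and flag loci whose deviation exceeds what passive drift explains. The sketch below is a toy reconstruction with simulated numbers; the threshold rule, drift value and noise model are assumptions, not the thesis's actual method.

```python
# Illustrative sketch (not the DMARC implementation) of separating
# background drift from directed methylation alterations.
import numpy as np

rng = np.random.default_rng(0)
drift_rate = 0.02          # tumour-specific background change per locus (assumed)
background_sd = 0.01       # spread of passive drift (assumed)
n_loci = 10_000

# Simulated delta-methylation (tumour minus matched normal) per locus.
delta = rng.normal(loc=drift_rate, scale=background_sd, size=n_loci)
delta[:50] += 0.3          # a few loci carrying directed alterations

# Background model: changes within k standard deviations of the expected
# drift are treated as stochastic cell-division errors; the rest as directed.
k = 4
directed = np.abs(delta - drift_rate) > k * background_sd

print(f"{directed.sum()} loci flagged as directed (expected ~50)")
```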
494

A behavioural data approach towards predicting direct real estate markets in the United Kingdom

Stevens, Donald Garth January 2018 (has links)
In recent years, modern prediction models have evolved to include behavioural data, such as user-generated search query data, that capture market sentiment and reach beyond the grasp of established macroeconomic indicators. These applications have had considerable success in predicting a wide range of economic phenomena, on the assumption that internet interaction behaviour resembles probable offline behaviour. Despite this success, the existing literature argues for the continuous validation of search query keywords and their probable meanings over time to avoid spurious and biased results. Although recent literature has attempted to bridge the keyword-validation gap, this line of research is still in its infancy. This thesis sets out to examine the validity of web search intention as a “pure” demand proxy for direct real estate market prediction in the United Kingdom. More specifically, it constructs web search indices to explore: (i) the extent to which an individual’s true real-estate-oriented intentions manifest themselves in their web search behaviour and (ii) the magnitude to which real-time information adds value to the prediction of illiquid asset classes. In doing so, a conceptual framework is produced which outlines the logic and importance of intention-specific web search in the digital age, as well as its relation to real estate demand. The empirical findings suggest that intention-specific keyword development might be of little importance for aggregate housing and office market forecasts in the United Kingdom. By contrast, the viability of intention-specific web search keyword development appears to increase when it is directed at a specific regional market. The overall thesis narrative introduces a new way of thinking about web search in the context of economic demand and draws from a variety of principles and methodologies to establish an avenue for future research.
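
A minimal sketch of the general approach, assuming synthetic data and hypothetical keywords, might aggregate several intention-specific query series into a demand index and test whether it leads a price index at some lag:

```python
# Hedged sketch: build a search-based demand index and check its lagged
# correlation with a price series. All data here are synthetic and the
# keywords are hypothetical stand-ins for validated search terms.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
months = pd.date_range("2015-01", periods=48, freq="MS")
queries = pd.DataFrame({
    "buy house london": rng.normal(50, 5, 48),
    "mortgage calculator": rng.normal(40, 4, 48),
    "estate agents near me": rng.normal(30, 3, 48),
}, index=months)

# Simple demand index: z-score each series, then average across keywords.
index = ((queries - queries.mean()) / queries.std()).mean(axis=1)

# Synthetic price series that partly follows the index with a 3-month lag.
prices = 100 + 2 * index.shift(3).fillna(0) + rng.normal(0, 0.5, 48)

# Does the index lead prices? Inspect the lagged correlation.
print(prices.corr(index.shift(3)))
```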
495

Diretrizes para uma política de gestão de dados científicos no Brasil / Guidelines for a scientific data management policy in Brazil

Costa, Maíra Murrieta 18 August 2017 (has links)
Doctoral thesis - Universidade de Brasília, Faculdade de Ciência da Informação, Programa de Pós-Graduação em Ciência da Informação, 2017. / Submitted by Raquel Viana (raquelviana@bce.unb.br) on 2017-10-26T18:24:33Z No. of bitstreams: 1 2017_MaíraMurrietaCosta.pdf: 8735993 bytes, checksum: 7748ef1c7927b5936492b8c705eeb7bb (MD5) / Approved for entry into archive by Raquel Viana (raquelviana@bce.unb.br) on 2017-10-26T18:25:33Z (GMT) No. of bitstreams: 1 2017_MaíraMurrietaCosta.pdf: 8735993 bytes, checksum: 7748ef1c7927b5936492b8c705eeb7bb (MD5) / Made available in DSpace on 2017-10-26T18:25:33Z (GMT). No. of bitstreams: 1 2017_MaíraMurrietaCosta.pdf: 8735993 bytes, checksum: 7748ef1c7927b5936492b8c705eeb7bb (MD5) Previous issue date: 2017-10-26 / This thesis contextualizes the social organization of contemporary science. It discusses aspects of collaborative science in the 21st century, and the internationalization and virtualization of science that culminated in the explosion of scientific data collected online, giving rise to the phenomena of big data and e-science. It discusses the emergence of the umbrella terms e-science and cyberinfrastructure. It reviews the literature on research/scientific data and argues for structuring public policies to guide the management of scientific data arising from e-science since, from the standpoint of information management, solutions are needed for the adequate treatment of scientific data so as to enable the storage, organization, search, retrieval and dissemination of data collected on a large scale. In this respect, it recalls that the concern with scientific information lies at the origin of information science, and it discusses the role of the information professional in the treatment of data from e-science. Brazilian science and technology policy, as well as information policy, are also contextualized in the literature review. The general objective of the study is to draft government guidelines for the management of scientific data in Brazil. To this end, the following specific objectives were defined: SO 1) identify developed countries that have government actions for the management of scientific data; SO 2) analyze those government actions; SO 3) identify the main problems and solutions inherent in building a structured policy for the management of scientific data; SO 4) identify the position of Brazilian funding agencies on the subject; SO 5) identify the views of Brazilian researchers involved with the subject. The first phase of the research was guided by the need to direct the bibliographic search toward identifying national scientific data management policies in the countries most advanced in e-science. A descriptive survey was carried out using bibliometrics, a quantitative method based on statistical analysis, for data analysis; the study analyzed the term e-science in the Library and Information Science Abstracts and Library, Information Science & Technology Abstracts databases. The research methodology is classified as exploratory, with quantitative and qualitative characteristics in data collection, and is therefore mixed. The sample is non-probabilistic, formed by the criterion of intentionality. Forty PhD researchers involved in the management of scientific data in Brazil were interviewed, and a questionnaire was administered to twenty-two employees of Brazilian funding agencies and research support foundations. For the qualitative aspect of the data analysis, the research approach used, Grounded Theory, is discussed. The study argues that Brazil lacks an explicit policy to guide State actions in the management and preservation of scientific data, as well as guidelines for the reuse of such data. In terms of national initiatives, there is only the one concerning geospatial information, established by Decree 6.666 of 2008; another notable initiative is the biodiversity information handled by the Portal Brasileiro da Biodiversidade. The FAPESP e-science call for proposals is also presented as a relevant initiative in the area. The thesis presents a framework of items considered highly relevant for drawing up a set of guidelines to steer the elaboration of a policy for the management of scientific data in Brazil. It concludes that a data management policy needs to address aspects such as rules for data sharing and reuse, embargo periods for some categories of data, retention periods for some classes of data, and metadata standards and their interoperability. In addition, it should require a data management plan from researchers whose work is funded by the government, and define the requirements for implementing DOIs for data, along the lines of the issues listed in the framework.
496

Proposição de um modelo e sistema de gerenciamento de dados distribuídos para internet das coisas – GDDIoT / Proposal of a model and system for distributed data management for the Internet of Things – GDDIoT

Cruz Huacarpuma, Ruben 27 July 2017 (has links)
Doctoral thesis - Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2017. / Submitted by Raquel Almeida (raquel.df13@gmail.com) on 2017-11-30T18:11:49Z No. of bitstreams: 1 2017_RubenCruzHuacarpuma.pdf: 2899560 bytes, checksum: 365f64d81ab752f9b8e0d9d37bc5e549 (MD5) / Approved for entry into archive by Raquel Viana (raquelviana@bce.unb.br) on 2018-02-05T18:39:00Z (GMT). No. of bitstreams: 1 2017_RubenCruzHuacarpuma.pdf: 2899560 bytes, checksum: 365f64d81ab752f9b8e0d9d37bc5e549 (MD5) / Made available in DSpace on 2018-02-05T18:39:00Z (GMT). No. of bitstreams: 1 2017_RubenCruzHuacarpuma.pdf: 2899560 bytes, checksum: 365f64d81ab752f9b8e0d9d37bc5e549 (MD5) Previous issue date: 2018-01-05 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). / The development of the Internet of Things (IoT) has led to a considerable increase in the number and variety of devices connected to the Internet. Smart objects such as sensors have become a regular part of our environment, installed in cars and buildings, as well as in smartphones and other devices that continuously collect data about our lives, even without our intervention. With such connected smart objects, a broad range of applications has been developed and deployed, including applications that handle massive volumes of data. This thesis proposes and implements a model for data management in an IoT environment, contributing the specification of functionalities and the design of techniques for collecting, filtering, storing and visualizing data efficiently. An important characteristic of this work is its ability to integrate multiple, distinct IoT middlewares in a non-intrusive manner. The implementation was evaluated through case studies on smart-system scenarios: a Smart Home System, a Smart Transportation System, and a comparison between GDDIoT and an IoT middleware.
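
The collect/filter/store functionalities and the middleware-integration goal described above could be sketched as follows; the adapter interface, class names and filtering rule are illustrative stand-ins, not the GDDIoT implementation.

```python
# Illustrative sketch of a pluggable IoT data pipeline: readings from
# distinct middlewares flow through a common adapter interface, get
# filtered, and are stored. All names here are hypothetical.
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Reading:
    device_id: str
    kind: str      # e.g. "temperature"
    value: float

class MiddlewareAdapter:
    """Common interface so multiple IoT middlewares can plug in."""
    def collect(self) -> Iterable[Reading]:
        raise NotImplementedError

class FakeHomeMiddleware(MiddlewareAdapter):
    def collect(self) -> Iterable[Reading]:
        yield Reading("kitchen-1", "temperature", 22.5)
        yield Reading("kitchen-1", "temperature", 95.0)  # sensor glitch

def pipeline(adapters: List[MiddlewareAdapter], store: list) -> None:
    for adapter in adapters:
        for r in adapter.collect():
            if r.kind == "temperature" and not (-40 <= r.value <= 60):
                continue  # filter out implausible values before storage
            store.append(r)

db: list = []
pipeline([FakeHomeMiddleware()], db)
print(db)  # only the plausible reading is stored
```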
497

Reklam på sociala medier – personlig eller integritetskränkande? : En studie om användares syn på individ anpassad reklam / Advertisement on social media – personal or an invasion of privacy? : A study on the users view on individualized advertising

Stagnebo, Pernilla January 2018 (has links)
Many companies focus their advertising on social media today because, based on user behaviour, they can design advertisements tailored to each individual. Facebook has 2.2 billion active users worldwide, which makes it an attractive place for companies to advertise. As a user of the website you accept that Facebook has the right to collect data about you and your behaviour on its website and on other websites connected to Facebook. The advantage is that you get interesting advertisements in your Facebook feed, but it can also feel as though “big brother is watching you”. The purpose of this study was to describe how comfortable social media users are with their data being collected so that companies can place individualized advertising in their feeds. A questionnaire covering eight different data-collection scenarios was sent to 100 respondents, who rated how comfortable they were with each scenario. The majority were uncomfortable with all of the scenarios. They were most uncomfortable with mobile devices eavesdropping on private conversations (and then showing an ad for the product discussed) and with data being collected for political advertising. Some respondents also felt that there was too much advertising on Facebook; nevertheless, the majority answered no when asked whether they would pay for Facebook in order to avoid it. Finally, it emerged that the respondents did not know what kind of data Facebook collects about them. The answer is in the terms of service, which they once accepted but probably never read. The study therefore discusses simplifying user agreements so that users are briefly shown the points most important from their perspective before accepting.
498

Distributed SPARQL over Big RDF Data - A Comparative Analysis using Presto and MapReduce

January 2014 (has links)
abstract: The processing of large volumes of RDF data requires an efficient storage and query processing engine that can scale well with the volume of data. Initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational database management systems. But as the volume of RDF data grew to exponential proportions, the limitations of these systems became apparent and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. Various studies and benchmarks that evaluate these tools for RDF data processing have been published. In the past two and a half years, however, heavy users of big data systems, like Facebook, noted limitations in the query performance of these systems and began to develop new distributed query engines for big data that do not rely on map-reduce; Facebook's Presto is one such example. This thesis evaluates the performance of Presto in processing big RDF data against Apache Hive. A comparative analysis was also conducted against 4store, a native RDF store. To evaluate Presto's performance for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand. The evaluation was done on four- and eight-node Linux clusters installed on the Microsoft Windows Azure platform with RDF datasets of 10, 20, and 30 million triples. The results of the experiment show that Presto has much higher performance than Hive and can be used to process big RDF data. The thesis also proposes an architecture based on Presto, Presto-RDF, that can be used to process big RDF data. / Dissertation/Thesis / Masters Thesis Computing Studies 2014
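
The SPARQL-to-SQL step can be illustrated by hand: a basic graph pattern becomes self-joins over a triples(subject, predicate, object) table, producing SQL that an engine like Presto or Hive could run. The sketch below is a simplified reconstruction; the table layout and helper function are assumptions, not the thesis's Flex/Bison compiler.

```python
# Hand-worked sketch of SPARQL basic-graph-pattern to SQL rewriting
# over a hypothetical triples(subject, predicate, object) table.
def bgp_to_sql(patterns):
    """patterns: list of (subject_var, predicate, object) triples sharing
    the subject variable. Returns a Presto/Hive-style SQL string."""
    selects, joins, wheres = [], [], []
    for i, (s, p, o) in enumerate(patterns):
        alias = f"t{i}"
        joins.append(f"triples {alias}")
        wheres.append(f"{alias}.predicate = '{p}'")
        if i > 0:
            wheres.append(f"{alias}.subject = t0.subject")  # join on shared subject
        if o.startswith("?"):
            selects.append(f"{alias}.object AS {o[1:]}")    # variable -> projection
        else:
            wheres.append(f"{alias}.object = '{o}'")        # constant -> filter
    return (f"SELECT {', '.join(selects)} FROM {', '.join(joins)} "
            f"WHERE {' AND '.join(wheres)}")

# SPARQL: SELECT ?name WHERE { ?p <type> <Person> . ?p <name> ?name }
print(bgp_to_sql([("?p", "type", "Person"), ("?p", "name", "?name")]))
```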
499

Evaluation of Storage Systems for Big Data Analytics

January 2017 (has links)
abstract: Recent trends in big data storage systems show a shift from disk-centric models to memory-centric models. The primary challenges faced by these systems are speed, scalability, and fault tolerance. It is interesting to investigate the performance of these two models with respect to some big data applications. This thesis studies the performance of Ceph (a disk-centric model) and Alluxio (a memory-centric model) and evaluates whether a hybrid model provides any performance benefits with respect to big data applications. To this end, an application, TechTalk, is created that uses Ceph to store data and Alluxio to perform data analytics. The functionalities of the application include offline lecture storage, live recording of classes, content analysis and reference generation. The knowledge base of videos is constructed by analyzing the offline data using machine learning techniques. This training dataset provides knowledge to construct the index of an online stream. The indexed metadata enables the students to search, view and access the relevant content. The performance of the application is benchmarked in different use cases to demonstrate the benefits of the hybrid model. / Dissertation/Thesis / Masters Thesis Computer Science 2017
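
The hybrid model's access pattern, durable writes to the disk-centric store with reads served through a memory tier, can be sketched with stand-in classes; these are not the real Ceph or Alluxio client APIs.

```python
# Conceptual sketch of the hybrid model: a durable disk-centric store
# (Ceph's role) fronted by a memory-centric cache (Alluxio's role).
class DurableStore:
    """Stand-in for the disk-centric tier."""
    def __init__(self):
        self._disk = {}
    def put(self, key, blob):
        self._disk[key] = blob
    def get(self, key):
        return self._disk[key]

class MemoryTier:
    """Stand-in for the memory-centric cache tier."""
    def __init__(self, backing):
        self._cache = {}
        self._backing = backing
    def get(self, key):
        if key not in self._cache:
            # Cache miss: pull the object up from the durable tier once.
            self._cache[key] = self._backing.get(key)
        return self._cache[key]

ceph = DurableStore()
ceph.put("lecture-001.mp4", b"...video bytes...")
alluxio = MemoryTier(ceph)
alluxio.get("lecture-001.mp4")  # first read warms the in-memory cache
alluxio.get("lecture-001.mp4")  # later analytics reads are served from memory
```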
500

Big Data Database : Loopholes regarding Ownership and Access to Data

Shaba, Nusrat Jahan Shaba January 2018 (has links)
Big Data is an interesting, developing and, to some extent, vague area with respect to the law. The actual value of Big Data lies in its flow, not its sources. Several options are discussed as tools to dictate ownership of Big Data, such as copyright, trade secrets, patents and database protection; there are also proposals for a new type of intellectual property right to deal with it. Among the available intellectual property rights, database protection apparently provides the most obvious protection for Big Data. In addition, laws regarding Big Data need to conform with privacy law, competition law, contract law and related fields. The research is primarily concerned with big data databases and, to identify the impact of big data, includes some aspects of business practice. From a broader perspective, it analyses the scope of third parties’ rights to match the financial aspects of big data databases. The research aims to identify how to balance the different interests in using big data: there is no denying the need to control big data, and at the same time privacy should be respected. It is therefore important who can access these data and how far their right of access can be stretched. The access right extended to third parties is valuable, since it is essential to ensure the free flow of data, a prerequisite for building the new data economy. As to methodology, the thesis takes an analytical approach in which existing sources are examined in the context of the current scenario.
