501

Distributed SPARQL over Big RDF Data - A Comparative Analysis using Presto and MapReduce

January 2014 (has links)
abstract: Processing large volumes of RDF data requires an efficient storage and query processing engine that scales well with the volume of data. Initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational database management systems. As the volume of RDF data grew, the limitations of these systems became apparent and researchers began to focus on big data analysis tools, most notably Hadoop, to process RDF data. Various studies and benchmarks evaluating these tools for RDF data processing have been published. In the past two and a half years, however, heavy users of big data systems, such as Facebook, noted limitations in the query performance of these systems and began to develop new distributed query engines for big data that do not rely on map-reduce; Facebook's Presto is one such example. This thesis evaluates the performance of Presto in processing big RDF data against Apache Hive. A comparative analysis was also conducted against 4store, a native RDF store. To evaluate the performance of Presto for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS, while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand. The evaluation was done on four- and eight-node Linux clusters on the Microsoft Windows Azure platform with RDF datasets of 10, 20, and 30 million triples. The results show that Presto has much higher performance than Hive and can be used to process big RDF data. The thesis also proposes Presto-RDF, an architecture based on Presto for processing big RDF data. / Dissertation/Thesis / Masters Thesis Computing Studies 2014
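The abstract describes a compiler that rewrites SPARQL queries into SQL that Presto and Hive can evaluate over RDF stored in HDFS. As a rough illustration of that rewriting idea (not the thesis's Flex/Bison compiler), the sketch below turns a two-pattern SPARQL basic graph pattern into a self-join over a hypothetical `triples(subject, predicate, object)` table; the table layout and query shape are assumptions.

```python
# Minimal sketch: translate a SPARQL basic graph pattern into SQL self-joins
# over a hypothetical triples(subject, predicate, object) table.
# This illustrates the general rewriting idea only, not the thesis compiler.

def bgp_to_sql(patterns):
    """patterns: list of (subject, predicate, object) terms.
    Terms starting with '?' are variables; everything else is a constant."""
    selects, joins, wheres = [], [], []
    var_binding = {}                 # variable name -> "tN.column" of first occurrence
    for i, (s, p, o) in enumerate(patterns):
        alias = f"t{i}"
        joins.append(f"triples {alias}")
        for term, col in ((s, "subject"), (p, "predicate"), (o, "object")):
            ref = f"{alias}.{col}"
            if term.startswith("?"):             # variable
                if term in var_binding:          # repeated variable -> join condition
                    wheres.append(f"{ref} = {var_binding[term]}")
                else:
                    var_binding[term] = ref
                    selects.append(f"{ref} AS {term[1:]}")
            else:                                # constant -> filter
                wheres.append(f"{ref} = '{term}'")
    sql = "SELECT " + ", ".join(selects) + " FROM " + ", ".join(joins)
    if wheres:
        sql += " WHERE " + " AND ".join(wheres)
    return sql

# SELECT ?name WHERE { ?p <type> <Person> . ?p <name> ?name }
print(bgp_to_sql([("?p", "type", "Person"), ("?p", "name", "?name")]))
```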
502

Evaluation of Storage Systems for Big Data Analytics

January 2017 (has links)
abstract: Recent trends in big data storage systems show a shift from disk-centric models to memory-centric models. The primary challenges faced by these systems are speed, scalability, and fault tolerance, and it is interesting to investigate the performance of the two models with respect to big data applications. This thesis studies the performance of Ceph (a disk-centric model) and Alluxio (a memory-centric model) and evaluates whether a hybrid model provides any performance benefits for big data applications. To this end, an application, TechTalk, is created that uses Ceph to store data and Alluxio to perform data analytics. The functionalities of the application include offline lecture storage, live recording of classes, content analysis, and reference generation. The knowledge base of videos is constructed by analyzing the offline data using machine learning techniques. This training dataset provides the knowledge to construct the index of an online stream, and the indexed metadata enables students to search, view, and access the relevant content. The performance of the application is benchmarked in different use cases to demonstrate the benefits of the hybrid model. / Dissertation/Thesis / Masters Thesis Computer Science 2017
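As a rough sketch of the hybrid pattern the abstract describes (durable storage in Ceph, memory-centric analytics through Alluxio), the snippet below stages a lecture recording from a Ceph-backed mount into an Alluxio-mounted path before analysis. The mount points, file names, and the analysis stub are assumptions for illustration; the actual TechTalk application and its APIs are not shown in the abstract.

```python
# Hedged sketch of a hybrid storage flow: durable data lives on a Ceph-backed
# mount, while analytics read from an Alluxio (memory-centric) mount.
# Paths and file names are hypothetical; both filesystems are assumed to be
# mounted locally (e.g. CephFS and the Alluxio FUSE interface).
import shutil
from pathlib import Path

CEPH_ROOT = Path("/mnt/cephfs/lectures")       # assumed CephFS mount (durable tier)
ALLUXIO_ROOT = Path("/mnt/alluxio/lectures")   # assumed Alluxio mount (fast tier)

def stage_for_analysis(name: str) -> Path:
    """Copy a recording from the durable tier into the memory-centric tier."""
    src, dst = CEPH_ROOT / name, ALLUXIO_ROOT / name
    dst.parent.mkdir(parents=True, exist_ok=True)
    if not dst.exists():                       # avoid re-staging hot data
        shutil.copy2(src, dst)
    return dst

def analyze(path: Path) -> int:
    """Placeholder for the content-analysis step (indexing, feature extraction)."""
    return path.stat().st_size                 # stand-in metric: bytes processed

if __name__ == "__main__":
    staged = stage_for_analysis("lecture_001.mp4")
    print("analyzed bytes:", analyze(staged))
```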
503

Big Data Database : Loopholes regarding Ownership and Access to Data

Shaba, Nusrat Jahan Shaba January 2018 (has links)
Big Data is an interesting, developing, and to some extent vague area of law. The actual value of Big Data lies in its flow, not in its sources. Several instruments have been discussed as tools to establish ownership of Big Data, such as copyright, trade secrets, patents, and database protection; there are also proposals to create a new type of intellectual property right for this purpose. Among the available intellectual property rights, database protection apparently provides the most obvious protection for Big Data. In addition, laws regarding Big Data need to conform with privacy law, competition law, contract law, and related fields. The research is primarily concerned with big data databases and, to identify the impact of big data, it includes some aspects of business practice. From a broader perspective, the research analyses the scope of third parties' rights against the financial aspects of big data databases. It aims to identify how to balance the different interests involved in using big data: there is no denying the need to control big data, and at the same time privacy should be respected. It is therefore important to determine who can access these data and how far their right of access can be stretched. The access right extended to third parties is valuable because it is essential to ensuring the free flow of data, a prerequisite for building the new data economy. In terms of methodology, the thesis follows an analytical approach in which existing sources are explained in the context of the current scenario.
504

Industry 4.0: An Opportunity or a Threat? : A Qualitative Study Among Manufacturing Companies

Venema, Sven, Anger Bergström, Albin January 2018 (has links)
Manufacturing companies are currently going through exciting times. Technological developments follow each other at a high pace, and many opportunities arise for companies to be smarter than their competitors. The disruptive nature of these developments is so large that people speak of a new, fourth, industrial revolution. This industrial revolution, characterized and driven by seven drivers, is called Industry 4.0. Its popularity is seemingly apparent everywhere, and it is described by some as "manufacturing's next act". Even though this sounds promising and applicable to every company, the practical consequences and feasibility are most of the time overlooked. In particular, a theoretical foundation on differences in feasibility between small and medium-sized enterprises (SMEs) and large firms is missing. In this thesis we take the reader through a journey that helps them understand the positioning and perspective of firms regarding Industry 4.0, and eventually the practical effects of Industry 4.0 on the business models of manufacturing firms are presented. This research provides enough clarity on the topic to answer the following research questions. The thesis aims to fill the gap in available research linking business model change to Industry 4.0. Due to the novelty of Industry 4.0, its practical effects are not yet fully explored in the literature, and business model research, a more traditional area, has not yet touched upon the effects Industry 4.0 has on the business models of companies. Our purpose is to combine these two topics and provide both SMEs and large firms with an overview of the effects of Industry 4.0 in practice. Furthermore, the perspectives and positioning of our sample firms can provide clarity for potential implementers, since a wide range of participants provides different insights on the topic and thereby clarifies the practical use of Industry 4.0. Throughout, the researchers follow an inductive approach, converting observations and findings into theory. The study uses a qualitative design, and semi-structured interviews were conducted to collect the data. Our sample firms consist of both SMEs and large firms and are all located within Europe. The researchers found that there are some key differences between the academic and the business world in how positively Industry 4.0 is viewed. Companies might be highly automated and have implemented some of the drivers of Industry 4.0, but the term itself is not popular. Where some of our sample firms are convinced Industry 4.0 is the new way of working, most of them use the technologies simply because they are the best on the market and help them follow their strategy. Industry 4.0 can be seen as an interesting tool for firms to become smarter and achieve better results, but not at all costs. Especially for SMEs, implementing Industry 4.0 should not be the sole goal of the company, since many factors determine whether Industry 4.0 will succeed in the company. In terms of business models, Industry 4.0 causes many changes; its role can be seen as an enabler of change rather than the reason to build a business model around. / Social science; Business Administration
505

El fenómeno de Big Data y redes sociales : un estudio de las implicancias de la información digital de los individuos para el reclutamiento y selección en Chile / The Big Data and social media phenomenon: a study of the implications of individuals' digital information for recruitment and selection in Chile

Astorga Batarce, Cristian January 2017 (has links)
Seminar submitted to qualify for the degree of Ingeniero Comercial, Mención Administración / This work is a descriptive investigation of the use of Big Data by Chilean organizations in recruitment and selection processes, of its consistency with current national legislation on the protection of private life, and of the ethical dimension of the decisions taken by the different agents involved in these subsystems of people management. The objective of this research is to determine the degree of knowledge that representatives of Chilean organizations have about Big Data and its practical usefulness. It also seeks to describe the current state of Chilean legislation on digital rights and the degree of ethical responsibility of decision-makers who use individuals' digital information. The methodology employed was descriptive in nature, gathering information through in-depth interviews. Interviews were conducted with a sample of five people in positions related to recruitment and selection, with experience in different industries. The research, inductive in character, includes a theoretical framework reviewing topics related to Big Data, digital privacy, current Chilean legislation, and ethics. In addition, a discourse analysis of the interviews is carried out, followed by a diagnosis of the current Chilean scenario regarding the matters described above. The results of the analysis will allow the reader to adopt a position on the current situation regarding the protection of individuals' digital privacy, as well as to form a personal judgment about the practices that decision-makers engage in when justifying the search for fit between candidates and the organization.
506

Leveraging big data for competitive advantage in a media organisation

Nartey, Cecil Kabu January 2015 (has links)
Thesis submitted in fulfilment of the requirements for the degree Master of Technology: Information Technology in the Faculty of Informatics and Design at the Cape Peninsula University of Technology / Data sources often emerge with the potential to transform and drive business and to allow deriving never-envisaged business value. These data sources change the way business enacts and models value generation. As a result, sellers are compelled to capture value by collecting data about the business elements that drive change. Some of these elements, such as the customer and products, generate data as part of transactions, which necessitates placing the business element at the centre of the organisation's data curation journey in order to reveal changes and how these elements affect the business model. Data in business represents information translated into a format convenient for transfer. Data holds the relevant markers needed to measure business elements and provides the relevant metrics to monitor, steer and forecast business to attain enterprise goals. Data forms the building blocks of information within an organisation, allowing knowledge and facts to be obtained. At its lowest level of abstraction, it provides a platform from which insights and knowledge can be derived as a direct extract for business decision-making, as these decisions steer the business into profitable situations. Because of this, organisations have had to adapt or change their business models to derive business value for sustainability, profitability and transformation. An organisation's business model reflects a conceptual representation of how the organisation obtains and delivers value to prospective customers (the service beneficiaries). In the process of delivering value to the service beneficiaries, data is generated. Generated data leads to business knowledge which can be leveraged to re-engineer the business model. The business model dictates which information and technology assets are needed for a balanced, profitable and optimised operation. The information assets represent documented facts that hold value, and they go hand in hand with technology assets. The technology assets within an organisation are the technologies (computers, communications and databases) that support the automation of well-defined tasks as the organisation seeks to remain relevant to its clientele. What has become apparent is that companies find it difficult to leverage the opportunities that data, and for that matter Big Data (BD), offer them. A data curation journey enables a seller to strategise and collect insightful data to influence how business may be conducted in a sustainable and profitable way while positioning the curating firm in a state of 'information advantage'. While much of the discussion surrounding the concept of BD has focused on programming models (such as Hadoop) and technology innovations usually referred to as disruptive technologies (such as the Internet of Things and the automation of knowledge work), the real driver of technology and business is BD economics: the combination of open-source data management and advanced analytics software coupled with commodity-based, scale-out architectures, which are comparatively cheaper than the prevalent sustainable technologies known to industry. Hadoop, though hugely misconstrued, is not an integration platform; it is a model that helps determine data value while bringing on board an optimised way of curating data cheaply as part of the integration architecture. The objectives of the study were to explore how BD can be used to exploit the opportunities it offers the organisation, such as leveraging insights to enable business transformation. This is accomplished by assessing the level of BD integration with the business model using the BD Business Model Maturation Index. Guidelines with subsequent recommendations are proposed for curation procedures aimed at improving the curation process. A qualitative research methodology was adopted. The research design frames the research as a single case study, the philosophy as interpretivist, the approach as data collection through interviews, and the strategy as a review of the method of analysis deployed in the study. Themes that emerged from the categorised data indicate a divergence of business elements into primary business elements and secondary supporting business elements. Furthermore, the results show that data curation still hinges firmly on traditional data curation processes, which diminishes the benefits associated with BD curation. The results suggest a guided data curation process optimised by persistence hybridisation as an enabler for gaining information advantage. The research also evaluated the level of integration of BD into the case business model to extrapolate results leading to guidelines and recommendations for BD curation.
507

GoldBI: uma solução de Business Intelligence como serviço / GoldBI: a Business Intelligence as a service solution

Silva Neto, Arlindo Rodrigues da 26 August 2016 (has links)
This work creates a BI (Business Intelligence) tool available in the cloud through SaaS (Software as a Service), using ETL (Extract, Transform, Load) techniques and Big Data technologies, with the aim of facilitating decentralized extraction and the processing of data in large quantities. Today it is practically unfeasible to carry out a consistent analysis without the aid of software for generating reports and statistics. For these purposes, obtaining concrete results in decision-making requires data analysis strategies and consolidated variables. From this perspective, the study emphasizes Business Intelligence (BI) with the objective of simplifying the analysis of managerial and statistical information, providing indicators through graphs or dynamic listings of managerial data. With the exponential growth of data it becomes increasingly difficult to obtain results quickly and consistently, making it necessary to work with new techniques and tools for large-scale data processing. The work is technical in nature, creating a Software Engineering product grounded in a study of the state of the art of the area and in a comparison with the main tools available on the market, highlighting advantages and disadvantages of the proposed solution. / 2020-12-31
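Since the abstract centers on ETL as the mechanism behind the tool, here is a minimal, generic extract-transform-load sketch (not GoldBI's actual code): it reads a hypothetical CSV of sales records, normalizes the values, and loads them into a local SQLite table standing in for the analytical store. The file name, column names, and SQLite target are all assumptions.

```python
# Minimal ETL sketch: extract from a CSV source, transform, load into a SQL store.
# File name, column names, and the SQLite target are illustrative assumptions;
# they are not part of the GoldBI implementation described in the abstract.
import csv
import sqlite3

def extract(path):
    with open(path, newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)           # rows as dicts, e.g. {"date": ..., "amount": ...}

def transform(rows):
    for row in rows:
        yield (row["date"].strip(), row["region"].strip().upper(), float(row["amount"]))

def load(records, db_path="warehouse.db"):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS sales (date TEXT, region TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("sales.csv")))      # end-to-end pipeline run
```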
508

High performance trace replay event simulation of parallel programs behavior / Ferramenta de alto desempenho para análise de comportamento de programas paralelos baseada em rastos de execução

Korndorfer, Jonas Henrique Muller January 2016 (has links)
Modern high performance systems comprise thousands to millions of processing units. The development of a scalable parallel application for such systems depends on an accurate mapping of application processes onto the available resources. The identification of unused resources and potential processing bottlenecks requires good performance analysis. The trace-based observation of a parallel program execution is one of the most helpful techniques for this purpose. Unfortunately, tracing often produces large trace files, easily reaching gigabytes of raw data. Trace-based performance analysis tools therefore have to process such data into a human-readable form and must be efficient enough to allow a useful analysis. Most of the existing tools, such as Vampir, Scalasca, and TAU, focus on processing trace formats with a fixed, well-defined semantics; the corresponding file formats are usually designed to handle applications developed with popular libraries like OpenMP, MPI, and CUDA. However, not all parallel applications use such libraries, so these tools are sometimes not applicable. Fortunately, other tools take a more dynamic approach by using an open trace file format without a specific semantics; among them are Paraver, Pajé, and PajeNG. Being generic, however, comes at a cost: these tools frequently show low performance when processing large traces. The objective of this work is to present performance optimizations made to the PajeNG tool-set, comprising the development of a parallelization strategy and a performance analysis to establish our gains. The original PajeNG works sequentially, processing a single trace file with all data from the observed application, so the scalability of the tool is strongly limited by the reading of the trace file. Our strategy splits this file so that several pieces can be processed in parallel; the method created to split the traces allows each piece to be processed in its own thread. The experiments were executed on non-uniform memory access (NUMA) machines. The performance analysis considers several aspects such as thread locality, number of flows, disk type, and comparisons between the NUMA nodes. The results obtained are very promising, scaling up PajeNG by about eight to eleven times depending on the machine.
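The core idea of the abstract, splitting one large trace file into pieces and processing the pieces in parallel instead of reading it sequentially, can be sketched roughly as follows. This is an illustrative Python sketch, not the C++ PajeNG implementation; the chunking by byte offsets aligned to line boundaries and the per-chunk event count are simplifying assumptions.

```python
# Rough sketch of the split-and-process-in-parallel strategy described above:
# divide a large line-oriented trace file into byte ranges aligned to line
# boundaries and process each range in its own worker.  Illustrative only;
# the real PajeNG parallelization (C++) is more involved.
import os
from concurrent.futures import ProcessPoolExecutor

def chunk_offsets(path, n_chunks):
    """Return (start, end) byte ranges aligned to newline boundaries."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(i * size // n_chunks)
            f.readline()                      # advance to the next full line
            bounds.append(f.tell())
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))

def process_chunk(args):
    """Count trace events in one byte range (stand-in for real event parsing)."""
    path, start, end = args
    events = 0
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end and f.readline():
            events += 1
    return events

def process_trace(path, workers=4):
    ranges = [(path, s, e) for s, e in chunk_offsets(path, workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_chunk, ranges))

if __name__ == "__main__":
    print("events:", process_trace("paje.trace", workers=4))  # hypothetical trace file
```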
509

Ensaios em macroeconomia aplicada / Essays in applied macroeconomics

Costa, Hudson Chaves January 2016 (has links)
This thesis presents three essays in applied macroeconomics that have in common the use of statistical and econometric techniques on macroeconomic problems. Within the research fields of applied macroeconomics, the thesis makes use of microfounded macroeconomic models, in their DSGE-VAR version, and of financial macroeconomics through the evaluation of the behavior of the correlation between stock returns using multivariate GARCH models. In addition, it raises a discussion about a new field of research in macroeconomics that arises from the advent of technology. The first essay applies the DSGE-VAR approach to the discussion of how the Central Bank of Brazil (BCB) reacts to fluctuations in the exchange rate, specifically for the case of an economy under inflation targeting. Based on the open-economy model developed by Gali and Monacelli (2005) and modified by Lubik and Schorfheide (2007), we estimate a monetary policy rule for Brazil and examine to what extent the BCB responds to changes in the exchange rate. We also study the degree of misspecification of the proposed DSGE model; more specifically, we compare the marginal likelihood of the DSGE model with that of the DSGE-VAR model and examine whether the Central Bank managed to insulate the Brazilian economy, in particular inflation, from external shocks. Our findings show that the responses to exchange-rate deviations are different from zero and smaller than the responses to deviations of inflation. Finally, the fit of the DSGE model is considerably worse than the fit of the DSGE-VAR model, regardless of the number of lags used in the VAR, which indicates that, from a statistical point of view, there is evidence that the cross-equation restrictions of the theoretical model are violated in the data. The second essay examines empirically the behavior of the correlation between the returns of stocks listed on the BMF&BOVESPA over the period from 2000 to 2015. To this end, we use the multivariate GARCH models introduced by Bollerslev (1990) to extract the time series of the conditional correlation matrices of stock returns. With the time series of the largest eigenvalues of the estimated conditional correlation matrices, we apply statistical tests (unit root, structural break, and trend) to verify the existence of a stochastic or deterministic trend in the intensity of the correlation between stock returns, as represented by the eigenvalues. Our findings confirm that the correlation between stocks intensifies both in periods of national crisis and of international turbulence. However, we do not find any long-term trend in the time series of the largest eigenvalues of the conditional correlation matrices. This suggests that, despite the conclusions of Costa, Mazzeu and Jr (2016) about the downward trend of idiosyncratic risk in the Brazilian stock market, the correlation of returns did not show an upward trend, as would be expected by finance theory. The third essay surveys research that has used Big Data, Machine Learning, and Text Mining on macroeconomic problems, discusses the main techniques and technologies adopted, and applies them to a sentiment analysis of the BCB's view of the economy. Using Web Scraping and Text Mining techniques, we accessed and extracted the words used in the minutes released by the Monetary Policy Committee (Copom) on the BCB website. Then, by comparing these words with a sentiment dictionary (Inquirer) maintained by Harvard University and originally presented by Stone, Dunphy and Smith (1966), it was possible to create a sentiment index for the monetary authority. Our results confirm that this approach can contribute to economic assessment, given that the time series of the proposed index is related to macroeconomic variables that are important for the BCB's decisions.
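A rough sketch of the dictionary-based sentiment index described in the third essay: count positive and negative words in each Copom minute and take the normalized difference. The tiny word lists below are placeholders for the Harvard General Inquirer categories, and the sample texts are invented; only the scoring scheme is illustrated.

```python
# Hedged sketch of a dictionary-based sentiment index for central-bank minutes.
# The word lists are tiny placeholders for the Harvard General Inquirer
# categories, and the example minutes are invented; only the scoring idea
# (normalized positive-minus-negative word counts) is illustrated.
import re

POSITIVE = {"growth", "improvement", "recovery", "stability", "expansion"}
NEGATIVE = {"risk", "uncertainty", "deterioration", "recession", "volatility"}

def sentiment_score(text: str) -> float:
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total   # index in [-1, 1]

minutes = {
    "2016-01": "The committee sees risk and uncertainty despite signs of recovery.",
    "2016-03": "Growth and stability support the expansion of economic activity.",
}
index = {date: sentiment_score(text) for date, text in minutes.items()}
print(index)   # e.g. {'2016-01': -0.33..., '2016-03': 1.0}
```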
510

A benchmark suite for distributed stream processing systems / Um benchmark suite para sistemas distribuídos de stream processing

Bordin, Maycon Viana January 2017 (has links)
A datum by itself has no value unless it is interpreted, contextualized, and aggregated with other data, only then becoming information. In some classes of applications the value lies not only in the information but also in the speed with which that information is obtained; high-frequency trading is a good example, where profitability is directly proportional to latency (LOVELESS; STOIKOV; WAEBER, 2013). With the evolution of hardware and of data processing tools, many applications that used to take hours to produce results now need to produce them in minutes or seconds (BARLOW, 2013). Besides the need for real-time or near-real-time processing, this kind of application is characterized by the continuous ingestion of large and unbounded amounts of data in the form of tuples or events. The growing demand for applications with these requirements led to the creation of systems that provide a programming model abstracting details such as scheduling, fault tolerance, processing, and query optimization. These systems are known as Stream Processing Systems (SPS), Data Stream Management Systems (DSMS) (CHAKRAVARTHY, 2009), or Stream Processing Engines (SPE) (ABADI et al., 2005). Lately these systems have adopted a distributed architecture as a way to cope with ever larger amounts of data (ZAHARIA et al., 2012); among them are S4, Storm, Spark Streaming, Flink Streaming and, more recently, Samza and Apache Beam. These systems model data processing through a dataflow graph, with vertices representing operators and edges representing data streams, but the similarities do not go much further: each system has its own particularities regarding fault-tolerance and recovery mechanisms, operator scheduling and parallelism, and communication patterns. In this scenario it would be useful to have a tool for comparing these systems under different workloads, to help select the most adequate platform for a specific job; however, the existing benchmarks designed to evaluate this kind of system cover only a few applications and workloads, while these systems have a much wider set of applications. In this work a benchmark for stream processing systems is proposed. Based on a survey of several papers on real-time and stream applications, the most used applications and areas were outlined, as well as the metrics most used in the performance evaluation of such applications. With this information the metrics of the benchmark were selected, along with a list of candidate applications, which then went through a workload characterization in order to select a diverse set of applications. To ease the evaluation of SPSs, a framework was created with an API that generalizes application development and collects metrics, with the possibility of extending it to support other platforms in the future. To demonstrate the usefulness of the benchmark, a subset of the applications was executed on Storm and Spark on the Azure platform, and the results showed the usefulness of the benchmark suite in comparing these systems.
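As a rough illustration of the kind of metric collection such a benchmark framework's API generalizes (not the actual framework from the thesis), the sketch below feeds a synthetic stream of events through a user-supplied operator and reports throughput and per-event latency; the operator, event format, and workload size are assumptions.

```python
# Hedged sketch of benchmark-style metric collection for a stream operator:
# drive a synthetic event stream through a user-supplied function and report
# throughput and latency percentiles.  The operator and workload are invented
# stand-ins; the thesis framework targets real SPSs such as Storm and Spark.
import time
import statistics

def run_benchmark(operator, events):
    latencies = []
    start = time.perf_counter()
    for event in events:
        t0 = time.perf_counter()
        operator(event)                        # the stream operator under test
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "events": len(latencies),
        "throughput_eps": len(latencies) / elapsed,
        "latency_p50_ms": statistics.median(latencies) * 1e3,
        "latency_p99_ms": statistics.quantiles(latencies, n=100)[98] * 1e3,
    }

# Example: a toy word-count operator over synthetic sentences.
counts = {}
def word_count(sentence):
    for word in sentence.split():
        counts[word] = counts.get(word, 0) + 1

workload = ["the quick brown fox jumps over the lazy dog"] * 10_000
print(run_benchmark(word_count, workload))
```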
