  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
501

Industry 4.0: An Opportunity or a Threat? : A Qualitative Study Among Manufacturing Companies

Venema, Sven, Anger Bergström, Albin January 2018 (has links)
Manufacturing companies are currently going through exciting times. Technological developments follow each other at a high pace, and many opportunities arise for companies to be smarter than their competitors. The disruptive nature of these developments is so great that people speak of a new, fourth, industrial revolution. This industrial revolution, characterized and driven by seven drivers, is called Industry 4.0. Its popularity is seemingly apparent everywhere, and it has been described by some as "manufacturing's next act". Even though this sounds promising and applicable to every company, the practical consequences and feasibility are most of the time overlooked. In particular, a theoretical foundation on differences in feasibility between small and medium-sized enterprises (SMEs) and large firms is missing. In this thesis, we take the reader through a journey that will help them understand the positioning and perspective of firms regarding Industry 4.0, and eventually present the practical effects of Industry 4.0 on the business models of manufacturing firms. This research provides enough clarity on the topic to answer the research questions that follow. The thesis aims to fill a gap in the available research by linking business model change to Industry 4.0. Due to the novelty of Industry 4.0, its practical effects are not yet fully explored in the literature. Business model research, a more traditional area, has not yet touched upon the effects Industry 4.0 has on a company's business model. Our purpose is to combine these two topics and provide both SMEs and large firms with an overview of what the effects of Industry 4.0 are in practice. Furthermore, the perspectives and positioning of our sample firms can provide clarity for potential implementers, since a wide range of participants provides different insights on the topic and therefore clarifies the practical use of Industry 4.0. Throughout, the researchers follow an inductive approach, converting observations and findings into theory. The study uses a qualitative design, and semi-structured interviews were conducted to collect the data. Our sample firms consist of both SMEs and large firms and are all located within Europe. The researchers found some key differences between the academic and the business world in how positively Industry 4.0 is viewed. Companies may be highly automated and may have implemented some of the drivers of Industry 4.0, but the term itself is not popular. While some of our sample firms are convinced Industry 4.0 is the new way of working, most of them use the technologies simply because they are the best on the market and help them follow their strategy. Industry 4.0 can be seen as an interesting tool for firms to become smarter and achieve better results, but not at all costs. Especially for SMEs, implementing Industry 4.0 should not be the sole goal of the company, since many factors determine whether or not Industry 4.0 will succeed in the company. In terms of business models, Industry 4.0 causes many changes. Its role can be seen as an enabler for change rather than the reason to build a business model around. / Social science; Business Administration
502

El fenómeno de Big Data y redes sociales: un estudio de las implicancias de la información digital de los individuos para el reclutamiento y selección en Chile / The Big Data and social networks phenomenon: a study of the implications of individuals' digital information for recruitment and selection in Chile

Astorga Batarce, Cristian January 2017 (has links)
Seminar submitted to qualify for the degree of Ingeniero Comercial, Mención Administración / This work is a descriptive study of how Chilean organizations use Big Data in their recruitment and selection processes, of its consistency with current national legislation on the protection of private life, and of the ethical dimension of the decisions taken by the various agents involved in these subsystems of people management. The objective of this research is to determine how much the representatives of Chilean organizations know about Big Data and its practical usefulness. It also seeks to describe the current state of Chilean legislation on digital rights and the degree of ethical responsibility of the decision-makers who use individuals' digital information. The methodology employed was descriptive in nature, gathering information through in-depth interviews. Interviews were conducted with a sample of five people in roles related to recruitment and selection, with experience in different industries. The research, inductive in character, includes a theoretical framework reviewing topics related to Big Data, digital privacy, current Chilean legislation and ethics. A discourse analysis of the interviews is then carried out, followed by a diagnosis of the current Chilean scenario regarding the matters described above. The results of the analysis allow the reader to take a position on the current situation regarding the protection of individuals' digital privacy, and to form a personal judgment about the practices that decision-makers engage in when justifying the search for fit between candidates and the organization.
503

Leveraging big data for competitive advantage in a media organisation

Nartey, Cecil Kabu January 2015 (has links)
Thesis submitted in fulfilment of the requirements for the degree Master of Technology: Information Technology In the Faculty of Informatics and Design at the Cape Peninsula University of Technology / Data sources often emerge with the potential to transform, drive and allow deriving never-envisaged business value. These data sources change the way business enacts and models value generation. As a result, sellers are compelled to capture value by collecting data about business elements that drive change. Some of these elements, such as the customer and products, generate data as part of transactions which necessitates placement of the business element at the centre of the organisation’s data curation journey. This is in order to reveal changes and how these elements affect the business model. Data in business represents information translated into a format convenient for transfer. Data holds the relevant markers needed to measure business elements and provide the relevant metrics to monitor, steer and forecast business to attain enterprise goals. Data forms the building blocks of information within an organisation, allowing for knowledge and facts to be obtained. At its lowest level of abstraction, it provides a platform from which insights and knowledge can be derived as a direct extract for business decision-making as these decisions steer business into profitable situations. Because of this, organisations have had to adapt or change their business models to derive business value for sustainability, profitability and transformation. An organisation’s business model reflects a conceptual representation on how the organisation obtains and delivers value to prospective customers (the service beneficiary). In the process of delivering value to the service beneficiaries, data is generated. Generated data leads to business knowledge which can be leveraged to re-engineer the business model. The business model dictates which information and technology assets are needed for a balanced, profitable and optimised operation. The information assets represent value holding documented facts. Information assets go hand in hand with technology assets. The technology assets within an organisation are the technologies (computers, communications and databases) that support the automation of well-defined tasks as the organisation seeks to remain relevant to its clientele. What has become apparent is the fact that companies find it difficult to leverage the opportunities that data, and for that matter Big Data (BD), offers them. A data curation journey enables a seller to strategise and collect insightful data to influence how business may be conducted in a sustainable and profitable way while positioning the curating firm in a state of ‘information advantage’. While much of the discussion surrounding the concept of BD has focused on programming models (such as Hadoop) and technology innovations usually referred to as disruptive technologies (such as The Internet of Things and Automation of Knowledge Work), the real driver of technology and business is BD economics, which is the combination of open source data management and advanced analytics software coupled with commodity-based, scale-out architectures which are comparatively cheaper than prevalent sustainable technologies known to industry. 
Hadoop, though hugely misconstrued, is not an integration platform; it is a model that helps determine data value while bringing on board an optimised way of curating data cheaply as part of the integration architecture. The objectives of the study were to explore how BD can be used to exploit the opportunities it offers the organisation, such as leveraging insights to enable business transformation. This is accomplished by assessing the level of BD integration with the business model using the BD Business Model Maturation Index. Guidelines with subsequent recommendations are proposed for curation procedures aimed at improving the curation process. A qualitative research methodology was adopted. The research design outlines the research as a single case study; it outlines the philosophy as interpretivist, the approach as data collection through interviews, and the strategy as a review of the method of analysis deployed in the study. Themes that emerged from the categorised data indicate a divergence of business elements into primary business elements and secondary supporting business elements. Furthermore, the results show that data curation still hinges firmly on traditional data curation processes, which diminishes the benefits associated with BD curation. The results suggest a guided data curation process optimised by persistence hybridisation as an enabler to gain information advantage. The research also evaluated the level of integration of BD into the case business model to extrapolate results leading to guidelines and recommendations for BD curation.
504

GoldBI: uma solução de Business Intelligence como serviço / GoldBI: a Business Intelligence as a service solution

Silva Neto, Arlindo Rodrigues da 26 August 2016 (has links)
This work creates a BI (Business Intelligence) tool available in the cloud through SaaS (Software as a Service), using ETL (Extract, Transform, Load) techniques and Big Data technologies, with the intention of facilitating decentralized extraction and the processing of data in large quantities. Currently, it is practically unfeasible to conduct a consistent analysis without the aid of software for reporting and statistics. For these purposes, achieving concrete results in decision-making requires data analysis strategies and consolidated variables. From this perspective, this study emphasizes Business Intelligence (BI) with the goal of simplifying the analysis of management information and statistics, providing indicators through graphs or dynamic listings of management data. With the exponential growth of data, it becomes increasingly difficult to obtain results quickly and consistently, making it necessary to work with new techniques and tools for large-scale data processing. This work is technical in nature, creating a Software Engineering product grounded in a study of the state of the art in the area and in a comparison with the main tools on the market, highlighting the advantages and disadvantages of the solution created.
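As a rough illustration of the extract-transform-load flow that a tool like GoldBI automates, the sketch below shows a minimal ETL step in Python; it is not taken from the dissertation, and the file name, column names and SQLite target are assumptions made for the example.

# Minimal ETL sketch: extract rows from a CSV source, transform them,
# and load the result into a local SQLite table (assumed schema).
import csv
import sqlite3

def extract(path):
    # Extract: read raw records from a CSV file (assumed input format).
    with open(path, newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)

def transform(rows):
    # Transform: keep only rows with an amount and normalize the types.
    for row in rows:
        if row.get("amount"):
            yield (row["region"], float(row["amount"]))

def load(records, db_path="warehouse.db"):
    # Load: append the cleaned records into the analytical store.
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", records)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("sales.csv")))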
505

High performance trace replay event simulation of parallel programs behavior / Ferramenta de alto desempenho para análise de comportamento de programas paralelos baseada em rastos de execução

Korndorfer, Jonas Henrique Muller January 2016 (has links)
Modern high-performance systems comprise thousands to millions of processing units. The development of a scalable parallel application for such systems depends on an accurate mapping of application processes on top of the available resources. The identification of unused resources and potential processing bottlenecks requires good performance analysis. The trace-based observation of a parallel program's execution is one of the most helpful techniques for this purpose. Unfortunately, tracing often produces large trace files, easily reaching the order of gigabytes of raw data. Trace-based performance analysis tools therefore have to process such data into a human-readable form and must be efficient enough to allow a useful analysis. Most existing tools, such as Vampir, Scalasca and TAU, focus on processing trace formats with a fixed and well-defined semantics; the corresponding file formats are usually designed to handle applications developed with popular libraries like OpenMP, MPI, and CUDA. However, not all parallel applications use such libraries, so these tools are sometimes not applicable. Fortunately, other tools take a more dynamic approach by using an open trace file format without a specific semantics; among them are Paraver, Pajé and PajeNG. Being generic, however, comes at a cost: these tools frequently show low performance when processing large traces. The objective of this work is to present performance optimizations made to the PajeNG tool set, comprising the development of a parallelization strategy and a performance analysis to demonstrate our gains. The original PajeNG works sequentially, processing a single trace file with all data from the observed application, so the scalability of the tool is strongly limited by the reading of the trace file. Our strategy splits this file so that several pieces can be processed in parallel; the method created to split the traces allows each piece to be processed in a separate execution flow. The experiments were executed on non-uniform memory access (NUMA) machines. The performance analysis considers several aspects such as thread locality, the number of flows, the disk type, and comparisons between the NUMA nodes. The obtained results are very promising, scaling PajeNG up by about eight to eleven times depending on the machine.
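As a rough sketch of the splitting strategy described above (PajeNG itself is a C++ tool; this Python fragment only illustrates the idea, and the trace file name, chunk count and per-event handling are assumptions), the file is cut into byte ranges aligned to line boundaries and each range is consumed by a separate worker process:

# Sketch of chunked parallel processing of a large trace file.
import os
from multiprocessing import Pool

TRACE = "trace.paje"   # assumed file name

def chunk_offsets(path, n_chunks):
    # Compute byte ranges whose boundaries fall on line breaks.
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(i * size // n_chunks)
            f.readline()              # advance to the next full line
            bounds.append(f.tell())
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))

def process_chunk(bounds):
    # Each worker reads only its own byte range of the trace.
    start, end = bounds
    events = 0
    with open(TRACE, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            events += 1               # stand-in for real event handling
    return events

if __name__ == "__main__":
    with Pool(processes=8) as pool:
        counts = pool.map(process_chunk, chunk_offsets(TRACE, 8))
    print("events per chunk:", counts)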
506

Ensaios em macroeconomia aplicada / Essays in applied macroeconomics

Costa, Hudson Chaves January 2016 (has links)
This thesis presents three essays in applied macroeconomics that have in common the use of statistical and econometric techniques on macroeconomic problems. Within the research fields of applied macroeconomics, the thesis makes use of microfounded macroeconomic models, in their DSGE-VAR version, and of financial macroeconomics through the evaluation of the behavior of the correlation between stock returns using multivariate GARCH models. In addition, the thesis opens a discussion about a new field of research in macroeconomics that arises from the advent of technology. In the first essay, we apply the DSGE-VAR approach to the discussion about the reaction of the Central Bank of Brazil (BCB) to fluctuations in the exchange rate, specifically for the case of an economy under inflation targeting. Based on the open-economy model developed by Gali and Monacelli (2005) and modified by Lubik and Schorfheide (2007), we estimate a monetary policy rule for Brazil and examine to what extent the BCB responds to changes in the exchange rate. We also study the degree of misspecification of the proposed DSGE model; more specifically, we compare the marginal likelihood of the DSGE model with that of the DSGE-VAR model and examine whether the Central Bank managed to insulate the Brazilian economy, in particular inflation, from external shocks. Our findings show that the responses to deviations of the exchange rate are different from zero and smaller than the responses to deviations of inflation. Finally, the fit of the DSGE model is considerably worse than the fit of the DSGE-VAR model, regardless of the number of lags used in the VAR, which indicates that, from a statistical point of view, there is evidence that the cross-equation restrictions of the theoretical model are violated in the data. The second essay examines empirically the behavior of the correlation between the returns of stocks listed on the BM&FBOVESPA over the period from 2000 to 2015. To this end, we use the multivariate GARCH models introduced by Bollerslev (1990) to extract the time series of the conditional correlation matrices of stock returns. With the time series of the largest eigenvalues of the estimated conditional correlation matrices, we apply statistical tests (unit root, structural break and trend) to verify the existence of a stochastic or deterministic trend in the intensity of the correlation between stock returns, as represented by the eigenvalues. Our findings confirm that in periods of national crises as well as international turbulence the correlation between stocks intensifies. However, we did not find any long-term trend in the time series of the largest eigenvalues of the conditional correlation matrices. This suggests that, despite the conclusions of Costa, Mazzeu and Jr (2016) about the downward trend of idiosyncratic risk in the Brazilian stock market, the correlation of returns did not show an upward trend, as would be expected by finance theory. In the third essay, we present research that has used Big Data, Machine Learning and Text Mining on macroeconomic problems, discuss the main techniques and technologies adopted, and apply them to a sentiment analysis of the BCB's view of the economy. Using Web Scraping and Text Mining techniques, we accessed and extracted the words used in the minutes released by the Monetary Policy Committee (Copom) on the BCB website. Then, by comparing these words with a sentiment dictionary (the Harvard Inquirer), maintained by Harvard University and originally presented by Stone, Dunphy and Smith (1966), it was possible to create a sentiment index for the monetary authority. Our results confirm that such an approach can contribute to economic assessment, given that the time series of the proposed index is related to macroeconomic variables that are important for the BCB's decisions.
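As a toy illustration of the dictionary-based sentiment index described in the third essay (the word lists below are placeholders for the Harvard Inquirer categories, and the scoring formula is an assumption rather than the authors' exact construction):

# Lexicon-based sentiment index for a central-bank minute: count words
# found in positive/negative lists and combine them into one score.
import re

POSITIVE = {"growth", "improvement", "recovery", "expansion"}     # assumed subset
NEGATIVE = {"risk", "deterioration", "recession", "uncertainty"}  # assumed subset

def sentiment_index(text):
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

minute = "The committee sees continued recovery, although inflation risk remains."
print(sentiment_index(minute))   # > 0 means net-positive tone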
507

A benchmark suite for distributed stream processing systems / Um benchmark suite para sistemas distribuídos de stream processing

Bordin, Maycon Viana January 2017 (has links)
A piece of data has no value by itself; it must be interpreted, contextualized and aggregated with other data before it becomes information. In some classes of applications the value lies not only in the information but also in the speed with which it is obtained; high-frequency trading is a good example, where profitability is directly proportional to latency (LOVELESS; STOIKOV; WAEBER, 2013). With the evolution of hardware and data processing tools, many applications that used to take hours to produce results now have to produce them in minutes or seconds (BARLOW, 2013). Besides the need for real-time or near real-time processing, this kind of application is characterized by the continuous ingestion of large and unbounded amounts of data in the form of tuples or events. The growing demand for applications with these requirements led to the creation of systems that provide a programming model abstracting away details such as scheduling, fault tolerance, processing and query optimization. These systems are known as Stream Processing Systems (SPS), Data Stream Management Systems (DSMS) (CHAKRAVARTHY, 2009) or Stream Processing Engines (SPE) (ABADI et al., 2005). Recently these systems have adopted a distributed architecture as a way to cope with ever larger amounts of data (ZAHARIA et al., 2012); among them are S4, Storm, Spark Streaming, Flink Streaming and, more recently, Samza and Apache Beam. These systems model data processing as a dataflow graph, with vertices representing operators and edges representing data streams, but the similarities do not go much further: each system has its own particularities regarding fault-tolerance and recovery mechanisms, operator scheduling and parallelism, and communication patterns. In this scenario it would be useful to have a tool for comparing these systems under different workloads, to help select the most suitable platform for a specific job. However, the existing benchmarks designed to evaluate this kind of system cover only a few applications and workloads, while these systems have a much wider range of applications. This work therefore proposes a benchmark composed of applications from different areas, as well as a framework for the development and evaluation of distributed SPSs. Based on a survey of several papers on real-time and stream applications, the most common applications and areas were outlined, as well as the metrics most used in the performance evaluation of such applications. With this information, the metrics of the benchmark were selected, together with a list of candidate applications, which then went through a workload characterization in order to select a diverse set of applications. To ease the evaluation of SPSs, a framework was created with an API that generalizes application development and collects metrics, with the possibility of extending it to support other platforms in the future. To demonstrate the usefulness of the benchmark, a subset of the applications was executed on Storm and Spark using the Azure platform, and the results showed the usefulness of the benchmark suite in comparing these systems.
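As a sketch of the kind of metric collection such a benchmark harness needs (this is not the framework's actual API; the tuple format and the reported metrics are assumptions), each tuple carries its creation timestamp so a sink can report end-to-end latency and throughput:

# Toy metric sink for a streaming benchmark: measures per-tuple latency
# and overall throughput, independently of the underlying platform.
import time

class MetricSink:
    def __init__(self):
        self.latencies = []
        self.start = time.time()

    def receive(self, event):
        # Each event is (payload, creation_timestamp).
        payload, created_at = event
        self.latencies.append(time.time() - created_at)

    def report(self):
        elapsed = time.time() - self.start
        n = len(self.latencies)
        avg_ms = 1000 * sum(self.latencies) / n if n else 0.0
        return {"tuples": n, "throughput_per_s": n / elapsed, "avg_latency_ms": avg_ms}

sink = MetricSink()
for i in range(1000):                 # stand-in for a real source operator
    sink.receive((f"tuple-{i}", time.time()))
print(sink.report())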
508

Distributed data analysis over meteorological datasets using the actor model

Sanchez, Jimmy Kraimer Martin Valverde January 2017 (has links)
Because of the continuous and overwhelming growth of scientific data in the last few years, data-intensive analysis over this vast amount of data is very important to extract valuable scientific information. The GRIB (GRIdded Binary) scientific data format is widely used within the meteorological community to store historical meteorological data and weather forecast simulation results. However, current libraries and tools for processing GRIB files do not perform the computation in a distributed environment. This situation limits the analytical capabilities of scientists who need to perform analyses on large data sets in order to obtain information in the shortest time possible, making use of all available resources. In this context, this work presents an alternative for processing data in the GRIB format using the well-known Manager-Worker pattern, implemented with the Actor model provided by the Akka toolkit. We also compare our proposal with other mechanisms, such as round-robin, random and adaptive load balancing, as well as with one of the main frameworks currently used for big data processing, Apache Spark. The methodology considers several factors in evaluating the processing of the GRIB files. The experiments were conducted on a cluster in the Microsoft Azure platform. The results show that our proposal scales well as the number of worker nodes increases, and it reached better performance than the other mechanisms used for comparison, particularly when eight worker virtual machines were used. Using metadata, our approach achieved gains of 53.88%, 62.42%, 62.97%, 61.92%, 62.44% and 59.36% over round-robin, random, adaptive load balancing using CPU, JVM heap and mixed metrics, and Apache Spark, respectively, in a scenario where a search criterion is applied to select 2 of the 27 parameters found in the dataset used in the experiments.
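The thesis implements the Manager-Worker pattern with Akka actors on the JVM; as a language-neutral illustration of the pattern only (file names and the per-file work are placeholders), the Python sketch below has the manager process hand file names to workers through a queue and collect partial results:

# Manager-Worker sketch: queues stand in for actor mailboxes.
from multiprocessing import Process, Queue

def worker(tasks, results):
    # Pull file names until the stop signal (None) arrives.
    for grib_file in iter(tasks.get, None):
        results.put((grib_file, f"processed {grib_file}"))  # stand-in for real GRIB decoding

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    workers = [Process(target=worker, args=(tasks, results)) for _ in range(4)]
    for w in workers:
        w.start()
    files = [f"forecast_{i}.grib" for i in range(10)]   # assumed file names
    for f in files:
        tasks.put(f)                                    # manager distributes work
    for _ in workers:
        tasks.put(None)                                 # one stop signal per worker
    for _ in files:
        print(results.get())                            # manager collects results
    for w in workers:
        w.join()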
509

Big data a kontrola spojování podniků v EU / Big data and EU merger control

Bosáková, Viktória January 2018 (has links)
The significance of "big data" as a factor in the competitive assessment of mergers in the EU has attracted more and more attention in the past years. Today's digital economy revolves around the Internet and information technologies that together enable the collection and processing of previously unimaginable sets of data, high in volume, velocity, variety and value. Data has become a valuable and important asset for various businesses, mainly those active on online platforms. Consequently, companies may engage in strategic mergers in order to acquire profitable data from one another. The aim of this master thesis is to research and analyse whether big data could result in increased market power for the newly merged company or could have detrimental effects on other competitors present on the market or on competition itself. The main research question therefore is whether big data in its essence could constitute a competitive concern when it comes to data-related mergers. The thesis initially clarifies the concept and characteristics of "big data" in general, whilst demonstrating the increasing significance of data used as a business asset in the present digital economy. The research then focuses on what role specific features of data could play in various stages of...
510

Large-Scale Matrix Completion Using Orthogonal Rank-One Matrix Pursuit, Divide-Factor-Combine, and Apache Spark

January 2014 (has links)
As the size and scope of valuable datasets have exploded across many industries and fields of research in recent years, an increasingly diverse audience has sought out effective tools for their large-scale data analytics needs. Over this period, machine learning researchers have also been very prolific in designing improved algorithms which are capable of finding the hidden structure within these datasets. As consumers of popular Big Data frameworks have sought to apply and benefit from these improved learning algorithms, the problems encountered with the frameworks have motivated a new generation of Big Data tools to address the shortcomings of the previous generation. One important example of this is the improved performance of the newer tools with the large class of machine learning algorithms which are highly iterative in nature. In this thesis project, I set out to implement a low-rank matrix completion algorithm (as an example of a highly iterative algorithm) within a popular Big Data framework, and to evaluate its performance processing the Netflix Prize dataset. I begin by describing several approaches which I attempted, but which did not perform adequately. These include an implementation of the Singular Value Thresholding (SVT) algorithm within the Apache Mahout framework, which runs on top of the Apache Hadoop MapReduce engine. I then describe an approach which uses the Divide-Factor-Combine (DFC) algorithmic framework to parallelize the state-of-the-art low-rank completion algorithm Orthogonal Rank-One Matrix Pursuit (OR1MP) within the Apache Spark engine. I describe the results of a series of tests running this implementation with the Netflix dataset on clusters of various sizes, with various degrees of parallelism. For these experiments, I utilized the Amazon Elastic Compute Cloud (EC2) web service. In the final analysis, I conclude that the Spark DFC + OR1MP implementation does indeed produce competitive results, in both accuracy and performance. In particular, the Spark implementation performs nearly as well as the MATLAB implementation of OR1MP without any parallelism, and improves performance to a significant degree as the parallelism increases. In addition, the experience demonstrates how Spark's flexible programming model makes it straightforward to implement this parallel and iterative machine learning algorithm. / Dissertation/Thesis / M.S. Computer Science 2014
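As a toy sketch of the Divide-Factor-Combine idea (the crude SVD-based fill below stands in for the OR1MP base solver, the combine step is simplified to a projection onto one block's column space, and none of this is the thesis code):

# DFC sketch: split the columns of a partially observed matrix, complete
# each block independently, then stitch the blocks back together by
# projecting them onto a common column space. Toy scale only.
import numpy as np

def complete_block(block, mask, rank):
    # Stand-in base solver: zero-fill, truncated SVD, keep observed entries.
    filled = np.where(mask, block, 0.0)
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return np.where(mask, block, low_rank)

def dfc(matrix, mask, rank=2, n_blocks=4):
    col_blocks = np.array_split(np.arange(matrix.shape[1]), n_blocks)
    completed = [complete_block(matrix[:, c], mask[:, c], rank) for c in col_blocks]
    # Combine: project every block onto the column space of the first one.
    q, _ = np.linalg.qr(completed[0])
    return np.hstack([q @ (q.T @ b) for b in completed])

rng = np.random.default_rng(0)
truth = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
mask = rng.random(truth.shape) < 0.5          # half of the entries observed
estimate = dfc(np.where(mask, truth, 0.0), mask)
print("relative error:", np.linalg.norm(estimate - truth) / np.linalg.norm(truth))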
