Global ETD Search

421	Da identidade dos grupos aos perfis programados: uma possível passagem à luz da teoria da comunicação Picchiai, Daniela de Queiroz 02 December 2014 (has links) Made available in DSpace on 2016-04-26T18:14:45Z (GMT). No. of bitstreams: 1 Daniela de Queiroz Picchiai.pdf: 564041 bytes, checksum: 70762e27f3ed8170f628c4f853bd1639 (MD5) Previous issue date: 2014-12-02 / The main objective of this dissertation is to investigate recent changes in the orientation of market research, which makes increasing use of digital databases as a source of information about consumers. The work begins by analyzing established market research methodologies of a qualitative nature, which use sample-based data collection, such as questionnaires and interviews, to understand consumers' attitudes and behaviors and thereby create marketing communication strategies for institutions interested in this information. With the emergence of Big Data analysis, the scenario changes. Along this path, we highlight the increasingly important role of the reflections of philosophers (Foucault, Deleuze) and communication theorists (Martin-Barbero, Jenkins) on our current society of control. For our study, this means a path increasingly centered on the search for patterns of behavior in digital data. With the influence of new technologies, the digital universe and the arrival of Big Data, market research begins to analyze data drawn from environments such as company databases or activity on social networks, and to check what kinds of consumption patterns can be identified. With that, it could in theory build communication that induces and directs consumers' actions and choices. 
As a result, the research compares two approaches that companies use as strategies for reaching potential consumers: the first is based on qualitative tools and an understanding of the social environment of particular segments; the second considers the enormous range of techniques available in the online environment that influence people's behavior. The work reflects on how companies use communication strategies to engage their audiences and generate identification with them. As a working methodology, we draw on the following authors: Harold Lasswell, to identify the influence of research on advertising communication; Jesús Martín-Barbero, one of the first scholars to understand the mediations between individuals and social environments; Philip Kotler and Gilbert A. Churchill, central authors in the understanding of marketing techniques; and Michel Foucault, Felix Guattari and Gilles Deleuze, to recognize how the gaze of Big Data is articulated with our society of control. The research concludes that marketing communication has a decisive role in inducing individuals' choices, and it seeks to trace the path taken in this process in order to recognize and point out the factors that lead to these choices. / Esta dissertação tem como objetivo principal investigar as recentes mudanças na orientação das pesquisas de mercado, que fazem uso cada vez maior de bases de dados digitais como fonte de informação sobre os consumidores. O trabalho começa analisando as metodologias de pesquisa mercadológica já consolidadas, de natureza qualitativa, que a partir de levantamentos de dados, de caráter amostral, tais como questionários e entrevistas, busca compreender atitudes e comportamentos dos consumidores e, assim, criar estratégias de comunicação mercadológica para as instituições interessadas nessa informação. Com o surgimento das análises ditas de Big Data, o cenário muda. 
Na evolução desse caminho, destacamos o papel cada vez mais importante das reflexões de filósofos (Foucault, Deleuze) e teóricos da comunicação (Martin-Barbero, Jenkins) sobre nossa atual sociedade de controle. Isso deve significar, para nosso estudo, um caminho cada vez mais centrado na busca por padrões de comportamento através de dados digitais. Com a influência das novas tecnologias, do universo digital e com a chegada do Big Data, a pesquisa de mercado passa a analisar os dados retirados de ambientes tais como as bases de dados das empresas ou de atividades em redes sociais, e verificam quais tipos de padrão de consumo podem ser identificados. Com isso, poderiam em tese construir uma comunicação para induzir e direcionar as ações e escolhas dos consumidores. Com isso, o trabalho compara dois formatos de ação utilizados pelas empresas como estratégia para atingir possíveis consumidores: o primeiro tem como base ferramentas qualitativas e compreensão do ambiente social de determinados segmentos; o segundo considera a gigantesca dimensão de técnicas existentes no ambiente online que influenciam o comportamento das pessoas. O trabalho faz uma reflexão sobre como as estratégias de comunicação são utilizadas pelas empresas para envolver e gerar identificação em seus públicos. Como metodologia de trabalho, utilizamos os seguintes autores: Harold Lasswell, para identificar a influência das pesquisas na comunicação publicitária; Jesús Martín-Barbero, um dos primeiros estudiosos a compreender as mediações entre indivíduos e ambientes sociais; Philip Kotler e Gilbert A. Churchill, autores centrais no conhecimento das técnicas de marketing; Michel Foucault, Felix Guattari e Gilles Deleuze para reconhecer como o olhar do BigData se articula com nossa sociedade de controle. 
A pesquisa conclui que a comunicação mercadológica tem um papel decisivo na indução das escolhas dos indivíduos e busca constatar o caminho percorrido nesse processo, para assim reconhecer e pontuar os fatores que conduzem a essas escolhas Escolhas Comunicação Pesquisas Subjetividade Dados Choices Communication Research Subjectivity Big data
422	High performance trace replay event simulation of parallel programs behavior / Ferramenta de alto desempenho para análise de comportamento de programas paralelos baseada em rastos de execução Korndorfer, Jonas Henrique Muller January 2016 (has links) Sistemas modernos de alto desempenho compreendem milhares a milhões de unidades de processamento. O desenvolvimento de uma aplicação paralela escalável para tais sistemas depende de um mapeamento preciso da utilização recursos disponíveis. A identificação de recursos não utilizados e os gargalos de processamento requere uma boa análise desempenho. A observação de rastros de execução é uma das técnicas mais úteis para esse fim. Infelizmente, o rastreamento muitas vezes produz grandes arquivos de rastro, atingindo facilmente gigabytes de dados brutos. Portanto ferramentas para análise de desempenho baseadas em rastros precisam processar esses dados para uma forma legível e serem eficientes a fim de permitirem uma análise rápida e útil. A maioria das ferramentas existentes, tais como Vampir, Scalasca e TAU, focam no processamento de formatos de rastro com semântica associada, geralmente definidos para lidar com programas desenvolvidos com bibliotecas populares como OpenMP, MPI e CUDA. No entanto, nem todas aplicações paralelas utilizam essas bibliotecas e assim, algumas vezes, essas ferramentas podem não ser úteis. Felizmente existem outras ferramentas que apresentam uma abordagem mais dinâmica, utilizando um formato de arquivo de rastro aberto e sem semântica específica. Algumas dessas ferramentas são Paraver, Pajé e PajeNG. Por outro lado, ser genérico tem custo e assim tais ferramentas frequentemente apresentam baixo desempenho para o processamento de grandes rastros. O objetivo deste trabalho é apresentar otimizações feitas para o conjunto de ferramentas PajeNG. São apresentados o desenvolvimento de um estratégia de paralelização para o PajeNG e uma análise de desempenho para demonstrar nossos ganhos. 
O PajeNG original funciona sequencialmente, processando um único arquivo de rastro que contém todos os dados do programa rastreado. Desta forma, a escalabilidade da ferramenta fica muito limitada pela leitura dos dados. Nossa estratégia divide o arquivo em pedaços permitindo seu processamento em paralelo. O método desenvolvido para separar os rastros permite que cada pedaço execute em um fluxo de execução separado. Nossos experimentos foram executados em máquinas com acesso não uniforme à memória (NUMA). A análise de desempenho desenvolvida considera vários aspectos como localidade das threads, o número de fluxos, tipo de disco e também comparações entre os nós NUMA. Os resultados obtidos são muito promissores, escalando o PajeNG cerca de oito a onze vezes, dependendo da máquina. / Modern high performance systems comprise thousands to millions of processing units. The development of a scalable parallel application for such systems depends on an accurate mapping of application processes onto the available resources. Identifying unused resources and potential processing bottlenecks requires good performance analysis. Trace-based observation of a parallel program's execution is one of the most helpful techniques for this purpose. Unfortunately, tracing often produces large trace files, easily reaching gigabytes of raw data. Trace-based performance analysis tools therefore have to turn such data into a human-readable form, and they must be efficient enough to allow a fast and useful analysis. Most existing tools, such as Vampir, Scalasca and TAU, focus on processing trace formats with fixed and well-defined semantics. These file formats are usually designed to handle applications developed with popular libraries such as OpenMP, MPI and CUDA. However, not all parallel applications use such libraries, so these tools are sometimes not useful. 
Fortunately, other tools take a more dynamic approach by using an open trace file format without specific semantics. Examples are Paraver, Pajé and PajeNG. However, being generic comes at a cost: these tools frequently show low performance when processing large traces. The objective of this work is to present performance optimizations made to the PajeNG tool set. These comprise the development of a parallelization strategy and a performance analysis that demonstrates our gains. The original PajeNG works sequentially, processing a single trace file that contains all the data from the observed application. As a result, the scalability of the tool is severely limited by the reading of the trace file. Our strategy splits this file so that several pieces can be processed in parallel. The method developed to split the traces allows each piece to be processed in a separate thread. The experiments were run on non-uniform memory access (NUMA) machines. The performance analysis considers several aspects, such as thread locality, the number of threads, the disk type, and comparisons between NUMA nodes. The results are very promising, scaling PajeNG by about eight to eleven times, depending on the machine. Processamento paralelo Processamento : Alto desempenho Parallel application Performance analysis High performance Big data Trace replay
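The splitting strategy summarized in the abstract above can be illustrated with a minimal Python sketch. It assumes a hypothetical line-oriented trace in which each line holds an event name followed by a container id; this is not the Pajé file format, and `split_trace`, `count_events` and `process_in_parallel` are illustrative names, not PajeNG functions. The trace is cut into contiguous pieces, each piece is handled by its own worker thread, and the partial results are merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_trace(lines, pieces):
    """Cut the list of trace lines into at most `pieces` contiguous chunks."""
    if pieces < 1:
        raise ValueError("pieces must be at least 1")
    size = max(1, -(-len(lines) // pieces))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def count_events(chunk):
    """Count events per event name in one chunk (hypothetical format)."""
    counts = Counter()
    for line in chunk:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def process_in_parallel(lines, pieces=4):
    """Process each chunk in its own thread and merge the partial counts."""
    chunks = split_trace(lines, pieces)
    total = Counter()
    with ThreadPoolExecutor(max_workers=pieces) as pool:
        for partial in pool.map(count_events, chunks):
            total.update(partial)
    return total


if __name__ == "__main__":
    # Invented sample events, used only to exercise the sketch.
    trace = ["PushState p0", "PopState p0", "PushState p1", "SetVariable p1"]
    print(process_in_parallel(trace, pieces=2))
```

In CPython the global interpreter lock limits the speedup of CPU-bound work in threads, so this sketch shows the structure of the approach, not its performance.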
423	Distributed data analysis over meteorological datasets using the actor model Sanchez, Jimmy Kraimer Martin Valverde January 2017 (has links) Devido ao contínuo crescimento dos dados científicos nos últimos anos, a análise intensiva de dados nessas quantidades massivas de dados é muito importante para extrair informações valiosas. Por outro lado, o formato de dados científicos GRIB (GRIdded Binary) é amplamente utilizado na comunidade meteorológica para armazenar histórico de dados e previsões meteorológicas. No entanto, as ferramentas atuais disponíveis e métodos para processar arquivos neste formato não realizam o processamento em um ambiente distribuído. Essa situação limita as capacidades de análise dos cientistas que precisam realizar uma avaliação sobre grandes conjuntos de dados com o objetivo de obter informação no menor tempo possível fazendo uso de todos os recursos disponíveis. Neste contexto, este trabalho apresenta uma alternativa ao processamento de dados no formato GRIB usando o padrão Manager-Worker implementado com o modelo de atores fornecido pelo Akka toolkit. Realizamos também uma comparação da nossa proposta com outros mecanismos, como o round-robin, random, balanceamento de carga adaptativo, bem como com um dos principais frameworks para o processamento de grandes quantidades de dados tal como o Apache Spark. A metodologia utilizada considera vários fatores para avaliar o processamento dos arquivos GRIB. Os experimentos foram conduzidos em um cluster na plataforma Microsoft Azure. Os resultados mostram que nossa proposta escala bem à medida que o número de nós aumenta. Assim, nossa proposta atingiu um melhor desempenho em relação aos outros mecanismos utilizados para a comparação, particularmente quando foram utilizadas oito máquinas virtuais para executar as tarefas. 
Nosso trabalho com o uso de metadados alcançou um ganho de 53.88%, 62.42%, 62.97%, 61.92%, 62.44% e 59.36% em relação aos mecanismos round-robin, random, balanceamento de carga adaptativo que usou métricas CPU, JVM Heap e um combinado de métricas, e o Apache Spark, respectivamente, em um cenário onde um critério de busca é aplicado para selecionar 2 dos 27 parâmetros totais encontrados no conjunto de dados utilizado nos experimentos. / Because of the continuous and overwhelming growth of scientific data in the last few years, data-intensive analysis of this vast amount of data is very important for extracting valuable scientific information. The GRIB (GRIdded Binary) scientific data format is widely used within the meteorological community to store historical meteorological data and weather forecast simulation results. However, current libraries for processing GRIB files do not perform the computation in a distributed environment. This limits the analytical capabilities of scientists who need to analyze large data sets in order to obtain information in the shortest possible time using all available resources. In this context, this work presents an alternative for processing data in the GRIB format using the well-known Manager-Worker pattern, implemented with the actor model provided by the Akka toolkit. We also compare our proposal with other mechanisms, such as round-robin, random and adaptive load balancing, as well as with one of the main existing frameworks for big data processing, Apache Spark. The methodology considers several factors to evaluate the processing of GRIB files. The experiments were conducted on a cluster on the Microsoft Azure platform. The results show that our proposal scales well as the number of worker nodes increases. 
Our work achieved better performance than the other mechanisms used in the comparison, particularly when eight worker virtual machines were used. With the use of metadata, our proposal achieved gains of 53.88%, 62.42%, 62.97%, 61.92%, 62.44% and 59.36% over, respectively, round-robin, random, adaptive load balancing using CPU metrics, JVM heap metrics and a mix of metrics, and Apache Spark, in a scenario where a search criterion is applied to select 2 of the 27 parameters found in the dataset used in the experiments. Meteorologia Processamento distribuido Actor model Akka GRIB Manager-Worker Big data
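As an illustration of the Manager-Worker pattern named in the abstract above, the sketch below uses plain Python threads and queues instead of Akka actors, so it shows the message flow only and is not the thesis implementation. The manager enqueues one task per file, idle workers pull tasks, and results flow back on a second queue. The file names, the `process_file` stand-in and the parameter names are all hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker to shut down


def process_file(name, wanted):
    """Stand-in for decoding one GRIB file (hypothetical): keep wanted parameters."""
    available = ["temperature", "pressure", "humidity"]  # made-up parameter names
    return name, [p for p in available if p in wanted]


def worker(tasks, results, wanted):
    """Pull tasks until the sentinel arrives, pushing each result back."""
    while True:
        item = tasks.get()
        if item is STOP:
            break
        results.put(process_file(item, wanted))


def manager(files, wanted, n_workers=4):
    """Hand files to workers on demand and collect all results."""
    files = list(files)
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, wanted))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for name in files:
        tasks.put(name)
    for _ in threads:
        tasks.put(STOP)  # one sentinel per worker
    for t in threads:
        t.join()
    # After join, exactly one result per file is waiting in the queue.
    return dict(results.get() for _ in files)


if __name__ == "__main__":
    print(manager(["a.grib", "b.grib"], wanted={"temperature", "humidity"}))
```

Because workers pull a new task only when they are free, a slow file does not hold up the others. This is the usual argument for the pattern over a static round-robin assignment, though the abstract does not say which factors drove the reported gains.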
424	Ensaios em macroeconomia aplicada Costa, Hudson Chaves January 2016 (has links) Esta tese apresenta três ensaios em macroeconomia aplicada e que possuem em comum o uso de técnicas estatísticas e econométricas em problemas macroeconômicos. Dentre os campos de pesquisa da macroeconomia aplicada, a tese faz uso de modelos macroeconômicos microfundamentados, em sua versão DSGE-VAR, e da macroeconomia financeira por meio da avaliação do comportamento da correlação entre os retornos das ações usando modelos Garch multivariados. Além disso, a tese provoca a discussão sobre um novo campo de pesquisa em macroeconomia que surge a partir do advento da tecnologia. No primeiro ensaio, aplicamos a abordagem DSGE-VAR na discussão sobre a reação do Banco Central do Brasil (BCB) as oscilações na taxa de câmbio, especificamente para o caso de uma economia sob metas de inflação. Para tanto, baseando-se no modelo para uma economia aberta desenvolvido por Gali e Monacelli (2005) e modificado por Lubik e Schorfheide (2007), estimamos uma regra de política monetária para o Brasil e examinamos em que medida o BCB responde a mudanças na taxa de câmbio. Além disso, estudamos o grau de má especificação do modelo DSGE proposto. Mais especificamente, comparamos a verossimilhança marginal do modelo DSGE às do modelo DSGE-VAR e examinamos se o Banco Central conseguiu isolar a economia brasileira, em particular a inflação, de choques externos. Nossas conclusões mostram que as respostas aos desvios da taxa de câmbio são diferentes de zero e menores do que as respostas aos desvios da inflação. Finalmente, o ajuste do modelo DSGE é consideravelmente pior do que o ajuste do modelo DSGE-VAR, independentemente do número de defasagens utilizadas no VAR o que indica que de um ponto de vista estatístico existem evidências de que as restrições cruzadas do modelo teórico são violadas nos dados. 
O segundo ensaio examina empiricamente o comportamento da correlação entre o retorno de ações listadas na BMF&BOVESPA no período de 2000 a 2015. Para tanto, utilizamos modelos GARCH multivariados introduzidos por Bollerslev (1990) para extrair a série temporal das matrizes de correlação condicional dos retornos das ações. Com a série temporal dos maiores autovalores das matrizes de correlação condicional estimadas, aplicamos testes estatísticos (raiz unitária, quebra estrutural e tendência) para verificar a existência de tendência estocástica ou determinística para a intensidade da correlação entre os retornos das ações representadas pelos autovalores. Nossas conclusões confirmam que tanto em períodos de crises nacionais como turbulências internacionais, há intensificação da correlação entre as ações. Contudo, não encontramos qualquer tendência de longo prazo na série temporal dos maiores autovalores das matrizes de correlação condicional. Isso sugere que apesar das conclusões de Costa, Mazzeu e Jr (2016) sobre a tendência de queda do risco idiossincrático no mercado acionário brasileiro, a correlação dos retornos não apresentou tendência de alta, conforme esperado pela teoria de finanças. No terceiro ensaio, apresentamos pesquisas que utilizaram Big Data, Machine Learning e Text Mining em problemas macroeconômicos e discutimos as principais técnicas e tecnologias adotadas bem como aplicamos elas na análise de sentimento do BCB sobre a economia. Por meio de técnicas de Web Scraping e Text Mining, acessamos e extraímos as palavras usadas na escrita das atas divulgadas pelo Comitê de Política Monetária (Copom) no site do BCB. Após isso, comparando tais palavras com um dicionário de sentimentos (Inquider) mantido pela Universidade de Harvard e originalmente apresentado por Stone, Dunphy e Smith (1966), foi possível criar um índice de sentimento para a autoridade monetária. 
Nossos resultados confirmam que tal abordagem pode contribuir para a avaliação econômica dado que a série temporal do índice proposto está relacionada com variáveis macroeconômicas importantes para as decisões do BCB. / This thesis presents three essays in applied macroeconomics that have in common the use of statistical and econometric techniques on macroeconomic problems. Among the research fields of applied macroeconomics, the thesis makes use of microfounded macroeconomic models, in their DSGE-VAR version, and of financial macroeconomics, through the evaluation of the behavior of the correlation between stock returns using multivariate GARCH models. In addition, the thesis prompts discussion of a new field of research in macroeconomics that arises from the advent of technology. In the first essay, we apply the dynamic stochastic general equilibrium approach in its DSGE-VAR form to the discussion of the reaction of the Central Bank of Brazil (BCB) to fluctuations in the exchange rate, specifically for the case of an economy under inflation targeting. To this end, based on the open-economy model developed by Gali and Monacelli (2005) and modified by Lubik and Schorfheide (2007), we estimate a monetary policy rule for Brazil and examine to what extent the BCB responds to changes in the exchange rate. In addition, we study the degree of misspecification of the proposed DSGE model. More specifically, we compare the marginal likelihood of the DSGE model with that of the DSGE-VAR model and examine whether the Central Bank managed to insulate the Brazilian economy, in particular inflation, from external shocks. Our findings show that the responses to exchange rate deviations are different from zero and smaller than the responses to inflation deviations. 
Finally, the fit of the DSGE model is considerably worse than that of the DSGE-VAR model, regardless of the number of lags used in the VAR, which indicates that, from a statistical point of view, there is evidence that the cross-equation restrictions of the theoretical model are violated in the data. The second essay empirically examines the behavior of the correlation between the returns of stocks listed on the BMF&BOVESPA over the period from 2000 to 2015. To this end, we use the multivariate GARCH models introduced by Bollerslev (1990) to extract the time series of conditional correlation matrices of the stock returns. With the time series of the largest eigenvalues of the estimated conditional correlation matrices, we apply statistical tests (unit root, structural break and trend) to check for a stochastic or deterministic trend in the intensity of the correlation between stock returns, as represented by the eigenvalues. Our findings confirm that the correlation between stocks intensifies both in periods of national crisis and in periods of international turbulence. However, we did not find any long-term trend in the time series of the largest eigenvalues of the conditional correlation matrices. In the third essay, we present research that has used Big Data, Machine Learning and Text Mining on macroeconomic problems, discuss the main techniques and technologies adopted, and apply them to an analysis of the BCB's sentiment about the economy. Using Web Scraping and Text Mining techniques, we accessed and extracted the words used in the minutes released by the Monetary Policy Committee (Copom) on the BCB website. Then, by comparing these words with a sentiment dictionary (Inquirer) maintained by Harvard University and originally presented by Stone, Dunphy and Smith (1966), we were able to create a sentiment index for the monetary authority. 
Our results confirm that such an approach can contribute to economic assessment, given that the time series of the proposed index is related to macroeconomic variables that are important for the BCB's decisions. Macroeconomia Taxa de câmbio Política monetária DSGE-VAR Idiosyncratic risk Multivariate GARCH Big Data Machine learning
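The dictionary-based sentiment index described in the third essay can be sketched as follows. This is a minimal illustration and not the thesis's method: the word lists are tiny made-up stand-ins for a real sentiment dictionary, the sample minutes are invented, and the index formula (positive minus negative matches, divided by total matches) is one common choice that the abstract does not specify.

```python
import re

# Hypothetical stand-ins for a sentiment dictionary's word lists.
POSITIVE = {"growth", "recovery", "stable", "improvement"}
NEGATIVE = {"risk", "inflation", "uncertainty", "deterioration"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_index(text):
    """Return (pos - neg) / (pos + neg), or 0.0 when no word matches."""
    words = tokenize(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    return (pos - neg) / matched if matched else 0.0


if __name__ == "__main__":
    # Invented texts standing in for scraped committee minutes.
    minutes = {
        "2015-01": "Inflation risk and uncertainty remain high.",
        "2015-03": "Signs of recovery and stable growth.",
    }
    for date, text in sorted(minutes.items()):
        print(date, sentiment_index(text))
```

Computing the index once per set of minutes yields a time series that could then be compared with macroeconomic variables, which is the kind of relationship the abstract reports.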
425	A benchmark suite for distributed stream processing systems / Um benchmark suite para sistemas distribuídos de stream processing Bordin, Maycon Viana January 2017 (has links) Um dado por si só não possui valor algum, a menos que ele seja interpretado, contextualizado e agregado com outros dados, para então possuir valor, tornando-o uma informação. Em algumas classes de aplicações o valor não está apenas na informação, mas também na velocidade com que essa informação é obtida. As negociações de alta frequência (NAF) são um bom exemplo onde a lucratividade é diretamente proporcional a latência (LOVELESS; STOIKOV; WAEBER, 2013). Com a evolução do hardware e de ferramentas de processamento de dados diversas aplicações que antes levavam horas para produzir resultados, hoje precisam produzir resultados em questão de minutos ou segundos (BARLOW, 2013). Este tipo de aplicação tem como característica, além da necessidade de processamento em tempo-real ou quase real, a ingestão contínua de grandes e ilimitadas quantidades de dados na forma de tuplas ou eventos. A crescente demanda por aplicações com esses requisitos levou a criação de sistemas que disponibilizam um modelo de programação que abstrai detalhes como escalonamento, tolerância a falhas, processamento e otimização de consultas. Estes sistemas são conhecidos como Stream Processing Systems (SPS), Data Stream Management Systems (DSMS) (CHAKRAVARTHY, 2009) ou Stream Processing Engines (SPE) (ABADI et al., 2005). Ultimamente estes sistemas adotaram uma arquitetura distribuída como forma de lidar com as quantidades cada vez maiores de dados (ZAHARIA et al., 2012). Entre estes sistemas estão S4, Storm, Spark Streaming, Flink Streaming e mais recentemente Samza e Apache Beam. Estes sistemas modelam o processamento de dados através de um grafo de fluxo com vértices representando os operadores e as arestas representando os data streams. 
Mas as similaridades não vão muito além disso, pois cada sistema possui suas particularidades com relação aos mecanismos de tolerância e recuperação a falhas, escalonamento e paralelismo de operadores, e padrões de comunicação. Neste cenário seria útil possuir uma ferramenta para a comparação destes sistemas em diferentes workloads, para auxiliar na seleção da plataforma mais adequada para um trabalho específico. Este trabalho propõe um benchmark composto por aplicações de diferentes áreas, bem como um framework para o desenvolvimento e avaliação de SPSs distribuídos. / Recently, a new application domain characterized by the continuous, low-latency processing of large volumes of data has been gaining attention. The growing number of applications of this kind has led to the creation of Stream Processing Systems (SPSs), systems that abstract the details of real-time applications away from the developer. More recently, the ever-increasing volumes of data to be processed gave rise to distributed SPSs. Several distributed SPSs are currently on the market; however, the existing benchmarks designed to evaluate this kind of system cover only a few applications and workloads, while these systems have a much wider range of applications. In this work, a benchmark for stream processing systems is proposed. Based on a survey of several papers on real-time and stream applications, the most used applications and areas were outlined, as well as the metrics most used in the performance evaluation of such applications. With this information, the metrics of the benchmark were selected, as well as a list of candidate applications for the benchmark. These went through a workload characterization in order to select a diverse set of applications. To ease the evaluation of SPSs, a framework was created with an API that generalizes application development and collects metrics, with the possibility of extending it to support other platforms in the future. 
To assess the benchmark, a subset of the applications was executed on Storm and Spark using the Azure platform, and the results demonstrated the usefulness of the benchmark suite in comparing these systems. Processamento distribuido Processamento : Alto desempenho Distributed systems Benchmark suite Stream processing Real-time processing Big data
426	Chromosome 3D Structure Modeling and New Approaches For General Statistical Inference Rongrong Zhang (5930474) 03 January 2019 (has links) This thesis consists of two separate topics: the use of piecewise helical models for the inference of 3D spatial organizations of chromosomes, and new approaches for general statistical inference. The recently developed Hi-C technology enables a genome-wide view of chromosome spatial organizations, and has shed deep insights into genome structure and genome function. However, multiple sources of uncertainty make downstream data analysis and interpretation challenging. Specifically, statistical models for inferring three-dimensional (3D) chromosomal structure from Hi-C data are far from mature. Most existing methods are highly over-parameterized, lack clear interpretations, and are sensitive to outliers. We propose a parsimonious, easy to interpret, and robust piecewise helical curve model for the inference of 3D chromosomal structures from Hi-C data, for both individual topologically associated domains and whole chromosomes. When applied to a real Hi-C dataset, the piecewise helical model not only achieves much better model fitting than existing models, but also reveals that geometric properties of chromatin spatial organization are closely related to genome function. For potential applications in big data analytics and machine learning, we propose to use deep neural networks to automate the Bayesian model selection and parameter estimation procedures. Two such frameworks are developed under different scenarios. First, we construct a deep neural network-based Bayes estimator for the parameters of a given model. The neural Bayes estimator mitigates the computational challenges faced by traditional approaches for computing Bayes estimators. 
When applied to generalized linear mixed models, the neural Bayes estimator outperforms existing methods implemented in R packages and SAS procedures. Second, we construct a deep convolutional neural network-based framework to perform simultaneous Bayesian model selection and parameter estimation. We refer to the neural networks for model selection and parameter estimation in the framework as the neural model selector and parameter estimator, respectively, which can be properly trained using labeled data systematically generated from candidate models. A simulation study shows that both the neural selector and estimator demonstrate excellent performance. The theory of Conditional Inferential Models (CIMs) has been introduced to combine information for efficient inference in the Inferential Models framework for prior-free and yet valid probabilistic inference. While the general theory is subject to further development, the so-called regular CIMs are simple. We establish and prove a necessary and sufficient condition for the existence and identification of regular CIMs. More specifically, it is shown that for inference based on a sample from continuous distributions with unknown parameters, the corresponding CIM is regular if and only if the unknown parameters are generalized location and scale parameters, indexing the transformations of an affine group. Statistics big data analytics machine learning deep neural networks bayesian model selection
427	Efficient Matrix-aware Relational Query Processing in Big Data Systems Yongyang Yu (5930462) 03 January 2019 (has links) In the big data era, the use of large-scale machine learning methods is becoming ubiquitous in data exploration tasks ranging from business intelligence and bioinformatics to self-driving cars. In these domains, many queries are composed of various kinds of operators, such as relational operators for preprocessing input data and machine learning models for complex analysis. These learning methods usually rely heavily on matrix computations. As a result, it is imperative to develop novel query processing approaches and systems that are aware of big matrix data and the corresponding operators, scale to clusters of hundreds of machines, and leverage distributed memory for high-performance computation. This dissertation introduces and studies several matrix-aware relational query processing strategies, and analyzes and optimizes their performance.
The first contribution of this dissertation is MatFast, a matrix computation system for efficiently processing and optimizing matrix-only queries in a distributed in-memory environment. We introduce a set of heuristic rules that rewrite special features of a matrix query for a smaller memory footprint, and cost models that estimate the sparsity of sparse matrix multiplications and distribute the matrix data partitions among the compute workers for communication-efficient execution. We implement and test the query processing strategies in an open-source distributed dataflow engine (Apache Spark).
In the second contribution of this dissertation, we extend MatFast to MatRel, where we study how to efficiently process queries that involve both matrix and relational operators. We identify a series of equivalent transformation rules to rewrite a logical plan when both relational and matrix operations are present. We introduce selection, projection, aggregation, and join operators over matrix data, and propose optimizations to reduce computation overhead. We also design a cost model to distribute matrix data among the compute workers for communication-efficient evaluation of relational join operations.
In the third and last contribution of this dissertation, we demonstrate how to leverage MatRel for optimizing complex matrix-aware relational query evaluation pipelines. In particular, we showcase how to efficiently learn model parameters for deep neural networks in various applications with MatRel, e.g., Word2Vec. Applied Computer Science big data query optimization matrix computation distributed computing
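The abstract above mentions cost models that estimate the sparsity of sparse matrix multiplications but does not say how. As a rough illustration only, the sketch below uses a common textbook estimate that assumes non-zero entries are spread uniformly and independently and that products do not cancel. This is an assumption made here, not MatFast's actual cost model, and the function name estimate_product_density is hypothetical.

```python
def estimate_product_density(density_a: float, density_b: float, inner_dim: int) -> float:
    """Estimate the fraction of non-zero entries in C = A @ B.

    Assumes (for illustration only) that non-zeros in A and B are
    independent and uniformly distributed, and that no cancellation occurs.
    """
    if not (0.0 <= density_a <= 1.0 and 0.0 <= density_b <= 1.0):
        raise ValueError("densities must lie in [0, 1]")
    if inner_dim < 0:
        raise ValueError("inner_dim must be non-negative")
    # C[i, j] stays zero only if every one of the inner_dim products
    # A[i, k] * B[k, j] is zero; each is non-zero with probability
    # density_a * density_b under the independence assumption.
    return 1.0 - (1.0 - density_a * density_b) ** inner_dim


if __name__ == "__main__":
    # Two 10%-dense matrices with an inner dimension of 100.
    print(estimate_product_density(0.1, 0.1, 100))
```

An optimizer could use such an estimate to decide whether an intermediate result is better stored in a sparse or a dense layout, although real systems may rely on more refined statistics.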
428	O uso das informações de big data na gestão de crise de marca / The use of big data information in the brand crisis managemant Salvador, Alexandre Borba 06 August 2015 (has links) Brand crises are not only growing in number; social networks have also increased their visibility. The repercussions of a brand image crisis negatively affect both brand equity and short-term sales, and they lead to costly campaigns to minimize the negative effects. While technological advancement increases the visibility of a crisis, it also provides access to a wealth of internal and external information that can help define an action plan. Big Data is a recently coined term for information that is large in volume, diverse in format, and produced and received at high speed, so that a traditional processing system cannot store and analyze it. In the marketing environment, the marketing information system (MIS) aims to provide information to the marketing decision maker. Relevant, reliable information made available within a short time is essential for decisions to be made quickly, which helps ensure leadership of the crisis management process. Starting from the question "what is the use of information from big data in brand crisis management?" and with the objective to "verify how managers make use of information from big data in crisis management", this exploratory, empirical, qualitative study was carried out using in-depth interviews with marketing executives experienced in brand crisis management. Interviews with six managers with crisis experience and two experts revealed large differences in the use of big data information across the stages of a brand crisis identified in the theoretical framework: identification and prevention, crisis management, recovery and improvement and learning. Big data Brand Informação Information Information system Marcas Opinião pública Public opinion Sistema de informação em marketing
429	Developing a data quality scorecard that measures data quality in a data warehouse Grillo, Aderibigbe January 2018 (has links) The main purpose of this thesis is to develop a data quality scorecard (DQS) that aligns the data quality needs of the data warehouse (DW) stakeholder group with selected data quality dimensions. To comprehend the research domain, a general and systematic literature review (SLR) was carried out, after which the research scope was established. Using Design Science Research (DSR) as the methodology to structure the research, three iterations were carried out to achieve the research aim. In the first iteration, with DSR used as the paradigm, the artefact was built from the results of the general and systematic literature review, and a data quality scorecard (DQS) was conceptualised. The results of the SLR and the recommendations for designing an effective scorecard provided the input for the development of the DQS. The System Usability Scale (SUS) was used to validate the usability of the DQS, and the results of the first iteration suggest that the DW stakeholders found the DQS useful. The second iteration further evaluated the DQS through a run-through in the FMCG domain followed by semi-structured interviews. The thematic analysis of the semi-structured interviews showed that the stakeholder participants found the DQS to be transparent, an additional reporting tool, integrative, easy to use, and consistent, and that it increases confidence in the data. However, the timeliness data dimension was found to be redundant, necessitating a modification to the DQS. The third iteration followed similar steps to the second but used the modified DQS in the oil and gas domain. The results from the third iteration suggest that the DQS is a useful tool that is easy to use on a daily basis.
The research contributes to theory by demonstrating a novel approach to DQS design. This was achieved by ensuring that the design of the DQS aligns with the data quality concern areas of the DW stakeholders and the data quality dimensions. Further, this research lays a good foundation for future work by establishing a DQS model that can be used as a base for further development.
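The abstract reports a System Usability Scale (SUS) evaluation without showing how a SUS score is obtained. The sketch below applies the standard SUS scoring rule (ten items rated 1 to 5, odd-numbered items contribute the rating minus 1, even-numbered items contribute 5 minus the rating, and the sum is multiplied by 2.5). The function name sus_score and the sample ratings are hypothetical and are not taken from the thesis.

```python
def sus_score(responses):
    """Return the SUS score (0 to 100) for one respondent.

    `responses` holds the ten item ratings in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if rating not in (1, 2, 3, 4, 5):
            raise ValueError("each rating must be an integer from 1 to 5")
        if item_number % 2 == 1:
            # Odd items are positively worded: higher rating is better.
            total += rating - 1
        else:
            # Even items are negatively worded: lower rating is better.
            total += 5 - rating
    # The raw sum ranges from 0 to 40; scale it to 0 to 100.
    return total * 2.5


if __name__ == "__main__":
    # Hypothetical ratings from a single respondent.
    print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))
```

Scores from several respondents are typically averaged to summarise the usability of the artefact being evaluated.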
430	Performance assessment of Apache Spark applications AL Jorani, Salam January 2019 (has links) This thesis addresses the challenges of large software and data-intensive systems. We discuss a Big Data software system that consists of a fair amount of Linux configuration, some Scala code, and a set of frameworks that work together to achieve smooth system performance. The thesis focuses on the Apache Spark framework and the challenge of measuring the lazy evaluation of Spark's transformation operations. Investigating these challenges is essential for performance engineers, since it improves their ability to study how the system behaves and to make decisions in early design iterations. To this end, we carried out experiments and measurements. After analyzing the results, we derived a formula that engineers can use to predict the performance of the system in production. Big Data Apache Spark BigBlu Lazy evaluation of Spark Computer Sciences Datavetenskap (datalogi)
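The measurement difficulty named in the abstract arises because Spark transformations only describe work, and nothing executes until an action is called. The sketch below is a plain-Python analogy built on a generator; it is not Spark code and not the method used in the thesis. The names slow_square and measure_lazy_pipeline are hypothetical.

```python
import time


def slow_square(x):
    """Stand-in for an expensive per-record computation."""
    time.sleep(0.01)
    return x * x


def measure_lazy_pipeline(n):
    """Time the definition and the execution of a lazy pipeline separately."""
    start = time.perf_counter()
    # Analogous to a transformation: the generator is only defined here,
    # so slow_square has not been called yet.
    pipeline = (slow_square(x) for x in range(n))
    build_seconds = time.perf_counter() - start

    start = time.perf_counter()
    # Analogous to an action: consuming the generator forces all the work.
    total = sum(pipeline)
    run_seconds = time.perf_counter() - start
    return total, build_seconds, run_seconds


if __name__ == "__main__":
    total, build_seconds, run_seconds = measure_lazy_pipeline(20)
    print(f"total={total} build={build_seconds:.6f}s run={run_seconds:.6f}s")
```

Timing only the step that defines the pipeline reports almost no cost, which is why measurements of a lazily evaluated system need to be taken around the step that triggers execution.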

Search results