481

Análisis exploratorio del rol del CFO y el Big Data en Chile

Alfaro Carrasco, Matías 08 1900 (has links)
Seminar submitted to qualify for the degree of Ingeniero Comercial, Mención Administración / This study seeks to identify the opportunities and challenges that Big Data poses for the financial industry and for the role of chief financial officers (CFOs) and their teams in Chilean companies. Today's executives are adjusting to a changing role within their firms, one that can be supported by Big Data tools and by a corporate agenda that backs them. In this context, the CFO's role must be studied from a new perspective, linking it to a more commercial function while retaining its technical expertise, all within a growing ecosystem of abundant information. Recognizing the challenges and opportunities, both for the financial industry and for CFOs, can be key to achieving a change in the corporate agenda that supports Big Data and the CFO's new role. This document is a starting point toward a "discovery platform" that enables improvements in risk models, the development of data-driven value propositions and business models, and lower operating costs through process automation, all as a result of implementing a corporate agenda that supports Big Data and lets executives find a way to execute this change. Some of the benefits of a Big Data strategy for CFOs could include faster decision-making informed by new points of view; better support, management, and mitigation of risks within the company; and improvement of the existing business model by selecting key performance indicators tied to effective execution of the strategy. The exploratory results highlight the positive perception that participants, most of them from large companies and holding university and postgraduate degrees, have of the role of Big Data in supporting the CFO's work, providing the starting point for changes in companies' corporate agendas, value propositions, and strategies so as to capture the full value that Big Data can bring to their organizations.
482

Modélisation et apprentissage de dépendances à l’aide de copules dans les modèles probabilistes latents / Modeling and learning dependencies with copulas in latent topic models

Amoualian, Hesam 12 December 2017 (has links)
Ce travail de thèse a pour objectif de s’intéresser à une classe de modèles hiérarchiques bayésiens, appelés topic models, servant à modéliser de grands corpus de documents, et ceci en particulier dans le cas où ces documents arrivent séquentiellement. Pour cela, nous introduisons au Chapitre 3 trois nouveaux modèles prenant en compte les dépendances entre les thèmes relatifs à chaque document pour deux documents successifs. Le premier modèle s’avère être une généralisation directe du modèle LDA (Latent Dirichlet Allocation). On utilise une loi de Dirichlet pour prendre en compte l’influence sur un document des paramètres relatifs aux thèmes sous-jacents du document précédent. Le deuxième modèle utilise les copules, outil générique servant à modéliser les dépendances entre variables aléatoires. La famille de copules utilisée est la famille des copules archimédiennes et plus précisément la famille des copules de Franck, qui vérifient de bonnes propriétés (symétrie, associativité) et qui sont donc adaptées à la modélisation de variables échangeables. Enfin, le dernier modèle est une extension non paramétrique du deuxième. On intègre cette fois-ci les copules dans la construction stick-breaking des Processus de Dirichlet Hiérarchiques (HDP). Nos expériences numériques, réalisées sur cinq collections standard, mettent en évidence les performances de notre approche par rapport aux approches existantes dans la littérature, comme les dynamic topic models, le temporal LDA et les Evolving Hierarchical Processes, et ceci à la fois sur le plan de la perplexité et en termes de performances lorsqu’on cherche à détecter des thèmes similaires dans des flux de documents. Notre approche, comparée aux autres, se révèle être capable de modéliser un plus grand nombre de situations, allant d’une dépendance forte entre les documents à une totale indépendance. Par ailleurs, l’hypothèse d’échangeabilité sous-jacente à tous les topic models de type LDA amène souvent à estimer des thèmes différents pour des mots relevant pourtant du même segment de phrase, ce qui n’est pas cohérent. Dans le Chapitre 4, nous introduisons le copulaLDA (copLDA), qui généralise le LDA en intégrant la structure du texte dans le modèle et en relâchant l’hypothèse d’indépendance conditionnelle. Pour cela, nous supposons que les groupes de mots dans un texte sont reliés thématiquement entre eux. Nous modélisons cette dépendance avec les copules. Nous montrons de manière empirique l’efficacité du modèle copLDA pour effectuer à la fois des tâches de nature intrinsèque et extrinsèque sur différents corpus accessibles publiquement. Pour compléter le modèle précédent (copLDA), le chapitre 5 présente un modèle de type LDA qui génère des segments dont les thèmes sont cohérents à l’intérieur de chaque document, en faisant de manière simultanée la segmentation des documents et l’affectation des thèmes à chaque mot. La cohérence entre les différents thèmes internes à chaque groupe de mots est assurée grâce aux copules qui relient les thèmes entre eux. De plus, ce modèle s’appuie tout à la fois sur des distributions spécifiques pour les thèmes reliés à chaque document et à chaque groupe de mots, ceci permettant de capturer les différents degrés de granularité. Nous montrons que le modèle proposé généralise naturellement plusieurs modèles de type LDA qui ont été introduits pour des tâches similaires.
Par ailleurs, nos expériences, effectuées sur six bases de données différentes, mettent en évidence les performances de notre modèle, mesurées de différentes manières : à l’aide de la perplexité, de la Pointwise Mutual Information Normalisée, qui capture la cohérence entre les thèmes, et de la mesure Micro F1 utilisée en classification de texte. / This thesis focuses on scaling latent topic models for big data collections, especially when documents arrive in streams. Although the main goal of probabilistic modeling is to find word topics, an equally interesting objective is to examine topic evolutions and transitions. To accomplish this task, we propose in Chapter 3 three new models for modeling topic and word-topic dependencies between consecutive documents in document streams. The first model is a direct extension of the Latent Dirichlet Allocation model (LDA) and makes use of a Dirichlet distribution to balance the influence of the LDA prior parameters with respect to the topic and word-topic distributions of the previous document. The second extension makes use of copulas, which constitute a generic tool for modeling dependencies between random variables. We rely here on Archimedean copulas, and more precisely on the Franck copula, which is symmetric and associative and is thus appropriate for exchangeable random variables. Lastly, the third model is a non-parametric extension of the second one through the integration of copulas in the stick-breaking construction of Hierarchical Dirichlet Processes (HDP). Our experiments, conducted on five standard collections that have been used in several studies on topic modeling, show that our proposals outperform previous ones, such as dynamic topic models, temporal LDA and the Evolving Hierarchical Processes, both in terms of perplexity and for tracking similar topics in document streams. Compared to previous proposals, our models have extra flexibility and can adapt to situations ranging from strong dependence between documents to complete independence. On the other hand, the "exchangeability" assumption in topic models like LDA often results in inferring inconsistent topics for the words of text spans like noun phrases, which are usually expected to be topically coherent. In Chapter 4, we propose copulaLDA (copLDA), which extends LDA by integrating part of the text structure into the model and relaxes the conditional independence assumption between the word-specific latent topics given the per-document topic distributions. To this end, we assume that the words of text spans like noun phrases are topically bound, and we model this dependence with copulas. We demonstrate empirically the effectiveness of copLDA on both intrinsic and extrinsic evaluation tasks on several publicly available corpora. To complete the previous model (copLDA), Chapter 5 presents an LDA-based model that generates topically coherent segments within documents by jointly segmenting documents and assigning topics to their words. The coherence between topics is ensured through a copula binding the topics associated with the words of a segment. In addition, this model relies on both document- and segment-specific topic distributions so as to capture fine-grained differences in topic assignments. We show that the proposed model naturally encompasses other state-of-the-art LDA-based models designed for similar tasks.
Furthermore, our experiments, conducted on six different publicly available datasets, show the effectiveness of our model in terms of perplexity, Normalized Pointwise Mutual Information, which captures the coherence between the generated topics, and the Micro F1 measure for text classification.
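All three contributions rest on the same mechanism: a Frank ("Franck") copula ties together the uniform draws that decide neighbouring topic assignments, whether across consecutive documents (Chapter 3) or across the words of a text span or segment (Chapters 4 and 5). The sketch below illustrates that mechanism only, not the full generative models or their inference; the topic distribution, the dependence parameter, and the dimensions are hypothetical.

```python
# Minimal sketch: a Frank copula couples two uniform draws, which are pushed
# through the inverse CDF of a topic distribution so that neighbouring units
# (consecutive documents, or words of the same segment) tend to receive the
# same topic. Values below are illustrative, not taken from the thesis.
import numpy as np

def frank_conditional_sample(u, p, theta):
    """Sample v | u from a bivariate Frank copula via the conditional inverse."""
    if abs(theta) < 1e-9:               # independence limit
        return p
    w = p * (np.exp(-theta) - 1.0) / (np.exp(-theta * u) * (1.0 - p) + p)
    return -np.log1p(w) / theta

def topic_from_uniform(u, topic_probs):
    """Map a uniform draw to a topic index through the categorical inverse CDF."""
    return int(np.searchsorted(np.cumsum(topic_probs), u))

rng = np.random.default_rng(0)
topic_probs = np.array([0.5, 0.3, 0.2])   # per-document topic distribution
theta = 10.0                              # large theta -> strong dependence

pairs = []
for _ in range(5):
    u1 = rng.uniform()
    u2 = frank_conditional_sample(u1, rng.uniform(), theta)
    pairs.append((topic_from_uniform(u1, topic_probs),
                  topic_from_uniform(u2, topic_probs)))
print(pairs)  # adjacent draws mostly share the same topic
```

With theta near zero the two draws become independent, which is the "total independence" end of the spectrum the abstract mentions; large theta recovers strongly tied topics.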
483

Signal Detection of Adverse Drug Reaction using the Adverse Event Reporting System: Literature Review and Novel Methods

Pham, Minh H. 29 March 2018 (has links)
One of the objectives of the U.S. Food and Drug Administration is to protect public health through post-marketing drug safety surveillance, also known as pharmacovigilance. An inexpensive and efficient way to monitor post-marketing drug safety is to apply data mining algorithms to electronic health records to discover associations between drugs and adverse events. The purpose of this study is two-fold. First, we review the methods and algorithms proposed in the literature for identifying associations between drugs, drug interactions, and adverse events, and discuss their advantages and drawbacks. Second, we adapt novel methods that have been used in comparable problems, such as genome-wide association studies and market-basket analysis. Most of the common methods for the drug-adverse event problem have a univariate structure and are therefore vulnerable to producing false positives when certain drugs are usually co-prescribed. We therefore study the applicability of multivariate methods from the literature, such as Logistic Regression and the Regression-adjusted Gamma-Poisson Shrinkage model, for these association studies. We also adopt Random Forest and Monte Carlo Logic Regression from genome-wide association studies because of their ability to detect inherent interactions. We have built a computer program for the Regression-adjusted Gamma-Poisson Shrinkage model, which was proposed by DuMouchel in 2013 but has not been made available in any public software package. A comparison study between popular methods and the proposed new methods is presented.
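The contrast between univariate disproportionality measures and multivariate models is the crux of the argument above. The sketch below is illustrative rather than a reproduction of the thesis code: a logistic regression over per-report drug indicators keeps a co-prescribed but innocuous drug from triggering a false signal. The simulated data and drug names are hypothetical.

```python
# Minimal sketch: multivariate signal detection via logistic regression on
# spontaneous reports. Each row is one report; columns are drug indicators,
# the target is whether the adverse event of interest was reported.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_reports = 5000
# Two drugs that are frequently co-prescribed (correlated exposure columns).
drug_a = rng.binomial(1, 0.2, n_reports)
drug_b = np.where(drug_a == 1, rng.binomial(1, 0.7, n_reports),
                  rng.binomial(1, 0.1, n_reports))
# Only drug A truly raises the odds of the adverse event.
logit = -3.0 + 1.5 * drug_a
event = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([drug_a, drug_b])
model = LogisticRegression().fit(X, event)
# Adjusted odds ratios: drug B's estimate stays near 1 despite co-prescription,
# which is the advantage over univariate disproportionality measures.
print(dict(zip(["drug_a", "drug_b"], np.exp(model.coef_[0]).round(2))))
```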
484

Interactive Analytics and Visualization for Data Driven Calculation of Individualized COPD Risk

Arkstål, Emil January 2018 (has links)
Chronic obstructive pulmonary disease (COPD) is a high-mortality disease, exceeded only by stroke and ischemic heart disease. This incurable disease progressively worsens, leading to high personal and societal economic impact, reduced quality of life and, often, death. Generic treatment plans for COPD risk mistreating an individual's condition; to be effective, treatment should be individualized, following the practices of precision medicine. The aim of this thesis was to develop a data-driven algorithm and system with visualization to assess individual COPD risk. With MRI body composition profile measurements, it is possible to accurately assess the propensity for a multitude of metabolic conditions, such as coronary heart disease and type 2 diabetes. The algorithm and system have been developed using Wolfram Language and R within the Wolfram Mathematica framework. The algorithm calculates individualized virtual control groups that are metabolically similar to the patient's body composition and spirometric profile. Using UK Biobank data, our tool was used to assess patient COPD propensity with an individual-specific virtual control group, achieving an AUROC of 0.778 (women) and 0.758 (men). Additionally, the tool was used to identify new body composition profiles related to COPD and associated comorbid conditions.
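As a rough illustration of the individual-specific virtual control group idea (not the thesis's Wolfram Language / R implementation), the sketch below matches a patient to the metabolically most similar members of a reference cohort with a k-nearest-neighbour search. The feature names, cohort size, and simulated data are hypothetical; the thesis uses MRI body-composition profiles from UK Biobank.

```python
# Minimal sketch: build a "virtual control group" for one patient by selecting
# the k most similar individuals in body-composition/spirometry feature space,
# then read off COPD prevalence within that group as a crude propensity.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
features = ["visceral_fat_l", "liver_fat_pct", "thigh_muscle_l", "fev1_fvc"]
cohort = rng.normal(size=(10_000, len(features)))   # reference cohort (fake)
copd_label = rng.binomial(1, 0.08, size=10_000)     # COPD status per subject

patient = rng.normal(size=(1, len(features)))       # the new individual

# Virtual control group: the 200 metabolically most similar cohort members.
nn = NearestNeighbors(n_neighbors=200).fit(cohort)
_, idx = nn.kneighbors(patient)
control_group = idx[0]

# Crude propensity: COPD prevalence among metabolically similar controls.
propensity = copd_label[control_group].mean()
print(f"COPD propensity in virtual control group: {propensity:.3f}")
```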
485

Big data e o Sistema Único de Saúde no Brasil: uma sugestão para solução de problemas

Fraga, Joana Azevedo 28 November 2016 (has links)
Capes / Esta dissertação traz uma discussão sobre a utilização de tecnologias, apresentando o conceito de Economia da Saúde e Sistemas de Inovação e da ferramenta Big Data, oriunda do meio das tecnologias de informação, analisando sua possível importância econômica para o Sistema Único de Saúde. O Sistema Único de Saúde (SUS) do Brasil é apresentado, sendo mostrada sua estrutura e principais desafios e objetivos para gestão. Dessa forma, são relacionados os possíveis benefícios gerados pela utilização de Big Data para os serviços de saúde e gestão do sistema, tendo forte caráter inovativo. / This dissertation discusses the use of technologies, presenting the concepts of Health Economics and Innovation Systems and the Big Data tool, which comes from the field of information technology, and analyzing its possible economic importance for Brazil's Unified Health System (SUS). The SUS is presented, showing its structure and its main challenges and management objectives. On this basis, the possible benefits generated by the use of Big Data for the system's health services and management are outlined, an application with a strongly innovative character.
486

Big data e concorrência: uma avaliação dos impactos da exploração de big data para o método antitruste tradicional de análise de concentrações econômicas

Monteiro, Gabriela Reis Paiva January 2017 (has links)
Uma característica de mercados digitais é a geração e análise de uma “enxurrada” de dados, o que tem sido considerado um elemento chave de muitos negócios que emergem no cenário da “Internet das Coisas”. O termo big data reflete essa tendência de coletar, adquirir, armazenar e processar grandes volumes de dados digitais para criar valor econômico. Os modelos de negócio das plataformas online frequentemente se baseiam na exploração de dados, em particular os de natureza pessoal, que são usados como insumo para melhorar e personalizar os serviços ou produtos que oferecem. Até recentemente, as autoridades antitruste ainda não haviam se debruçado completamente sobre as implicações do uso de big data para uma política de defesa da concorrência, mas essa situação tem se modificado com o surgimento de discussões sobre as preocupações anticompetitivas suscitadas pela exploração dessa capacidade. Dessa forma, esta dissertação buscou investigar se e em que medida a exploração de big data em mercados digitais pode ser considerada uma vantagem comparativa que suscita riscos anticompetitivos e, nesse caso, como a análise dessa variável competitiva pode ser incorporada ao método antitruste tradicional para o controle de estruturas. Esta investigação identificou que, em determinadas situações, a capacidade de big data pode representar relevante vantagem competitiva, gerando diversas preocupações concorrenciais no contexto de concentrações econômicas. De forma geral, essas preocupações anticompetitivas podem ser analisadas dentro do escopo das etapas do método antitruste clássico, não se verificando, neste momento, a necessidade de um novo arcabouço metodológico que seja especificamente aplicável ao exame de operações envolvendo agentes econômicos cujos modelos de negócio se baseiem preponderantemente em dados. Não obstante, determinadas ferramentas desse método precisarão ser adaptadas ou alargadas pela autoridade concorrencial brasileira, principalmente para que sejam levadas em consideração outras dimensões competitivas não relacionadas a preço, como qualidade, inovação e privacidade, bem como particularidades do big data e do ecossistema de sua exploração na avaliação dos efeitos e das eficiências da operação, assim como de eventuais remédios. / A feature of digital markets is the generation and analysis of a “torrent” of data, which is considered a key aspect of many businesses emerging in the context of the “Internet of Things”. The term big data reflects this trend towards collecting, acquiring, storing and processing great volumes of digital data to create economic value.
Online platforms’ business models are frequently based on exploiting data, in particular personal data, which are used as an input to improve and personalize the services and products that they offer. Until recently, antitrust authorities had not carefully analyzed the impacts of the use of big data on competition policy, but this situation has been changing with the emergence of discussions about the anticompetitive concerns raised by the exploitation of this capacity. In light of this, this work aimed at investigating whether and to what extent the exploitation of big data in digital markets may be considered a comparative advantage that raises antitrust risks and, in this case, how an analysis of this competitive variable should be incorporated into the traditional antitrust approach to mergers and acquisitions. This investigation identified that, under certain conditions, big data capacity may result in a relevant competitive advantage, giving rise to anticompetitive concerns in the context of mergers and acquisitions. In general, these concerns may be analyzed within the scope of the phases of the classic antitrust method, and there is no need, at this moment, for a new methodological framework specifically applicable to the analysis of transactions involving firms whose business models are preponderantly based on the use of data. Notwithstanding, certain tools might need to be adapted or enlarged by the Brazilian antitrust authority, mainly to take into account non-price dimensions of competition, such as quality, innovation and privacy, as well as particular features of big data and its ecosystem, in the assessment of a transaction's effects and efficiencies and of potential remedies.
487

Efficient Bandwidth Reservation Strategies for Data Movements on High Performance Networks

Zuo, Liudong 01 August 2015 (has links)
Many next-generation e-science applications require fast and reliable transfer of large volumes of data, now frequently termed "big data", with guaranteed performance, which is typically enabled by the bandwidth reservation service in high-performance networks (HPNs). Users normally specify the properties and requirements of their data transfers in bandwidth reservation requests (BRRs) and want to make bandwidth reservations on the HPNs that satisfy the requirements of their data transfers. The challenges of bandwidth reservation arise from the requirements of both the users and the bandwidth reservation service providers of the HPNs. We focus on two important bandwidth reservation problems formulated from combinations of these requirements: (i) scheduling all BRRs in one batch while achieving the best average earliest completion time and shortest duration of their data transfers, and (ii) scheduling two generic types of BRRs concerning data transfer reliability, with different objectives and constraints, in unreliable HPNs that are subject to node and link failures. We prove that the two subproblems of the first problem are NP-complete and propose fast and efficient heuristic algorithms, while the two subproblems of the second problem can be solved optimally in polynomial time; the corresponding optimal algorithms and proofs are given. We conduct extensive simulations to compare the performance of the proposed heuristic and optimal algorithms with naive scheduling algorithms and the algorithms currently used in production networks on various performance metrics. The performance superiority of the proposed heuristic and optimal algorithms is verified.
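To make the scheduling setting concrete, here is a toy sketch, under assumptions not taken from the thesis, of how one might check the earliest completion slot of a single BRR on one link whose residual bandwidth varies by time slot. The real problems involve full HPN topologies, batches of BRRs, and failure models.

```python
# Minimal sketch of an earliest-completion-time check for one bandwidth
# reservation request (BRR) on a single link divided into fixed time slots.
# residual[t] is the capacity left in slot t (data units per slot); the
# request must move `volume` data units at no more than `max_rate` per slot.
# All names and numbers are hypothetical.

def earliest_completion_slot(residual, volume, max_rate):
    """Return the first slot index by whose end the transfer can finish,
    greedily using min(residual, max_rate) in every slot, or None."""
    moved = 0.0
    for t, avail in enumerate(residual):
        moved += min(avail, max_rate)
        if moved >= volume:
            return t
    return None

# Example: 6 slots of leftover capacity, 7 data units to move, rate cap of 3.
residual = [2.0, 0.5, 3.0, 3.0, 1.0, 4.0]
print(earliest_completion_slot(residual, volume=7.0, max_rate=3.0))  # -> 3
```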
488

CrowdCloud: Combining Crowdsourcing with Cloud Computing for SLO Driven Big Data Analysis

Flatt, Taylor 01 December 2017 (has links)
The evolution of structured data from simple rows and columns on a spreadsheet to more complex unstructured data such as tweets, videos, voice, and others has resulted in a need for more adaptive analytical platforms. It is estimated that upwards of 80% of data on the Internet today is unstructured. Crowdsourcing platforms therefore need to perform drastically better in the wake of this tsunami of data. We investigated the use of a monitoring service that allows the system to take corrective action when results are trending away from meeting the accuracy, budget, and time SLOs. Initial implementation and system validation have shown that taking corrective action generally leads to a better success rate in reaching the SLOs. A system that can dynamically adjust internal parameters in order to perform better can lead to more harmonious interaction between humans and machine algorithms and to more efficient use of resources.
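A minimal sketch of what such a corrective-action rule could look like is given below; the SLO thresholds, the cost model, and the choice of votes-per-item as the adjusted parameter are assumptions for illustration, not details taken from CrowdCloud.

```python
# Minimal sketch of an SLO monitor: periodically check whether estimated
# accuracy, spend, and elapsed time are on track, and take a simple corrective
# action (raise or lower the number of crowd votes per item).
from dataclasses import dataclass

@dataclass
class SLO:
    min_accuracy: float = 0.90
    max_budget: float = 100.0   # dollars
    max_time: float = 3600.0    # seconds

def corrective_action(slo, est_accuracy, spent, elapsed, votes_per_item):
    """Return an adjusted votes_per_item given current progress."""
    if est_accuracy < slo.min_accuracy and spent < 0.8 * slo.max_budget:
        return votes_per_item + 1          # buy more redundancy for accuracy
    if spent > 0.8 * slo.max_budget or elapsed > 0.8 * slo.max_time:
        return max(1, votes_per_item - 1)  # go cheaper/faster instead
    return votes_per_item

print(corrective_action(SLO(), est_accuracy=0.85, spent=40.0,
                        elapsed=1200.0, votes_per_item=3))   # -> 4
```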
489

Apprentissage statistique : application au trafic routier à partir de données structurées et aux données massives / Machine learning : Application to road traffic as structured data and to Big Data

Guillouet, Brendan 18 November 2016 (has links)
Cette thèse s'intéresse à l'apprentissage pour données massives. On considère en premier lieu des trajectoires définies par des séquences de géolocalisations. Une nouvelle mesure de distance entre trajectoires (Symmetrized Segment-Path Distance) permet d'identifier par classification hiérarchique des groupes de trajectoires, modélisés ensuite par des mélanges gaussiens décrivant les déplacements par zones. Cette modélisation est utilisée de façon générique pour résoudre plusieurs types de problèmes liés au trafic routier : prévision de la destination finale d'une trajectoire, du temps d'arrivée à destination ou de la prochaine zone de localisation. Les exemples analysés montrent que le modèle proposé s'applique à des environnements routiers différents et, qu'une fois appris, il s'applique à des trajectoires aux propriétés spatiales et temporelles différentes. En deuxième lieu, les environnements technologiques d'apprentissage pour données massives sont comparés sur des cas d'usage industriels. / This thesis focuses on machine learning techniques for application to big data. We first consider trajectories defined as sequences of geolocalized data. A hierarchical clustering is then applied, using a new distance between trajectories (Symmetrized Segment-Path Distance), to produce groups of trajectories which are then modeled with Gaussian mixtures in order to describe individual movements. This modeling can be used in a generic way to solve the following road traffic problems: final destination, trip time, and next-location prediction. These examples show that our model can be applied to different traffic environments and that, once learned, it can be applied to trajectories whose spatial and temporal characteristics are different. We also compare different technologies which enable the application of machine learning methods to massive volumes of data.
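As a rough illustration of the Symmetrized Segment-Path Distance used for the clustering step, the sketch below follows the usual SSPD definition (mean point-to-segment distance in each direction, then symmetrized); the sample trajectories are hypothetical and the thesis's own implementation is not reproduced here.

```python
# Minimal sketch of the Symmetrized Segment-Path Distance (SSPD) between two
# 2-D trajectories: average the distance from each point of one trajectory to
# the closest segment of the other, then symmetrize.
import numpy as np

def point_to_segment(p, a, b):
    """Euclidean distance from point p to segment [a, b]."""
    ab, ap = b - a, p - a
    denom = np.dot(ab, ab)
    t = 0.0 if denom == 0 else np.clip(np.dot(ap, ab) / denom, 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def point_to_traj(p, traj):
    return min(point_to_segment(p, traj[i], traj[i + 1])
               for i in range(len(traj) - 1))

def sspd(t1, t2):
    d12 = np.mean([point_to_traj(p, t2) for p in t1])
    d21 = np.mean([point_to_traj(p, t1) for p in t2])
    return 0.5 * (d12 + d21)

t1 = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.5]])
t2 = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.5]])
print(sspd(t1, t2))  # pairwise SSPD values feed the hierarchical clustering
```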
490

High performance trace replay event simulation of parallel programs behavior / Ferramenta de alto desempenho para análise de comportamento de programas paralelos baseada em rastos de execução

Korndorfer, Jonas Henrique Muller January 2016 (has links)
Sistemas modernos de alto desempenho compreendem milhares a milhões de unidades de processamento. O desenvolvimento de uma aplicação paralela escalável para tais sistemas depende de um mapeamento preciso da utilização dos recursos disponíveis. A identificação de recursos não utilizados e de gargalos de processamento requer uma boa análise de desempenho. A observação de rastros de execução é uma das técnicas mais úteis para esse fim. Infelizmente, o rastreamento muitas vezes produz grandes arquivos de rastro, atingindo facilmente gigabytes de dados brutos. Portanto, ferramentas para análise de desempenho baseadas em rastros precisam processar esses dados para uma forma legível e ser eficientes, a fim de permitirem uma análise rápida e útil. A maioria das ferramentas existentes, tais como Vampir, Scalasca e TAU, focam no processamento de formatos de rastro com semântica associada, geralmente definidos para lidar com programas desenvolvidos com bibliotecas populares como OpenMP, MPI e CUDA. No entanto, nem todas as aplicações paralelas utilizam essas bibliotecas e, assim, algumas vezes essas ferramentas podem não ser úteis. Felizmente, existem outras ferramentas que apresentam uma abordagem mais dinâmica, utilizando um formato de arquivo de rastro aberto e sem semântica específica. Algumas dessas ferramentas são Paraver, Pajé e PajeNG. Por outro lado, ser genérico tem custo e, assim, tais ferramentas frequentemente apresentam baixo desempenho para o processamento de grandes rastros. O objetivo deste trabalho é apresentar otimizações feitas no conjunto de ferramentas PajeNG. São apresentados o desenvolvimento de uma estratégia de paralelização para o PajeNG e uma análise de desempenho para demonstrar nossos ganhos. O PajeNG original funciona sequencialmente, processando um único arquivo de rastro que contém todos os dados do programa rastreado. Desta forma, a escalabilidade da ferramenta fica muito limitada pela leitura dos dados. Nossa estratégia divide o arquivo em pedaços, permitindo seu processamento em paralelo. O método desenvolvido para separar os rastros permite que cada pedaço execute em um fluxo de execução separado. Nossos experimentos foram executados em máquinas com acesso não uniforme à memória (NUMA). A análise de desempenho desenvolvida considera vários aspectos, como a localidade das threads, o número de fluxos, o tipo de disco e também comparações entre os nós NUMA. Os resultados obtidos são muito promissores, escalando o PajeNG cerca de oito a onze vezes, dependendo da máquina. / Modern high-performance systems comprise thousands to millions of processing units. The development of a scalable parallel application for such systems depends on an accurate mapping of application processes onto the available resources. The identification of unused resources and potential processing bottlenecks requires good performance analysis. The trace-based observation of a parallel program's execution is one of the most helpful techniques for this purpose. Unfortunately, tracing often produces large trace files, easily reaching the order of gigabytes of raw data. Therefore, trace-based performance analysis tools have to process such data into a human-readable form and should also be efficient enough to allow a useful analysis. Most of the existing tools, such as Vampir, Scalasca, and TAU, focus on the processing of trace formats with a fixed and well-defined semantics. The corresponding file formats are usually designed to handle applications developed using popular libraries like OpenMP, MPI, and CUDA.
However, not all parallel applications use such libraries, so these tools are sometimes not useful. Fortunately, there are other tools that take a more dynamic approach by using an open trace file format without a specific semantics. Some of these tools are Paraver, Pajé and PajeNG. However, being generic comes at a cost: these tools frequently present low performance for the processing of large traces. The objective of this work is to present performance optimizations made in the PajeNG tool set. This comprises the development of a parallelization strategy and a performance analysis to demonstrate our gains. The original PajeNG works sequentially, processing a single trace file with all data from the observed application. As a result, the scalability of the tool is severely limited by the reading of the trace file. Our strategy splits this file so that several pieces can be processed in parallel. The method created to split the traces allows each piece to be processed in a separate execution flow. The experiments were executed on non-uniform memory access (NUMA) machines. The performance analysis considers several aspects such as thread locality, number of flows, disk type, and also comparisons between the NUMA nodes. The obtained results are very promising, scaling up PajeNG by about eight to eleven times, depending on the machine.
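The chunk-splitting idea described above can be illustrated with a language-agnostic toy: split a text trace into line-aligned byte ranges and hand each range to a worker. PajeNG itself is a C++ tool set for the Paje trace format, so the file name, the event-counting stand-in for real parsing, and the use of Python worker processes below are assumptions for illustration only.

```python
# Minimal sketch of the parallelization idea: split one large trace file into
# byte ranges aligned on line boundaries and process each chunk in its own
# worker process.
import os
from concurrent.futures import ProcessPoolExecutor

def chunk_offsets(path, n_chunks):
    """Return (start, end) byte ranges aligned to newline boundaries."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(i * size // n_chunks)
            f.readline()                 # advance to the next full line
            bounds.append(f.tell())
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))

def process_chunk(args):
    path, start, end = args
    count = 0
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if line.strip():
                count += 1               # stand-in for real event parsing
    return count

if __name__ == "__main__":
    path = "trace.paje"                  # hypothetical trace file
    ranges = chunk_offsets(path, n_chunks=8)
    with ProcessPoolExecutor(max_workers=8) as pool:
        totals = pool.map(process_chunk, [(path, s, e) for s, e in ranges])
    print("events parsed:", sum(totals))
```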
