About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
301

Uma nova arquitetura para Internet das Coisas com análise e reconhecimento de padrões e processamento com Big Data. / A novel Internet of Things architecture with pattern recognition and big data processing.

Souza, Alberto Messias da Costa 16 October 2015 (has links)
The Internet of Things (IoT) is a new communication paradigm that extends the Internet from the virtual world to interface and interact with objects in the physical world. The IoT will comprise a large number of heterogeneous interconnected devices generating a huge volume of data, and one of its most important challenges is storing and processing that volume within acceptable time frames. This research addresses the challenge by introducing pattern recognition and analysis services into the lower layers of the IoT reference model, reducing the processing required at the higher layers. The work analyzes IoT reference models and middleware platforms for developing applications in this context. The implemented architecture extends the LinkSmart middleware with a pattern recognition module that provides algorithms for value estimation, outlier detection, and clustering of raw data coming from IoT resources. The new module is integrated with the Hadoop Big Data platform and uses the algorithm implementations of the Mahout framework. The work also highlights the importance of cross-layer communication integrated into the new architecture. The experiments used real data sets from the Smart Santander project to validate the new IoT architecture with its pattern recognition services and cross-layer communication.
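
The abstract gives no implementation details; as a minimal illustrative sketch only (plain Python with scikit-learn as a stand-in, not the thesis's LinkSmart/Mahout module, and with made-up sensor readings), the kind of outlier detection and clustering such a pattern-recognition service applies to raw data could look like this:

```python
import numpy as np
from sklearn.cluster import KMeans  # stand-in for Mahout's clustering algorithms

def detect_outliers(values, z_thresh=3.0):
    """Flag readings whose z-score exceeds a threshold (simple outlier rule)."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / (values.std() + 1e-9)
    return np.abs(z) > z_thresh

def cluster_readings(values, n_clusters=3):
    """Group raw sensor readings into clusters (illustrative only)."""
    values = np.asarray(values, dtype=float).reshape(-1, 1)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(values)

readings = [21.0, 21.3, 20.9, 35.7, 21.1, 20.8, 48.2, 21.2]  # hypothetical temperatures
print(detect_outliers(readings))
print(cluster_readings(readings))
```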
302

Delayed Transfer Entropy applied to Big Data / Delayed Transfer Entropy aplicado a Big Data

Dourado, Jonas Rossi 30 November 2018 (has links)
The recent popularization of technologies such as smartphones, wearables, the Internet of Things, social networks, and video streaming has greatly increased data creation. Dealing with such extensive data sets led to the term big data, often defined as data whose volume, acquisition rate, or representation demands non-traditional approaches to analysis or requires horizontal scaling for processing. Analysis is the most important Big Data phase, with the objective of extracting meaningful and often hidden information. One example of hidden information is causality, which can be inferred with Delayed Transfer Entropy (DTE). Despite its wide applicability, DTE demands high processing power, and this demand is aggravated by large data sets such as those found in big data. This research optimized DTE performance and modified existing code to enable DTE execution on a computer cluster. With the big data trend in sight, these results may enable the analysis of bigger data sets or stronger statistical evidence.
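
The abstract does not state the estimator used; for orientation only, one common formulation of transfer entropy with an explicit source delay d (history length 1), which is what the "delayed" variant usually denotes, is:

```latex
TE_{X \to Y}(d) \;=\; \sum_{y_{t+1},\, y_t,\, x_{t-d}}
  p\!\left(y_{t+1}, y_t, x_{t-d}\right)\,
  \log \frac{p\!\left(y_{t+1} \mid y_t, x_{t-d}\right)}
            {p\!\left(y_{t+1} \mid y_t\right)}
```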
303

Deep graphs

Traxl, Dominik 17 May 2017 (has links)
Network theory has proven to be a powerful instrument in the representation of complex systems. Yet, even in its latest and most general form (i.e., multilayer networks), it still lacks essential qualities to serve as a general data analysis framework. These include, most importantly, an explicit association of information with the nodes and edges of a network, and a conclusive representation of groups of nodes and their respective interrelations on different scales. The implementation of these qualities into a generalized framework is the primary contribution of this dissertation. By doing so, I show how my framework - deep graphs - is capable of acting as a go-between, joining a unified and generalized network representation of systems with the tools and methods developed in statistics and machine learning. A software package accompanies this dissertation, see https://github.com/deepgraph/deepgraph. A number of applications of my framework are demonstrated. I construct a rainfall deep graph and conduct an analysis of spatio-temporal extreme rainfall clusters. Based on the constructed deep graph, I provide statistical evidence that the size distribution of these clusters is best approximated by an exponentially truncated power law. By means of a generative storm-track model, I argue that the exponential truncation of the observed distribution could be caused by the presence of land masses. Then, I combine two high-resolution satellite products to identify spatio-temporal clusters of fire-affected areas in the Brazilian Amazon and characterize their land-use-specific burning conditions. Finally, I investigate the effects of white noise and global coupling strength on the maximum degree of synchronization for a variety of oscillator models coupled according to a broad spectrum of network topologies. I find a general sigmoidal scaling and validate it with a suitable regression model.
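
The deepgraph package linked above is a Python library built on pandas. As a minimal sketch, assuming its documented DeepGraph/create_edges interface and using a made-up node table, associating information with nodes and deriving edge features through connector functions looks roughly like this:

```python
import pandas as pd
import deepgraph as dg  # https://github.com/deepgraph/deepgraph

# hypothetical node table: every measurement carries its own attributes
v = pd.DataFrame({
    'time': [0.0, 1.0, 2.0, 10.0],
    'x':    [0.0, 0.5, 1.0, 50.0],
})
g = dg.DeepGraph(v)

# connector functions receive source/target node columns (suffixes _s/_t)
# and return edge features computed for every pair of nodes
def delta_time(time_s, time_t):
    dt = time_t - time_s
    return dt

g.create_edges(connectors=delta_time)
print(g.e)  # edge table holding the computed feature
```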
304

Etude spatiale des données collectées à bord des navires de pêche : en quoi est-elle pertinente pour la gestion des rejets de la pêche ? / Spatial analysis of on-board observer programme data : how is it relevant to the management of discards?

Pointin, Fabien 05 November 2018 (has links)
Since 2002, the European Union (EU) Member States have collected, managed, and supplied data for the management of fisheries and, specifically, of discards. In this context, at-sea observer programmes collect data on board fishing vessels on the composition and quantity of the catch, including discards. Based on these data, this study aims to analyse the spatio-temporal distribution of landings and discards so as to contribute to their management. To that end, a mapping method based on nested grids was developed. The method was designed to produce pluriannual, annual, and quarterly maps of landings and discards per species or group of species according to the fishing métier. A platform based on Big Data technologies was then used to refine and automate the mapping method. Using an online storage system and a high-performance computing system, a large number of maps were produced automatically per métier, grouping or not years, quarters, and species. Finally, the usefulness of the produced maps for managing discards was demonstrated, particularly with regard to the Landing Obligation (Regulation (EU) No 1380/2013). Combined with fleet cost and revenue data, these maps open up possibilities for identifying fishing zones and/or periods to be avoided (i.e., those prone to unwanted catches) while minimising the impact on the fleets' economic performance.
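
The abstract does not describe the nested-grid construction in detail; as an illustrative sketch only (a quadtree-style subdivision with made-up thresholds, not the thesis's exact method), a grid whose cells are refined wherever observations are dense could be built as follows:

```python
import numpy as np

def nested_grid(points, x0, y0, x1, y1, min_count=50, max_depth=4, depth=0):
    """Recursively split a cell into 4 sub-cells while it holds enough points.

    points: array of shape (n, 2) with (x, y) positions of observations.
    Returns a list of (x0, y0, x1, y1, n_points) leaf cells.
    """
    inside = points[(points[:, 0] >= x0) & (points[:, 0] < x1) &
                    (points[:, 1] >= y0) & (points[:, 1] < y1)]
    if depth >= max_depth or len(inside) < min_count:
        return [(x0, y0, x1, y1, len(inside))]
    xm, ym = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    cells = []
    for (ax, ay, bx, by) in [(x0, y0, xm, ym), (xm, y0, x1, ym),
                             (x0, ym, xm, y1), (xm, ym, x1, y1)]:
        cells += nested_grid(inside, ax, ay, bx, by, min_count, max_depth, depth + 1)
    return cells

rng = np.random.default_rng(0)
obs = rng.normal(loc=[0.0, 0.0], scale=[1.0, 1.0], size=(1000, 2))  # hypothetical hauls
print(len(nested_grid(obs, -5, -5, 5, 5)))
```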
305

Similaridade em big data / Similarity in big data

Santos, Lúcio Fernandes Dutra 19 July 2017 (has links)
The volume of data stored in large databases grows at an ever-increasing pace, straining the performance and flexibility of Database Management Systems (DBMSs). The data being collected and generated nowadays increase not only in volume but also in complexity, requiring new query operators; representing only numbers and short character strings is no longer enough. Health-care centers collecting imaging exams and remote sensing from satellites and earth-based stations are examples of application domains where more powerful and flexible operators are required. Storing, retrieving, and analyzing data that are huge in volume, structure, complexity, and distribution is now referred to as big data. Similarity queries are the most pursued resource to retrieve complex data, but until recently they were not available in DBMSs. Now that they are becoming available, their first uses in real systems make it clear that the basic similarity query operators are not enough to meet the requirements of the target applications. The main reason is that similarity is usually defined considering its meaning when only a few elements are involved. Current big data research focuses mainly on improving retrieval efficiency through parallelism, and only a few studies target the efficacy of the query answers. This Ph.D. work develops variations of the basic similarity operators better suited to big data: they present a more comprehensive view of the database and increase the effectiveness of the answers without a considerable impact on the efficiency of the search algorithms, enabling scalable execution over large data volumes. To achieve this goal, four main contributions are presented. The first is a result diversification model that can be applied with any comparison criterion and similarity search operator. The second defines sampling and grouping techniques based on the proposed diversification model, speeding up the analysis of result sets. The third develops methods for evaluating the quality of diversified result sets. Finally, the fourth defines an approach to integrate the concepts of visual data mining and similarity-with-diversity searches in content-based retrieval systems, improving the understanding of how the diversity property is applied in the query process.
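
As an illustrative sketch only, and not the diversification model proposed in the thesis, a generic greedy max-min strategy over a candidate k-NN set (with made-up feature vectors) conveys the idea of answers that are similar to the query yet diverse among themselves:

```python
import numpy as np

def diversified_knn(query, data, k=5, candidates=50):
    """Pick k answers close to the query yet far from each other.

    Greedy heuristic: take the nearest candidate first, then repeatedly add
    the candidate maximizing its minimum distance to the answers chosen so far.
    """
    d_query = np.linalg.norm(data - query, axis=1)
    cand = np.argsort(d_query)[:candidates]          # similarity candidates
    chosen = [cand[0]]                               # nearest element
    while len(chosen) < k:
        rest = [i for i in cand if i not in chosen]
        spread = [min(np.linalg.norm(data[i] - data[j]) for j in chosen) for i in rest]
        chosen.append(rest[int(np.argmax(spread))])
    return chosen

rng = np.random.default_rng(1)
data = rng.random((1000, 8))            # hypothetical feature vectors
print(diversified_knn(data[0], data))
```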
306

Conception et développement d'un système d'intelligence économique (SIE) pour l'analyse de big data dans un environnement de cloud computing / Establishment of a competitive intelligence system (CIS) for big data analytics in a cloud computing environment

El Haddadi, Amine 31 March 2018 (has links)
Today, with connectivity available everywhere and at every moment, considerable volumes of data are created. These data have become a key player in understanding, analysing, anticipating, and solving major economic, political, social, and scientific problems; they are also changing our working procedures and our cultural environment, even restructuring the way we think. For companies, such data are highly valuable: analysing customers' trends, competitors' marketing actions, or consumers' reactions, preferences, opinions, and ratings on social media and other networking platforms gives direct insight into the dynamic environment of their markets. Scarcely has the scientific, managerial, and financial world begun to engage with Big Data than a new discipline is gaining momentum: Fast Data. Beyond the sheer volume of data, another factor becomes decisive: the capacity to process data in all their diversity at an efficient speed, to turn them into knowledge by delivering the right information to the right person at the right time, and even to use them to predict the future. Exploiting Big Data requires new, suitable mathematical and computational approaches, but also a re-engineering of managerial approaches for mastering the informational environment of a public or private organization, based on a strategic information management approach such as Competitive Intelligence (CI). CI combines Business Intelligence techniques for mastering internal data with strategic-monitoring techniques for watching and mastering external information flows. Big Data, however, as an unlimited information source for CI, has disrupted the traditional CI process, which therefore needs to be re-engineered. This research is set in exactly that context, characterized by an uncertain and unpredictable environment, with the main objective of proposing a new CI system for Big Data analytics: how can the CI approach be adapted to the modern era of Big Data, in which public and private organizations find themselves submerged by information? A first answer is the contribution of this thesis: a new CI system named XEW 2.0, based on a service-oriented, agile, and modular Big Data architecture. Its decision-making architecture comprises four services: (i) the XEW Sourcing Service (XEW-SS), which searches, collects, and processes data from different sources; (ii) the XEW Data Warehousing Service (XEW-DWS), which builds a unified view of the target corpus and creates a data warehouse accessible from the analytics and visualization services; (iii) the XEW Big Data Analytics Service (XEW-BDAS), which performs multidimensional analyses by adapting data mining algorithms to Big Data; and (iv) the XEW Big Data Visualization Service (XEW-BDVS), which visualizes Big Data as innovative designs and graphs representing, for instance, social networks, semantic networks, and strategic alliance networks. Each service is an independent component that provides a well-defined service to the other components of XEW 2.0.
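
As an illustrative sketch only (class and method names are hypothetical, inferred from the four services named above, and not the actual XEW 2.0 code), the service-oriented decomposition could be wired together like this:

```python
class SourcingService:            # XEW-SS: search, collect, pre-process sources
    def collect(self, sources):
        return [doc for src in sources for doc in src()]

class DataWarehousingService:     # XEW-DWS: unified view over the target corpus
    def load(self, documents):
        self.corpus = list(documents)
        return self.corpus

class AnalyticsService:           # XEW-BDAS: multidimensional / mining analyses
    def analyze(self, corpus):
        return {"n_documents": len(corpus)}

class VisualizationService:       # XEW-BDVS: graphs, dashboards, networks
    def render(self, results):
        print(results)

# each service is an independent component consumed by the next one
sourcing, warehouse, analytics, viz = (SourcingService(), DataWarehousingService(),
                                       AnalyticsService(), VisualizationService())
docs = sourcing.collect([lambda: ["doc-a", "doc-b"]])   # hypothetical source
viz.render(analytics.analyze(warehouse.load(docs)))
```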
308

Big Data em conteúdo espontâneo não-estruturado da internet como estratégia organizacional de orientação para o mercado / Big data in spontaneous unstructured internet content as an organizational market-orientation strategy

Corrêa Junior, Dirceu Silva Mello 25 April 2018 (has links)
Big Data is a social reality with growing business impact. However, a survey of US executives at large corporations identified a low capacity to effectively exploit this competitive-intelligence opportunity in their companies. To deepen the understanding of this context from the perspective of Market Orientation, this dissertation presents an exploratory analysis of the current capacity of large companies with nationwide operations to absorb value from Big Data, focusing on a specific type of content: unstructured data. The study found that the companies examined are at a peculiar moment for modern Market Orientation management, a kind of evolutionary, transitional process in understanding and exploiting this deluge of data. This moment of adaptation is further reinforced by a trend towards the use of more spontaneous data from consumers. Five dimensions of this moment are first presented, systematically addressing questions related to internal organization, suppliers and investment profiles, internal adaptations, and other strategic findings. The current path towards an effective understanding of Big Data is then detailed, based on the practices identified in this business context.
309

Modélisation NoSQL des entrepôts de données multidimensionnelles massives / Modeling Multidimensional Data Warehouses into NoSQL

El Malki, Mohammed 08 December 2016 (has links)
Decision support systems occupy a prominent place in companies and large organizations, enabling analyses dedicated to decision making. With the advent of big data, the volume of data to analyse reaches critical sizes, challenging conventional data warehousing approaches, whose current solutions are mainly based on R-OLAP databases. With the emergence of major Web platforms such as Google, Facebook, Twitter, and Amazon, many solutions for processing big data have been developed under the name "Not Only SQL" (NoSQL). These new approaches are an interesting avenue for building multidimensional data warehouses capable of handling large volumes of data, but questioning the R-OLAP approach requires revisiting the principles of multidimensional data warehouse modeling. In this manuscript, we propose processes for implementing multidimensional data warehouses with NoSQL models, defining four processes for each of the two NoSQL models considered: a column-oriented model and a document-oriented model. Moreover, the NoSQL context complicates the efficient computation of the pre-aggregates that are usually set up in the R-OLAP context as a lattice; we therefore extend our implementation processes to cover the construction of the lattice in both models. Since it is difficult to choose a single NoSQL implementation that efficiently supports all applicable workloads, we also propose two translation processes: the first covers intra-model transformations, i.e., rules for moving from one implementation to another within the same NoSQL logical model, while the second defines transformation rules from an implementation of one logical model to an implementation of another logical model.
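
As an illustrative sketch only (a made-up sales fact; collection, column-family, and field names are hypothetical and not taken from the manuscript), the same star-schema fact can be mapped either to a nested document (document-oriented model) or to a flat row with column families (column-oriented model):

```python
# Document-oriented mapping: one document per fact, dimensions embedded.
sale_document = {
    "_id": "sale-42",
    "measures": {"quantity": 3, "amount": 59.70},
    "date":     {"day": 17, "month": 5, "year": 2016},
    "product":  {"sku": "P-981", "category": "books"},
    "store":    {"city": "Toulouse", "country": "FR"},
}

# Column-oriented mapping: one row key, attributes grouped in column families.
sale_row = {
    "row_key": "sale-42",
    "measures":    {"quantity": "3", "amount": "59.70"},
    "dim_date":    {"day": "17", "month": "5", "year": "2016"},
    "dim_product": {"sku": "P-981", "category": "books"},
    "dim_store":   {"city": "Toulouse", "country": "FR"},
}

print(sale_document["product"]["category"], sale_row["dim_product"]["category"])
```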
310

Modelagem de sistemas de informação para a mineração de processos: características e propriedades das linguagens / Information systems modeling for process mining: characteristics and properties of languages

Teixeira Junior, Gilmar 03 May 2017 (has links)
Storing information in large data repositories (Big Data) creates opportunities for organizations to use Process Mining techniques to extract knowledge about the performance and actual flow of their business processes. One of the fundamental elements for achieving this is the relationship between process modeling languages, process event logging (logs), and Process Mining algorithms. In this work, three languages commonly used to model business processes (BPMN, Petri nets, and YAWL) were compared with respect to their suitability for Process Mining, especially Process Discovery. The models created were based on typical workflow patterns, and five scenarios were simulated for each language using three Process Discovery algorithms (Alpha, Heuristic Miner, and ILP Miner). The results indicate that the choice of language used for modeling and for recording business processes influences the quality of the results obtained by Process Discovery algorithms. The work also presents suggestions for the development of process modeling languages and Process Mining algorithms.
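
As an illustrative sketch only, using the open-source pm4py library (which is not mentioned in the dissertation) and a hypothetical log file, discovering Petri nets from an event log with the Alpha and Heuristics miners looks roughly like this:

```python
import pm4py  # assumption: pm4py 2.x high-level API

# hypothetical event log exported from a simulated BPMN/YAWL/Petri-net model
log = pm4py.read_xes("simulated_scenario_1.xes")

# Alpha miner: classic discovery algorithm, sensitive to noise and loops
alpha_net, alpha_im, alpha_fm = pm4py.discover_petri_net_alpha(log)

# Heuristics miner: frequency-based, usually more robust on real logs
heur_net, heur_im, heur_fm = pm4py.discover_petri_net_heuristics(log)

print(len(alpha_net.transitions), len(heur_net.transitions))
```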
