About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
301

Uma nova arquitetura para Internet das Coisas com análise e reconhecimento de padrões e processamento com Big Data. / A novel Internet of Things architecture with pattern recognition and big data processing.

Souza, Alberto Messias da Costa 16 October 2015 (has links)
The Internet of Things (IoT) is a new communication paradigm that extends the Internet from the virtual world to interface and interact with objects in the physical world. The IoT will comprise a large number of heterogeneous interconnected devices generating a huge volume of data, and one of its most important challenges is storing and processing that volume within acceptable time frames. This research addresses the challenge by introducing pattern recognition and analysis services into the lower layers of the IoT reference model, reducing the processing required at the higher layers. The work analyzes IoT reference models and middleware platforms for developing applications in this context. The implemented architecture extends the LinkSmart middleware with a pattern recognition module that provides algorithms for value estimation, outlier detection, and clustering of raw data coming from IoT resources. The new module is integrated with the Hadoop Big Data platform and uses the algorithm implementations of the Mahout framework. The work also highlights the importance of cross-layer communication integrated into the new architecture. The experiments used real data sets from the Smart Santander project to validate the new IoT architecture with its pattern recognition services and cross-layer communication.
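
The abstract gives no implementation details; as a minimal illustrative sketch only (plain Python with scikit-learn as a stand-in, not the thesis's LinkSmart/Mahout module, and with made-up sensor readings), the kind of outlier detection and clustering such a pattern-recognition service applies to raw data could look like this:

```python
import numpy as np
from sklearn.cluster import KMeans  # stand-in for Mahout's clustering algorithms

def detect_outliers(values, z_thresh=3.0):
    """Flag readings whose z-score exceeds a threshold (simple outlier rule)."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / (values.std() + 1e-9)
    return np.abs(z) > z_thresh

def cluster_readings(values, n_clusters=3):
    """Group raw sensor readings into clusters (illustrative only)."""
    values = np.asarray(values, dtype=float).reshape(-1, 1)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(values)

readings = [21.0, 21.3, 20.9, 35.7, 21.1, 20.8, 48.2, 21.2]  # hypothetical temperatures
print(detect_outliers(readings))
print(cluster_readings(readings))
```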
302

Delayed Transfer Entropy applied to Big Data / Delayed Transfer Entropy aplicado a Big Data

Dourado, Jonas Rossi 30 November 2018 (has links)
The recent popularization of technologies such as smartphones, wearables, the Internet of Things, social networks, and video streaming has greatly increased data creation. Dealing with such extensive data sets led to the term big data, often defined as data whose volume, acquisition rate, or representation demands non-traditional approaches to analysis or requires horizontal scaling for processing. Analysis is the most important Big Data phase, with the objective of extracting meaningful and often hidden information. One example of hidden information is causality, which can be inferred with Delayed Transfer Entropy (DTE). Despite its wide applicability, DTE demands high processing power, and this demand is aggravated by large data sets such as those found in big data. This research optimized DTE performance and modified existing code to enable DTE execution on a computer cluster. With the big data trend in sight, these results may enable the analysis of bigger data sets or stronger statistical evidence.
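
The abstract does not state the estimator used; for orientation only, one common formulation of transfer entropy with an explicit source delay d (history length 1), which is what the "delayed" variant usually denotes, is:

```latex
TE_{X \to Y}(d) \;=\; \sum_{y_{t+1},\, y_t,\, x_{t-d}}
  p\!\left(y_{t+1}, y_t, x_{t-d}\right)\,
  \log \frac{p\!\left(y_{t+1} \mid y_t, x_{t-d}\right)}
            {p\!\left(y_{t+1} \mid y_t\right)}
```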
303

Deep graphs

Traxl, Dominik 17 May 2017 (has links)
Network theory has proven to be a powerful instrument in the representation of complex systems. Yet, even in its latest and most general form (i.e., multilayer networks), it still lacks essential qualities to serve as a general data analysis framework. These include, most importantly, an explicit association of information with the nodes and edges of a network, and a conclusive representation of groups of nodes and their respective interrelations on different scales. The implementation of these qualities into a generalized framework is the primary contribution of this dissertation. By doing so, I show how my framework - deep graphs - is capable of acting as a go-between, joining a unified and generalized network representation of systems with the tools and methods developed in statistics and machine learning. A software package accompanies this dissertation, see https://github.com/deepgraph/deepgraph. A number of applications of my framework are demonstrated. I construct a rainfall deep graph and conduct an analysis of spatio-temporal extreme rainfall clusters. Based on the constructed deep graph, I provide statistical evidence that the size distribution of these clusters is best approximated by an exponentially truncated power law. By means of a generative storm-track model, I argue that the exponential truncation of the observed distribution could be caused by the presence of land masses. Then, I combine two high-resolution satellite products to identify spatio-temporal clusters of fire-affected areas in the Brazilian Amazon and characterize their land-use-specific burning conditions. Finally, I investigate the effects of white noise and global coupling strength on the maximum degree of synchronization for a variety of oscillator models coupled according to a broad spectrum of network topologies. I find a general sigmoidal scaling and validate it with a suitable regression model.
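
The deepgraph package linked above is a Python library built on pandas. As a minimal sketch, assuming its documented DeepGraph/create_edges interface and using a made-up node table, associating information with nodes and deriving edge features through connector functions looks roughly like this:

```python
import pandas as pd
import deepgraph as dg  # https://github.com/deepgraph/deepgraph

# hypothetical node table: every measurement carries its own attributes
v = pd.DataFrame({
    'time': [0.0, 1.0, 2.0, 10.0],
    'x':    [0.0, 0.5, 1.0, 50.0],
})
g = dg.DeepGraph(v)

# connector functions receive source/target node columns (suffixes _s/_t)
# and return edge features computed for every pair of nodes
def delta_time(time_s, time_t):
    dt = time_t - time_s
    return dt

g.create_edges(connectors=delta_time)
print(g.e)  # edge table holding the computed feature
```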
304

Etude spatiale des données collectées à bord des navires de pêche : en quoi est-elle pertinente pour la gestion des rejets de la pêche ? / Spatial analysis of on-board observer programme data : how is it relevant to the management of discards?

Pointin, Fabien 05 November 2018 (has links)
Since 2002, the European Union (EU) Member States have collected, managed, and supplied data for the management of fisheries and, specifically, of discards. In this context, at-sea observer programmes collect data on board fishing vessels on the composition and quantity of the catch, including discards. Based on these data, this study aims to analyse the spatio-temporal distribution of landings and discards so as to contribute to their management. To that end, a mapping method based on nested grids was developed. The method was designed to produce pluriannual, annual, and quarterly maps of landings and discards per species or group of species according to the fishing métier. A platform based on Big Data technologies was then used to refine and automate the mapping method. Using an online storage system and a high-performance computing system, a large number of maps were produced automatically per métier, grouping or not years, quarters, and species. Finally, the usefulness of the produced maps for managing discards was demonstrated, particularly with regard to the Landing Obligation (Regulation (EU) No 1380/2013). Combined with fleet cost and revenue data, these maps open up possibilities for identifying fishing zones and/or periods to be avoided (i.e., those prone to unwanted catches) while minimising the impact on the fleets' economic performance.
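
The abstract does not describe the nested-grid construction in detail; as an illustrative sketch only (a quadtree-style subdivision with made-up thresholds, not the thesis's exact method), a grid whose cells are refined wherever observations are dense could be built as follows:

```python
import numpy as np

def nested_grid(points, x0, y0, x1, y1, min_count=50, max_depth=4, depth=0):
    """Recursively split a cell into 4 sub-cells while it holds enough points.

    points: array of shape (n, 2) with (x, y) positions of observations.
    Returns a list of (x0, y0, x1, y1, n_points) leaf cells.
    """
    inside = points[(points[:, 0] >= x0) & (points[:, 0] < x1) &
                    (points[:, 1] >= y0) & (points[:, 1] < y1)]
    if depth >= max_depth or len(inside) < min_count:
        return [(x0, y0, x1, y1, len(inside))]
    xm, ym = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    cells = []
    for (ax, ay, bx, by) in [(x0, y0, xm, ym), (xm, y0, x1, ym),
                             (x0, ym, xm, y1), (xm, ym, x1, y1)]:
        cells += nested_grid(inside, ax, ay, bx, by, min_count, max_depth, depth + 1)
    return cells

rng = np.random.default_rng(0)
obs = rng.normal(loc=[0.0, 0.0], scale=[1.0, 1.0], size=(1000, 2))  # hypothetical hauls
print(len(nested_grid(obs, -5, -5, 5, 5)))
```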
305

Similaridade em big data / Similarity in big data

Santos, Lúcio Fernandes Dutra 19 July 2017 (has links)
The volume of data stored in large databases grows at an ever-increasing pace, straining the performance and flexibility of Database Management Systems (DBMSs). The data being collected and generated nowadays increase not only in volume but also in complexity, requiring new query operators; representing only numbers and short character strings is no longer enough. Health-care centers collecting imaging exams and remote sensing from satellites and earth-based stations are examples of application domains where more powerful and flexible operators are required. Storing, retrieving, and analyzing data that are huge in volume, structure, complexity, and distribution is now referred to as big data. Similarity queries are the most pursued resource to retrieve complex data, but until recently they were not available in DBMSs. Now that they are becoming available, their first uses in real systems make it clear that the basic similarity query operators are not enough to meet the requirements of the target applications. The main reason is that similarity is usually defined considering its meaning when only a few elements are involved. Current big data research focuses mainly on improving retrieval efficiency through parallelism, and only a few studies target the efficacy of the query answers. This Ph.D. work develops variations of the basic similarity operators better suited to big data: they present a more comprehensive view of the database and increase the effectiveness of the answers without a considerable impact on the efficiency of the search algorithms, enabling scalable execution over large data volumes. To achieve this goal, four main contributions are presented. The first is a result diversification model that can be applied with any comparison criterion and similarity search operator. The second defines sampling and grouping techniques based on the proposed diversification model, speeding up the analysis of result sets. The third develops methods for evaluating the quality of diversified result sets. Finally, the fourth defines an approach to integrate the concepts of visual data mining and similarity-with-diversity searches in content-based retrieval systems, improving the understanding of how the diversity property is applied in the query process.
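
As an illustrative sketch only, and not the diversification model proposed in the thesis, a generic greedy max-min strategy over a candidate k-NN set (with made-up feature vectors) conveys the idea of answers that are similar to the query yet diverse among themselves:

```python
import numpy as np

def diversified_knn(query, data, k=5, candidates=50):
    """Pick k answers close to the query yet far from each other.

    Greedy heuristic: take the nearest candidate first, then repeatedly add
    the candidate maximizing its minimum distance to the answers chosen so far.
    """
    d_query = np.linalg.norm(data - query, axis=1)
    cand = np.argsort(d_query)[:candidates]          # similarity candidates
    chosen = [cand[0]]                               # nearest element
    while len(chosen) < k:
        rest = [i for i in cand if i not in chosen]
        spread = [min(np.linalg.norm(data[i] - data[j]) for j in chosen) for i in rest]
        chosen.append(rest[int(np.argmax(spread))])
    return chosen

rng = np.random.default_rng(1)
data = rng.random((1000, 8))            # hypothetical feature vectors
print(diversified_knn(data[0], data))
```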
306

Conception et développement d'un système d'intelligence économique (SIE) pour l'analyse de big data dans un environnement de cloud computing / Establishment of a competitive intelligence system (CIS) for big data analytics in a cloud computing environment

El Haddadi, Amine 31 March 2018 (has links)
Today, with connectivity available everywhere and at every moment, considerable volumes of data are created. These data have become a key player in understanding, analysing, anticipating, and solving major economic, political, social, and scientific problems; they are also changing our working procedures and our cultural environment, even restructuring the way we think. For companies, such data are highly valuable: analysing customers' trends, competitors' marketing actions, or consumers' reactions, preferences, opinions, and ratings on social media and other networking platforms gives direct insight into the dynamic environment of their markets. Scarcely has the scientific, managerial, and financial world begun to engage with Big Data than a new discipline is gaining momentum: Fast Data. Beyond the sheer volume of data, another factor becomes decisive: the capacity to process data in all their diversity at an efficient speed, to turn them into knowledge by delivering the right information to the right person at the right time, and even to use them to predict the future. Exploiting Big Data requires new, suitable mathematical and computational approaches, but also a re-engineering of managerial approaches for mastering the informational environment of a public or private organization, based on a strategic information management approach such as Competitive Intelligence (CI). CI combines Business Intelligence techniques for mastering internal data with strategic-monitoring techniques for watching and mastering external information flows. Big Data, however, as an unlimited information source for CI, has disrupted the traditional CI process, which therefore needs to be re-engineered. This research is set in exactly that context, characterized by an uncertain and unpredictable environment, with the main objective of proposing a new CI system for Big Data analytics: how can the CI approach be adapted to the modern era of Big Data, in which public and private organizations find themselves submerged by information? A first answer is the contribution of this thesis: a new CI system named XEW 2.0, based on a service-oriented, agile, and modular Big Data architecture. Its decision-making architecture comprises four services: (i) the XEW Sourcing Service (XEW-SS), which searches, collects, and processes data from different sources; (ii) the XEW Data Warehousing Service (XEW-DWS), which builds a unified view of the target corpus and creates a data warehouse accessible from the analytics and visualization services; (iii) the XEW Big Data Analytics Service (XEW-BDAS), which performs multidimensional analyses by adapting data mining algorithms to Big Data; and (iv) the XEW Big Data Visualization Service (XEW-BDVS), which visualizes Big Data as innovative designs and graphs representing, for instance, social networks, semantic networks, and strategic alliance networks. Each service is an independent component that provides a well-defined service to the other components of XEW 2.0.
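
As an illustrative sketch only (class and method names are hypothetical, inferred from the four services named above, and not the actual XEW 2.0 code), the service-oriented decomposition could be wired together like this:

```python
class SourcingService:            # XEW-SS: search, collect, pre-process sources
    def collect(self, sources):
        return [doc for src in sources for doc in src()]

class DataWarehousingService:     # XEW-DWS: unified view over the target corpus
    def load(self, documents):
        self.corpus = list(documents)
        return self.corpus

class AnalyticsService:           # XEW-BDAS: multidimensional / mining analyses
    def analyze(self, corpus):
        return {"n_documents": len(corpus)}

class VisualizationService:       # XEW-BDVS: graphs, dashboards, networks
    def render(self, results):
        print(results)

# each service is an independent component consumed by the next one
sourcing, warehouse, analytics, viz = (SourcingService(), DataWarehousingService(),
                                       AnalyticsService(), VisualizationService())
docs = sourcing.collect([lambda: ["doc-a", "doc-b"]])   # hypothetical source
viz.render(analytics.analyze(warehouse.load(docs)))
```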
308

Big Data em conteúdo espontâneo não-estruturado da internet como estratégia organizacional de orientação para o mercado / Big data in spontaneous unstructured internet content as an organizational market-orientation strategy

Corrêa Junior, Dirceu Silva Mello 25 April 2018 (has links)
Big Data is a social reality with growing business impact. However, a survey of US executives at large corporations identified a low capacity to effectively exploit this competitive-intelligence opportunity in their companies. To deepen the understanding of this context from the perspective of Market Orientation, this dissertation presents an exploratory analysis of the current capacity of large companies with nationwide operations to absorb value from Big Data, focusing on a specific type of content: unstructured data. The study found that the companies examined are at a peculiar moment for modern Market Orientation management, a kind of evolutionary, transitional process in understanding and exploiting this deluge of data. This moment of adaptation is further reinforced by a trend towards the use of more spontaneous data from consumers. Five dimensions of this moment are first presented, systematically addressing questions related to internal organization, suppliers and investment profiles, internal adaptations, and other strategic findings. The current path towards an effective understanding of Big Data is then detailed, based on the practices identified in this business context.
309

Modélisation NoSQL des entrepôts de données multidimensionnelles massives / Modeling Multidimensional Data Warehouses into NoSQL

El Malki, Mohammed 08 December 2016 (has links)
Decision support systems occupy a prominent place in companies and large organizations, enabling analyses dedicated to decision making. With the advent of big data, the volume of data to analyse reaches critical sizes, challenging conventional data warehousing approaches, whose current solutions are mainly based on R-OLAP databases. With the emergence of major Web platforms such as Google, Facebook, Twitter, and Amazon, many solutions for processing big data have been developed under the name "Not Only SQL" (NoSQL). These new approaches are an interesting avenue for building multidimensional data warehouses capable of handling large volumes of data, but questioning the R-OLAP approach requires revisiting the principles of multidimensional data warehouse modeling. In this manuscript, we propose processes for implementing multidimensional data warehouses with NoSQL models, defining four processes for each of the two NoSQL models considered: a column-oriented model and a document-oriented model. Moreover, the NoSQL context complicates the efficient computation of the pre-aggregates that are usually set up in the R-OLAP context as a lattice; we therefore extend our implementation processes to cover the construction of the lattice in both models. Since it is difficult to choose a single NoSQL implementation that efficiently supports all applicable workloads, we also propose two translation processes: the first covers intra-model transformations, i.e., rules for moving from one implementation to another within the same NoSQL logical model, while the second defines transformation rules from an implementation of one logical model to an implementation of another logical model.
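
As an illustrative sketch only (a made-up sales fact; collection, column-family, and field names are hypothetical and not taken from the manuscript), the same star-schema fact can be mapped either to a nested document (document-oriented model) or to a flat row with column families (column-oriented model):

```python
# Document-oriented mapping: one document per fact, dimensions embedded.
sale_document = {
    "_id": "sale-42",
    "measures": {"quantity": 3, "amount": 59.70},
    "date":     {"day": 17, "month": 5, "year": 2016},
    "product":  {"sku": "P-981", "category": "books"},
    "store":    {"city": "Toulouse", "country": "FR"},
}

# Column-oriented mapping: one row key, attributes grouped in column families.
sale_row = {
    "row_key": "sale-42",
    "measures":    {"quantity": "3", "amount": "59.70"},
    "dim_date":    {"day": "17", "month": "5", "year": "2016"},
    "dim_product": {"sku": "P-981", "category": "books"},
    "dim_store":   {"city": "Toulouse", "country": "FR"},
}

print(sale_document["product"]["category"], sale_row["dim_product"]["category"])
```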
310

Modelagem de sistemas de informação para a mineração de processos: características e propriedades das linguagens / Information systems modeling for process mining: characteristics and properties of languages

Teixeira Junior, Gilmar 03 May 2017 (has links)
Storing information in large data repositories (Big Data) creates opportunities for organizations to use Process Mining techniques to extract knowledge about the performance and actual flow of their business processes. One of the fundamental elements for achieving this is the relationship between process modeling languages, process event logging (logs), and Process Mining algorithms. In this work, three languages commonly used to model business processes (BPMN, Petri nets, and YAWL) were compared with respect to their suitability for Process Mining, especially Process Discovery. The models created were based on typical workflow patterns, and five scenarios were simulated for each language using three Process Discovery algorithms (Alpha, Heuristic Miner, and ILP Miner). The results indicate that the choice of language used for modeling and for recording business processes influences the quality of the results obtained by Process Discovery algorithms. The work also presents suggestions for the development of process modeling languages and Process Mining algorithms.
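
As an illustrative sketch only, using the open-source pm4py library (which is not mentioned in the dissertation) and a hypothetical log file, discovering Petri nets from an event log with the Alpha and Heuristics miners looks roughly like this:

```python
import pm4py  # assumption: pm4py 2.x high-level API

# hypothetical event log exported from a simulated BPMN/YAWL/Petri-net model
log = pm4py.read_xes("simulated_scenario_1.xes")

# Alpha miner: classic discovery algorithm, sensitive to noise and loops
alpha_net, alpha_im, alpha_fm = pm4py.discover_petri_net_alpha(log)

# Heuristics miner: frequency-based, usually more robust on real logs
heur_net, heur_im, heur_fm = pm4py.discover_petri_net_heuristics(log)

print(len(alpha_net.transitions), len(heur_net.transitions))
```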
