261
Newsminer: um sistema de data warehouse baseado em texto de notícias / Newsminer: a data warehouse system based on news websites. Nogueira, Rodrigo Ramos, 12 May 2017.
Funded by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). / Data and text mining applications that manage Web data have been the subject of recent research. In every case, data mining tasks need to work on clean, consistent, and integrated data to obtain the best results, and Data Warehouse environments are a valuable source of clean, integrated data for data mining applications. Data Warehouse technology has evolved to retrieve and process data from the Web. In particular, news websites are rich sources of text that can compose a linguistic corpus. By inserting such a corpus into a Data Warehousing environment, applications can take advantage of the flexibility that a multidimensional model and OLAP operations provide: navigation through the data, selection of the portion of the data considered relevant, analysis at different levels of abstraction, and aggregation, disaggregation, rotation, and filtering over any set of data. This work presents Newsminer, a data warehouse environment that provides a consistent and clean set of texts, in the form of a multidimensional corpus, for consumption by external applications and users. The proposal includes an architecture that integrates the gathering of news in near real time, a semantic enrichment module as part of the ETL stage, which adds semantic properties to the data such as the news category and POS-tagging annotations, and access to data cubes for consumption by applications and users. Two experiments were performed. The first selects the best news-category classifier for the semantic enrichment module; the statistical analysis of the results indicated that the Perceptron classifier achieved the best F-measure with good computational time. The second collected data to evaluate real-time news preprocessing; for the data set collected, the results indicated that online processing time is achievable.
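Below is a minimal, hypothetical sketch of the kind of semantic-enrichment step described in this abstract, assuming scikit-learn's Perceptron for category classification and NLTK for POS tagging; the training texts, labels, and feature pipeline are invented placeholders rather than the thesis's actual implementation.

```python
# Hypothetical sketch of an ETL enrichment step: classify a news item's category
# with a Perceptron and attach POS tags before loading the item into the warehouse.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Perceptron
from sklearn.pipeline import make_pipeline
import nltk  # requires the punkt and averaged_perceptron_tagger resources

# Toy training data (placeholder categories and headlines).
train_texts = ["stock markets rally after rate cut",
               "local team wins the championship final",
               "parliament approves the new budget"]
train_labels = ["economy", "sports", "politics"]

classifier = make_pipeline(TfidfVectorizer(), Perceptron())
classifier.fit(train_texts, train_labels)

def enrich(news_text):
    """Return the semantic properties added during the ETL stage."""
    category = classifier.predict([news_text])[0]
    pos_tags = nltk.pos_tag(nltk.word_tokenize(news_text))
    return {"text": news_text, "category": category, "pos_tags": pos_tags}

print(enrich("central bank raises interest rates again"))
```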
262
Plan Bouquets: An Exploratory Approach to Robust Query Processing. Dutt, Anshuman, January 2016.
Over the last four decades, relational database systems, with their mathematical basis in first-order logic, have provided a congenial and efficient environment to handle enterprise data during its entire life cycle of generation, storage, maintenance and processing. An organic reason for their pervasive popularity is intrinsic support for declarative user queries, wherein the user only specifies the end objectives, and the system takes on the responsibility of identifying the most efficient means, called “plans”, to achieve these objectives. A crucial input to generating efficient query execution plans is the set of compile-time estimates of the data volumes output by the operators implementing the algebraic predicates present in the query. These volume estimates are typically computed using the “selectivities” of the predicates. Unfortunately, a pervasive problem encountered in practice is that these selectivities often differ significantly from the values actually encountered during query execution, leading to poor plan choices and grossly inflated response times. While the database research community has spent considerable effort to address the above challenge, the prior techniques all suffer from a systemic limitation: the inability to provide any guarantees on execution performance.
In this thesis, we materially address this long-standing open problem by developing a radically different query processing strategy that lends itself to attractive guarantees on run-time performance. Specifically, in our approach, the compile-time estimation process is completely eschewed for error-prone selectivities. Instead, from the set of optimal plans in the query’s selectivity error space, a limited subset called the “plan bouquet” is selected such that at least one of the bouquet plans is 2-optimal at each location in the space. Then, at run time, an exploratory sequence of cost-budgeted executions from the plan bouquet is carried out, eventually finding a plan that executes to completion within its assigned budget. The duration and switching of these executions are controlled by a graded progression of isosurfaces projected onto the optimal performance profile. We prove that this construction provides viable guarantees on the worst-case performance relative to an oracular system that magically possesses accurate a priori knowledge of all selectivities. Moreover, it ensures repeatable execution strategies across different invocations of a query, an extremely desirable feature in industrial settings.
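The exploratory execution sequence described above can be pictured schematically; in the sketch below, `isosurfaces` and `execute_with_budget` are hypothetical stand-ins for the bouquet catalogue and the engine-side budgeted executor, and the loop is only a simplified reading of the approach, not the thesis's actual algorithm.

```python
# Schematic sketch: walk the cost-budget isosurfaces in increasing order and run
# each bouquet plan under that budget until some plan completes within it.
def run_bouquet(isosurfaces, execute_with_budget):
    """isosurfaces: list of (cost_budget, plans_on_that_isosurface) pairs,
    ordered by increasing budget.  execute_with_budget(plan, budget) returns
    the query result, or None if the budget is exhausted before completion."""
    for budget, plans in isosurfaces:
        for plan in plans:
            result = execute_with_budget(plan, budget)
            if result is not None:      # completed within its assigned budget
                return result
    raise RuntimeError("no bouquet plan completed within the largest budget")
```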
Our second contribution is a suite of techniques that substantively improve on the performance guarantees offered by the basic bouquet algorithm. First, we present an algorithm that skips carefully chosen executions from the basic plan bouquet sequence, leveraging the observation that an expensive execution may provide better coverage as compared to a series of cheaper siblings, thereby reducing the aggregate exploratory overheads. Next, we explore randomized variants with regard to both the sequence of plan executions and the constitution of the plan bouquet, and show that the resulting guarantees are markedly superior, in expectation, to the corresponding worst case values.
From a deployment perspective, the above techniques are appealing since they are completely “black-box”, that is, non-invasive with regard to the database engine, implementable using only API features that are commonly available in modern systems. As a proof of concept, the bouquet approach has been fully prototyped in QUEST, a Java-based tool that provides a visual and interactive demonstration of the bouquet identification and execution phases. In similar spirit, we propose an efficient isosurface identification algorithm that avoids exploration of large portions of the error space and drastically reduces the effort involved in bouquet construction.
The plan bouquet approach is ideally suited for “canned” query environments, where the computational investment in bouquet identification is amortized over multiple query invocations. The final contribution of this thesis is extending the advantage of compile-time sub-optimality guarantees to ad hoc query environments where the overheads of the off-line bouquet identification may turn out to be impractical. Specifically, we propose a completely revamped bouquet algorithm that constructs the cost-budgeted execution sequence in an “on-the-fly” manner. This is achieved through a “white-box” interaction style with the engine, whereby the plan output cardinalities exposed by the engine are used to compute lower bounds on the error-prone selectivities during plan executions. For this algorithm, the sub-optimality guarantees are in the form of a low order polynomial of the number of error-prone selectivities in the query.
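A rough sketch of the “white-box” on-the-fly idea follows; `optimize_at` and `execute_with_budget` are hypothetical helpers, the doubling budget schedule is an assumption for illustration, and only the use of observed cardinalities to raise selectivity lower bounds reflects the description above.

```python
# Rough sketch: use cardinalities observed during aborted, budget-limited runs to
# raise lower bounds on the error-prone selectivities, then re-optimize and retry.
def run_on_the_fly(optimize_at, execute_with_budget, predicates, start_budget):
    """optimize_at(bounds) -> plan that is optimal if selectivities equal the bounds.
    execute_with_budget(plan, budget) -> (result, observed_selectivities);
    result is None when the budget is exhausted before completion."""
    bounds = {p: 0.0 for p in predicates}   # nothing known about the selectivities yet
    budget = start_budget
    while True:
        plan = optimize_at(bounds)
        result, observed = execute_with_budget(plan, budget)
        if result is not None:
            return result
        for predicate, seen in observed.items():
            # Partial executions can only certify that a selectivity is at least
            # the value already produced, so the lower bound moves monotonically up.
            bounds[predicate] = max(bounds.get(predicate, 0.0), seen)
        budget *= 2                          # assumed geometric budget schedule
```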
The plan bouquet approach has been empirically evaluated on both PostgreSQL and a commercial engine, ComOpt, over the TPC-H and TPC-DS benchmark environments. Our experimental results indicate that it delivers orders-of-magnitude improvements in worst-case behavior, without impairing average-case performance, as compared to the native optimizers of these systems. In absolute terms, the worst-case sub-optimality is upper-bounded by 20 across the suite of queries, and the average performance is empirically found to be within a factor of 4 of the optimal. Even with the on-the-fly bouquet algorithm, the guarantees are found to be within a factor of 3 of those achievable in the corresponding canned query environment.
Overall, the plan bouquet approach provides novel performance guarantees that open up exciting possibilities for robust query processing.
263
Datový sklad pro analýzu územněsprávních celků / Applying Business Intelligence tools for the Financial and Property Analysis of Municipalities. Horký, Martin, January 2008.
This master thesis deals with the software support of the Financial and Property Analysis of Municipalities (FAMA) method, co-authored by Doctor Petr Toth from the Institute of Public Administration and Regional Development at the University of Economics in Prague. At present, the method is supported by a Microsoft Access database application and two C++ applications, ArisDestiller and Prognoza. The aim of this thesis is to evaluate the present software support of the Financial and Property Analysis and to implement an alternative solution based on Business Intelligence tools. A chosen part of the analysis is implemented in a pilot project in the Microsoft SQL Server 2005 environment. The thesis focuses mainly on using integration tools for XML data; its contribution is the implementation of a data warehouse that uses the municipalities' annual bills, in XML format, from the database of the Ministry of Finance of the Czech Republic. The Financial and Property Analysis method and the current software support are discussed in the first half of the thesis, and the second half is dedicated to the Business Intelligence pilot project. Both means of software support are reviewed and compared from various perspectives in the conclusion.
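As a purely illustrative sketch of the kind of XML-to-warehouse staging such a pilot project performs with SQL Server integration tools, here is a small Python example; the element and attribute names (`bill`, `municipality`, `item`, `amount`) are invented and do not reflect the Ministry of Finance schema.

```python
# Hypothetical sketch: flatten a municipality's annual bill from XML into rows
# that could be bulk-loaded into a staging table of the data warehouse.
import xml.etree.ElementTree as ET

SAMPLE_BILL = """
<bill municipality="Example Town" year="2007">
  <item code="1111" amount="125000.50"/>
  <item code="2222" amount="98000.00"/>
</bill>
"""

def staging_rows(xml_text):
    root = ET.fromstring(xml_text)
    municipality = root.get("municipality")
    year = int(root.get("year"))
    for item in root.findall("item"):
        yield (municipality, year, item.get("code"), float(item.get("amount")))

for row in staging_rows(SAMPLE_BILL):
    print(row)  # each tuple would become one staging-table row
```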
264
Podpora rozhodování pomocí podnikových informačních systémů společnosti Exact Software / Decision Making Support Using Exact Software's Enterprise Information Systems. Pitka, Lukáš, January 2009.
This diploma thesis deals with information technology support for organizations' decision-making processes using Exact Software's enterprise information systems. The main goal is to demonstrate various Exact Software applications using real-world examples from several Czech enterprises. Another goal is to illustrate the benefits of an integrated enterprise systems architecture. The last goal covers the possible benefits gained during the implementation of Exact Software's applications and the requirements that must be fulfilled; the risks and issues that can be encountered during the implementation process are mentioned alongside the benefits. To achieve these goals, the thesis contains many practical illustrations supplemented by theoretical explanation. The thesis has four parts. The first part is an introduction to decision-making processes in enterprises and provides a general overview of Exact Software's enterprise information system portfolio, together with an overview of the possible benefits of these systems according to the level of organizational management. The second part is dedicated to Exact Synergy Enterprise: it first outlines the procedures, benefits, and assumptions necessary for successful system implementation, and then presents many real-world examples from the decision-making point of view. The third part covers the Business Intelligence application Exact Business Analytics from the standpoint of implementation guidelines, prerequisites, and benefits, with subsequent subchapters containing practical examples of its usage. The last part comprises two case studies of Exact Software system implementations: the first in a financial counselling company, the second in a company from the IS/ICT area. As a whole, the thesis represents unique material for companies looking for solutions to their decision-making and analytical problems; on its basis the reader can form an overview of the current capabilities of enterprise information systems. The vast majority of the presented solutions are system-independent, i.e. the use of Exact Software systems is only exemplary. The case studies of complex enterprise IS architectures in real-world enterprises represent another contribution of this thesis.
265
Návrh metodiky testování BI řešení / Design of methodology for BI solutions testing. Jakubičková, Nela, January 2011.
This thesis deals with Business Intelligence and its testing. It seeks to highlight the differences from classical software testing and, finally, to design a methodology for testing BI solutions that could be used in practice on real projects of BI companies. The aim of the thesis is to design a methodology for BI solution testing based on theoretical knowledge of Business Intelligence and software testing, with an emphasis on specific BI characteristics and requirements and in accordance with Clever Decision's requirements, and to test it in practice on a real project in this company. The thesis is based on a study of the literature on Business Intelligence and software testing from Czech and foreign sources, as well as on the recommendations and experience of Clever Decision's employees. It is one of the few sources, if not the first, dealing with a methodology for BI solution testing in the Czech language, and it could also serve as a basis for more comprehensive BI testing methodologies. The thesis can be divided into a theoretical and a practical part. The theoretical part explains the purpose of Business Intelligence in enterprises, describes the components of a BI solution, and then covers software testing itself and the various types of tests, with emphasis on the differences and specifics of Business Intelligence. The theoretical part is followed by the designed methodology for BI solution testing, which uses a generic model of a BI/DW solution. The highlight of the practical part is the description of testing a real BI project at Clever Decision according to the designed methodology.
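As a generic illustration of one common class of BI test, source-to-target reconciliation, here is a small self-contained sketch; the table names and data are invented, and the check is not taken from the methodology designed in the thesis.

```python
# Generic illustration of a BI reconciliation test: row counts and a summed
# measure must match between the source system and the loaded fact table.
import sqlite3

def scalar(conn, sql):
    return conn.execute(sql).fetchone()[0]

def check_fact_sales_reconciles(source_conn, dw_conn):
    # Row counts must match between the operational source and the fact table.
    assert scalar(source_conn, "SELECT COUNT(*) FROM sales") == \
           scalar(dw_conn, "SELECT COUNT(*) FROM fact_sales"), "row count mismatch"
    # A summed measure must survive the ETL without drift.
    assert abs(scalar(source_conn, "SELECT SUM(amount) FROM sales") -
               scalar(dw_conn, "SELECT SUM(amount) FROM fact_sales")) < 0.01, "totals drifted"

# Tiny demo with hypothetical in-memory databases standing in for source and DW.
src, dw = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
src.execute("CREATE TABLE sales(amount REAL)")
src.executemany("INSERT INTO sales VALUES (?)", [(10.0,), (20.5,)])
dw.execute("CREATE TABLE fact_sales(amount REAL)")
dw.executemany("INSERT INTO fact_sales VALUES (?)", [(10.0,), (20.5,)])
check_fact_sales_reconciles(src, dw)
print("reconciliation checks passed")
```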
266
Data marts as management information delivery mechanisms: utilisation in manufacturing organisations with third party distribution. Ponelis, S.R. (Shana Rachel), 06 August 2003.
Customer knowledge plays a vital part in organisations today, particularly in sales and marketing processes, where customers can either be channel partners or final consumers. Managing customer data and/or information across business units, departments, and functions is vital. Frequently, channel partners gather and capture data about downstream customers and consumers that organisations further upstream in the channel require to be incorporated into their information systems in order to allow for management information delivery to their users. In this study, the focus is placed on manufacturing organisations using third-party distribution, since the flow of information between channel partner organisations in a supply chain (in contrast to the flow of products) provides an important link between organisations and increasingly represents a source of competitive advantage in the marketplace. The purpose of this study is to determine whether there is a significant difference in the use of sales and marketing data marts as management information delivery mechanisms in manufacturing organisations in different industries, particularly pharmaceuticals and branded consumer products. The case studies presented in this dissertation indicate that there are significant differences between the use of sales and marketing data marts in different manufacturing industries, which can be ascribed to the industry, both directly and indirectly. / Thesis (MIS (Information Science)), University of Pretoria, 2002.
267
[en] OLAP2DATACUBE: AN ON-DEMAND TRANSFORMATION FRAMEWORK FROM OLAP TO RDF DATA CUBES / [pt] OLAP2DATACUBE: UM FRAMEWORK PARA TRANSFORMAÇÕES EM TEMPO DE EXECUÇÃO DE OLAP PARA CUBOS DE DADOS EM RDF. Rivera Salas, Percy Enrique, 13 April 2016.
[en] Statistical data is one of the most important sources of information, relevant to a large number of stakeholders in the governmental, scientific and business domains alike. A statistical data set comprises a collection of observations made at some points across a logical space and is often organized as what is called a data cube. The proper definition of the data cubes, especially of their dimensions, helps process the observations and, more importantly, helps combine observations from different data cubes. In this context, the Linked Data principles can be profitably applied to the definition of data cubes, in the sense that the principles offer a strategy to provide the missing semantics of the dimensions, including their values. In this thesis we describe the process and the implementation of a mediation architecture, called OLAP2DataCube On Demand, which helps describe and consume statistical data exposed as RDF triples but stored in relational databases. The tool features a catalogue of Linked Data Cube descriptions, created according to the Linked Data principles. The catalogue has a standardized description for each data cube actually stored in each statistical (relational) database known to the tool. The tool offers an interface to browse the Linked Data Cube descriptions and to export the data cubes as RDF triples, generated on demand from the underlying data sources. We also discuss the implementation of sophisticated metadata search operations, OLAP data cube operations such as slice and dice, and data cube mashup operations that create new cubes by combining other cubes.
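A minimal sketch of the on-demand transformation idea described above, using rdflib to emit W3C RDF Data Cube (qb:Observation) triples from relational rows; the example vocabulary, dimension names, and table layout are hypothetical and are not the framework's actual catalogue or mappings.

```python
# Hypothetical sketch: expose rows of a relational fact table as RDF Data Cube
# observations, generated on demand instead of being materialized up front.
from rdflib import Graph, Namespace, Literal, RDF
from rdflib.namespace import XSD

QB = Namespace("http://purl.org/linked-data/cube#")
EX = Namespace("http://example.org/cube/")   # placeholder vocabulary for this sketch

def rows_to_rdf(rows):
    """rows: iterable of (year, region, value) tuples read from the underlying database."""
    g = Graph()
    g.bind("qb", QB)
    for i, (year, region, value) in enumerate(rows):
        obs = EX[f"obs/{i}"]
        g.add((obs, RDF.type, QB.Observation))
        g.add((obs, EX.refPeriod, Literal(year, datatype=XSD.gYear)))
        g.add((obs, EX.refArea, Literal(region)))
        g.add((obs, EX.measure, Literal(value, datatype=XSD.decimal)))
    return g

print(rows_to_rdf([(2014, "BR", 12.3), (2015, "BR", 13.1)]).serialize(format="turtle"))
```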
268
Metodika vývoje a nasazování Business Intelligence v malých a středních podnicích / Methodology of development and deployment of Business Intelligence solutions in Small and Medium Sized Enterprises. Rydzi, Daniel, January 2005.
This dissertation deals with the development and implementation of Business Intelligence (BI) solutions for small and medium-sized enterprises (SMEs) in the Czech Republic. It represents the culmination of the author's effort to complete a methodological model for developing this kind of application for SMEs using in-house skills and a minimum of external resources and costs. The thesis can be divided into five major parts. The first part, which describes the technologies used, consists of two chapters: the first describes the contemporary state of the Business Intelligence concept and contains an original taxonomy of BI solutions; the second describes two Knowledge Discovery in Databases (KDD) techniques that were used to build the BI solutions introduced in the case studies. The second part describes the environment of Czech SMEs, to which the thesis is meant to contribute: a single chapter defines how SMEs differ from large corporations and explains the author's reasons for focusing on this area. The third part presents the results of a survey conducted among Czech SMEs with the support of the Department of Information Technologies of the Faculty of Informatics and Statistics, University of Economics in Prague. The survey had three objectives: to map the readiness of Czech SMEs for BI development and deployment, to determine the major problems and consequent decisions of Czech SMEs that could be supported by BI solutions, and to identify the top factors preventing SMEs from developing and deploying BI solutions. The fourth part is the core of the thesis: two chapters describe the original methodology for the development and deployment of BI solutions by SMEs, which is partly based on the well-known CRISP-DM methodology, as well as the other methodologies that were studied. Finally, the last part describes the particular company that became a testing ground for the author's theories and supports his research; it presents case studies of the development and deployment of BI solutions in this company, built with contemporary BI and KDD techniques following the original methodology, and in that sense the case studies verified the theoretical methodology in real use.
269
A data management and analytic model for business intelligence applications. Banda, Misheck, 05 1900.
Most organisations use several data management and business intelligence solutions, on-premise and/or cloud-based, to manage and analyse their constantly growing business data. Challenges faced by organisations nowadays include, but are not limited to, growth limitations, big data, and inadequate analytics, computing, and data-storage capabilities. Although these organisations are able to generate reports and dashboards for decision-making in most cases, effective use of their business data and an appropriate business intelligence solution could achieve and retain informed decision-making and allow competitive reaction to the dynamic external environment. A data management and analytic model is proposed on which organisations could rely for decisive guidance when planning to procure and implement a unified business intelligence solution. To achieve a sound model, the literature was reviewed by studying business intelligence extensively and by exploring and developing various deployment models and architectures (naïve, on-premise, and cloud-based), which revealed their benefits and challenges. The outcome of the literature review was the development of a hybrid business intelligence model and its accompanying architecture as the main contribution of the study. In order to assess the state of business intelligence utilisation, and to validate and improve the proposed architecture, two case studies targeting users and experts were conducted using quantitative and qualitative approaches. The case studies established that a decision to procure and implement a successful business intelligence solution is based on a number of crucial elements, such as applications, devices, tools, business intelligence services, data management, and infrastructure. The findings further recognised that the proposed hybrid architecture is the solution for managing complex organisations with serious data challenges. / M. Sc. (Computing)
270
Istar: um esquema estrela otimizado para Image Data Warehouses baseado em similaridade / Istar: an optimized star schema for similarity-based Image Data Warehouses. Anibal, Luana Peixoto, 26 August 2011.
A data warehousing environment supports the decision-making process through the investigation and analysis of data in an organized and agile way. However, current data warehousing technologies do not allow the decision-making process to be carried out based on the pictorial (intrinsic) features of images. Such analysis cannot be performed in a conventional data warehouse because it requires managing data related to the intrinsic features of the images in order to carry out similarity comparisons. In this work, we propose a new data warehousing environment, called iCube, to enable the processing of OLAP perceptual similarity queries over images based on their pictorial (intrinsic) features. Our approach deals with and extends the three main phases of the traditional data warehousing process to allow the use of images as data. For the data integration (ETL) phase, we propose a process to represent each image by its intrinsic content (such as color or texture numerical descriptors) and to integrate this data with conventional data in the DW. For the dimensional modeling phase, we propose a star schema, called iStar, that stores both the intrinsic and the conventional image data; at this stage, our approach also models the schema to represent and support the use of different user-defined perceptual layers. For the data analysis phase, we propose an environment in which the OLAP engine uses image similarity as a query predicate and employs a filter mechanism to speed up query execution. The iStar was validated through performance tests evaluating both the building cost and the cost of processing IOLAP queries. The results showed an impressive performance improvement in IOLAP query processing: the performance gain of the iCube over the best related work (SingleOnion) was up to 98.21%.
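The similarity predicate at the core of an IOLAP query can be sketched as follows, assuming images are already represented by fixed-length feature vectors (for example, color-histogram descriptors stored in the iStar schema); the descriptors, radius, and helper names below are invented for illustration.

```python
# Hypothetical sketch: a range (similarity) predicate over intrinsic image
# descriptors, used to filter facts before a conventional OLAP aggregation.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similar_facts(facts, query_descriptor, radius):
    """facts: iterable of (fact_id, descriptor) pairs; keep those within `radius`
    of the query image's descriptor (a range query on the feature space)."""
    return [fid for fid, desc in facts if euclidean(desc, query_descriptor) <= radius]

facts = [(1, [0.9, 0.1, 0.0]), (2, [0.2, 0.7, 0.1]), (3, [0.85, 0.15, 0.0])]
print(similar_facts(facts, query_descriptor=[0.9, 0.1, 0.0], radius=0.2))  # -> [1, 3]
```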