261
Newsminer: um sistema de data warehouse baseado em texto de notícias / Newsminer: a data warehouse system based on news websites. Nogueira, Rodrigo Ramos, 12 May 2017.
Funded by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). / Data and text mining applications that manage Web data have been the subject of recent research. In every case, data mining tasks need to work on clean, consistent, and integrated data to obtain the best results, and Data Warehouse environments are a valuable source of clean, integrated data for data mining applications. Data Warehouse technology has evolved to retrieve and process data from the Web. In particular, news websites are rich sources of text that can compose a linguistic corpus. By inserting such a corpus into a Data Warehousing environment, applications can take advantage of the flexibility that a multidimensional model and OLAP operations provide: navigation through the data, selection of the portion of the data considered relevant, analysis at different levels of abstraction, and aggregation, disaggregation, rotation, and filtering over any set of data. This work presents Newsminer, a data warehouse environment that provides a consistent and clean set of texts, in the form of a multidimensional corpus, for consumption by external applications and users. The proposal includes an architecture that integrates the gathering of news in near real time, a semantic enrichment module as part of the ETL stage, which adds semantic properties to the data such as the news category and POS-tagging annotations, and access to data cubes for consumption by applications and users. Two experiments were performed. The first selects the best news-category classifier for the semantic enrichment module; the statistical analysis of the results indicated that the Perceptron classifier achieved the best F-measure with good computational time. The second collected data to evaluate real-time news preprocessing; for the data set collected, the results indicated that online processing time is achievable.
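Below is a minimal, hypothetical sketch of the kind of semantic-enrichment step described in this abstract, assuming scikit-learn's Perceptron for category classification and NLTK for POS tagging; the training texts, labels, and feature pipeline are invented placeholders rather than the thesis's actual implementation.

```python
# Hypothetical sketch of an ETL enrichment step: classify a news item's category
# with a Perceptron and attach POS tags before loading the item into the warehouse.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Perceptron
from sklearn.pipeline import make_pipeline
import nltk  # requires the punkt and averaged_perceptron_tagger resources

# Toy training data (placeholder categories and headlines).
train_texts = ["stock markets rally after rate cut",
               "local team wins the championship final",
               "parliament approves the new budget"]
train_labels = ["economy", "sports", "politics"]

classifier = make_pipeline(TfidfVectorizer(), Perceptron())
classifier.fit(train_texts, train_labels)

def enrich(news_text):
    """Return the semantic properties added during the ETL stage."""
    category = classifier.predict([news_text])[0]
    pos_tags = nltk.pos_tag(nltk.word_tokenize(news_text))
    return {"text": news_text, "category": category, "pos_tags": pos_tags}

print(enrich("central bank raises interest rates again"))
```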
262
Plan Bouquets: An Exploratory Approach to Robust Query Processing. Dutt, Anshuman, January 2016.
Over the last four decades, relational database systems, with their mathematical basis in first-order logic, have provided a congenial and efficient environment to handle enterprise data during its entire life cycle of generation, storage, maintenance and processing. An organic reason for their pervasive popularity is intrinsic support for declarative user queries, wherein the user only specifies the end objectives, and the system takes on the responsibility of identifying the most efficient means, called “plans”, to achieve these objectives. A crucial input to generating efficient query execution plans is the set of compile-time estimates of the data volumes output by the operators implementing the algebraic predicates present in the query. These volume estimates are typically computed using the “selectivities” of the predicates. Unfortunately, a pervasive problem encountered in practice is that these selectivities often differ significantly from the values actually encountered during query execution, leading to poor plan choices and grossly inflated response times. While the database research community has spent considerable effort to address the above challenge, the prior techniques all suffer from a systemic limitation: the inability to provide any guarantees on execution performance.
In this thesis, we materially address this long-standing open problem by developing a radically different query processing strategy that lends itself to attractive guarantees on run-time performance. Specifically, in our approach, the compile-time estimation process is completely eschewed for error-prone selectivities. Instead, from the set of optimal plans in the query’s selectivity error space, a limited subset called the “plan bouquet” is selected such that at least one of the bouquet plans is 2-optimal at each location in the space. Then, at run time, an exploratory sequence of cost-budgeted executions from the plan bouquet is carried out, eventually finding a plan that executes to completion within its assigned budget. The duration and switching of these executions are controlled by a graded progression of isosurfaces projected onto the optimal performance profile. We prove that this construction provides viable guarantees on the worst-case performance relative to an oracular system that magically possesses accurate a priori knowledge of all selectivities. Moreover, it ensures repeatable execution strategies across different invocations of a query, an extremely desirable feature in industrial settings.
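The exploratory execution sequence described above can be pictured schematically; in the sketch below, `isosurfaces` and `execute_with_budget` are hypothetical stand-ins for the bouquet catalogue and the engine-side budgeted executor, and the loop is only a simplified reading of the approach, not the thesis's actual algorithm.

```python
# Schematic sketch: walk the cost-budget isosurfaces in increasing order and run
# each bouquet plan under that budget until some plan completes within it.
def run_bouquet(isosurfaces, execute_with_budget):
    """isosurfaces: list of (cost_budget, plans_on_that_isosurface) pairs,
    ordered by increasing budget.  execute_with_budget(plan, budget) returns
    the query result, or None if the budget is exhausted before completion."""
    for budget, plans in isosurfaces:
        for plan in plans:
            result = execute_with_budget(plan, budget)
            if result is not None:      # completed within its assigned budget
                return result
    raise RuntimeError("no bouquet plan completed within the largest budget")
```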
Our second contribution is a suite of techniques that substantively improve on the performance guarantees offered by the basic bouquet algorithm. First, we present an algorithm that skips carefully chosen executions from the basic plan bouquet sequence, leveraging the observation that an expensive execution may provide better coverage as compared to a series of cheaper siblings, thereby reducing the aggregate exploratory overheads. Next, we explore randomized variants with regard to both the sequence of plan executions and the constitution of the plan bouquet, and show that the resulting guarantees are markedly superior, in expectation, to the corresponding worst case values.
From a deployment perspective, the above techniques are appealing since they are completely “black-box”, that is, non-invasive with regard to the database engine, implementable using only API features that are commonly available in modern systems. As a proof of concept, the bouquet approach has been fully prototyped in QUEST, a Java-based tool that provides a visual and interactive demonstration of the bouquet identification and execution phases. In similar spirit, we propose an efficient isosurface identification algorithm that avoids exploration of large portions of the error space and drastically reduces the effort involved in bouquet construction.
The plan bouquet approach is ideally suited for “canned” query environments, where the computational investment in bouquet identification is amortized over multiple query invocations. The final contribution of this thesis is extending the advantage of compile-time sub-optimality guarantees to ad hoc query environments where the overheads of the off-line bouquet identification may turn out to be impractical. Specifically, we propose a completely revamped bouquet algorithm that constructs the cost-budgeted execution sequence in an “on-the-fly” manner. This is achieved through a “white-box” interaction style with the engine, whereby the plan output cardinalities exposed by the engine are used to compute lower bounds on the error-prone selectivities during plan executions. For this algorithm, the sub-optimality guarantees are in the form of a low order polynomial of the number of error-prone selectivities in the query.
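A rough sketch of the “white-box” on-the-fly idea follows; `optimize_at` and `execute_with_budget` are hypothetical helpers, the doubling budget schedule is an assumption for illustration, and only the use of observed cardinalities to raise selectivity lower bounds reflects the description above.

```python
# Rough sketch: use cardinalities observed during aborted, budget-limited runs to
# raise lower bounds on the error-prone selectivities, then re-optimize and retry.
def run_on_the_fly(optimize_at, execute_with_budget, predicates, start_budget):
    """optimize_at(bounds) -> plan that is optimal if selectivities equal the bounds.
    execute_with_budget(plan, budget) -> (result, observed_selectivities);
    result is None when the budget is exhausted before completion."""
    bounds = {p: 0.0 for p in predicates}   # nothing known about the selectivities yet
    budget = start_budget
    while True:
        plan = optimize_at(bounds)
        result, observed = execute_with_budget(plan, budget)
        if result is not None:
            return result
        for predicate, seen in observed.items():
            # Partial executions can only certify that a selectivity is at least
            # the value already produced, so the lower bound moves monotonically up.
            bounds[predicate] = max(bounds.get(predicate, 0.0), seen)
        budget *= 2                          # assumed geometric budget schedule
```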
The plan bouquet approach has been empirically evaluated on both PostgreSQL and a commercial engine, ComOpt, over the TPC-H and TPC-DS benchmark environments. Our experimental results indicate that it delivers orders-of-magnitude improvements in worst-case behavior, without impairing average-case performance, as compared to the native optimizers of these systems. In absolute terms, the worst-case sub-optimality is upper-bounded by 20 across the suite of queries, and the average performance is empirically found to be within a factor of 4 of the optimal. Even with the on-the-fly bouquet algorithm, the guarantees are found to be within a factor of 3 of those achievable in the corresponding canned query environment.
Overall, the plan bouquet approach provides novel performance guarantees that open up exciting possibilities for robust query processing.
263
Datový sklad pro analýzu územněsprávních celků / Applying Business Intelligence tools for the Financial and Property Analysis of Municipalities. Horký, Martin, January 2008.
This master thesis deals with the software support of the Financial and Property Analysis of Municipalities (FAMA) method, co-authored by Doctor Petr Toth from the Institute of Public Administration and Regional Development at the University of Economics in Prague. At present, the method is supported by a Microsoft Access database application and two C++ applications, ArisDestiller and Prognoza. The aim of this thesis is to evaluate the present software support of the Financial and Property Analysis and to implement an alternative solution based on Business Intelligence tools. A chosen part of the analysis is implemented in a pilot project in the Microsoft SQL Server 2005 environment. The thesis focuses mainly on using integration tools for XML data; its contribution is the implementation of a data warehouse that uses the municipalities' annual bills, in XML format, from the database of the Ministry of Finance of the Czech Republic. The Financial and Property Analysis method and the current software support are discussed in the first half of the thesis, and the second half is dedicated to the Business Intelligence pilot project. Both means of software support are reviewed and compared from various perspectives in the conclusion.
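As a purely illustrative sketch of the kind of XML-to-warehouse staging such a pilot project performs with SQL Server integration tools, here is a small Python example; the element and attribute names (`bill`, `municipality`, `item`, `amount`) are invented and do not reflect the Ministry of Finance schema.

```python
# Hypothetical sketch: flatten a municipality's annual bill from XML into rows
# that could be bulk-loaded into a staging table of the data warehouse.
import xml.etree.ElementTree as ET

SAMPLE_BILL = """
<bill municipality="Example Town" year="2007">
  <item code="1111" amount="125000.50"/>
  <item code="2222" amount="98000.00"/>
</bill>
"""

def staging_rows(xml_text):
    root = ET.fromstring(xml_text)
    municipality = root.get("municipality")
    year = int(root.get("year"))
    for item in root.findall("item"):
        yield (municipality, year, item.get("code"), float(item.get("amount")))

for row in staging_rows(SAMPLE_BILL):
    print(row)  # each tuple would become one staging-table row
```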
264
Podpora rozhodování pomocí podnikových informačních systémů společnosti Exact Software / Decision Making Support Using Exact Software's Enterprise Information Systems. Pitka, Lukáš, January 2009.
This diploma thesis deals with information technology support for organizations' decision-making processes using Exact Software's enterprise information systems. The main goal is to demonstrate various Exact Software applications using real-world examples from several Czech enterprises. Another goal is to illustrate the benefits of an integrated enterprise systems architecture. The last goal covers the possible benefits gained during the implementation of Exact Software's applications and the requirements that must be fulfilled; the risks and issues that can be encountered during the implementation process are mentioned alongside the benefits. To achieve these goals, the thesis contains many practical illustrations supplemented by theoretical explanation. The thesis has four parts. The first part is an introduction to decision-making processes in enterprises and provides a general overview of Exact Software's enterprise information system portfolio, together with an overview of the possible benefits of these systems according to the level of organizational management. The second part is dedicated to Exact Synergy Enterprise: it first outlines the procedures, benefits, and assumptions necessary for successful system implementation, and then presents many real-world examples from the decision-making point of view. The third part covers the Business Intelligence application Exact Business Analytics from the standpoint of implementation guidelines, prerequisites, and benefits, with subsequent subchapters containing practical examples of its usage. The last part comprises two case studies of Exact Software system implementations: the first in a financial counselling company, the second in a company from the IS/ICT area. As a whole, the thesis represents unique material for companies looking for solutions to their decision-making and analytical problems; on its basis the reader can form an overview of the current capabilities of enterprise information systems. The vast majority of the presented solutions are system-independent, i.e. the use of Exact Software systems is only exemplary. The case studies of complex enterprise IS architectures in real-world enterprises represent another contribution of this thesis.
265
Návrh metodiky testování BI řešení / Design of methodology for BI solutions testing. Jakubičková, Nela, January 2011.
This thesis deals with Business Intelligence and its testing. It seeks to highlight the differences from classical software testing and, finally, to design a methodology for testing BI solutions that could be used in practice on real projects of BI companies. The aim of the thesis is to design a methodology for BI solution testing based on theoretical knowledge of Business Intelligence and software testing, with an emphasis on specific BI characteristics and requirements and in accordance with Clever Decision's requirements, and to test it in practice on a real project in this company. The thesis is based on a study of the literature on Business Intelligence and software testing from Czech and foreign sources, as well as on the recommendations and experience of Clever Decision's employees. It is one of the few sources, if not the first, dealing with a methodology for BI solution testing in the Czech language, and it could also serve as a basis for more comprehensive BI testing methodologies. The thesis can be divided into a theoretical and a practical part. The theoretical part explains the purpose of Business Intelligence in enterprises, describes the components of a BI solution, and then covers software testing itself and the various types of tests, with emphasis on the differences and specifics of Business Intelligence. The theoretical part is followed by the designed methodology for BI solution testing, which uses a generic model of a BI/DW solution. The highlight of the practical part is the description of testing a real BI project at Clever Decision according to the designed methodology.
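As a generic illustration of one common class of BI test, source-to-target reconciliation, here is a small self-contained sketch; the table names and data are invented, and the check is not taken from the methodology designed in the thesis.

```python
# Generic illustration of a BI reconciliation test: row counts and a summed
# measure must match between the source system and the loaded fact table.
import sqlite3

def scalar(conn, sql):
    return conn.execute(sql).fetchone()[0]

def check_fact_sales_reconciles(source_conn, dw_conn):
    # Row counts must match between the operational source and the fact table.
    assert scalar(source_conn, "SELECT COUNT(*) FROM sales") == \
           scalar(dw_conn, "SELECT COUNT(*) FROM fact_sales"), "row count mismatch"
    # A summed measure must survive the ETL without drift.
    assert abs(scalar(source_conn, "SELECT SUM(amount) FROM sales") -
               scalar(dw_conn, "SELECT SUM(amount) FROM fact_sales")) < 0.01, "totals drifted"

# Tiny demo with hypothetical in-memory databases standing in for source and DW.
src, dw = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
src.execute("CREATE TABLE sales(amount REAL)")
src.executemany("INSERT INTO sales VALUES (?)", [(10.0,), (20.5,)])
dw.execute("CREATE TABLE fact_sales(amount REAL)")
dw.executemany("INSERT INTO fact_sales VALUES (?)", [(10.0,), (20.5,)])
check_fact_sales_reconciles(src, dw)
print("reconciliation checks passed")
```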
266
Data marts as management information delivery mechanisms: utilisation in manufacturing organisations with third party distribution. Ponelis, S.R. (Shana Rachel), 06 August 2003.
Customer knowledge plays a vital part in organisations today, particularly in sales and marketing processes, where customers can either be channel partners or final consumers. Managing customer data and/or information across business units, departments, and functions is vital. Frequently, channel partners gather and capture data about downstream customers and consumers that organisations further upstream in the channel require to be incorporated into their information systems in order to allow for management information delivery to their users. In this study, the focus is placed on manufacturing organisations using third-party distribution, since the flow of information between channel partner organisations in a supply chain (in contrast to the flow of products) provides an important link between organisations and increasingly represents a source of competitive advantage in the marketplace. The purpose of this study is to determine whether there is a significant difference in the use of sales and marketing data marts as management information delivery mechanisms in manufacturing organisations in different industries, particularly pharmaceuticals and branded consumer products. The case studies presented in this dissertation indicate that there are significant differences between the use of sales and marketing data marts in different manufacturing industries, which can be ascribed to the industry, both directly and indirectly. / Thesis (MIS (Information Science)), University of Pretoria, 2002.
267
[en] OLAP2DATACUBE: AN ON-DEMAND TRANSFORMATION FRAMEWORK FROM OLAP TO RDF DATA CUBES / [pt] OLAP2DATACUBE: UM FRAMEWORK PARA TRANSFORMAÇÕES EM TEMPO DE EXECUÇÃO DE OLAP PARA CUBOS DE DADOS EM RDF. Rivera Salas, Percy Enrique, 13 April 2016.
[en] Statistical data is one of the most important sources of information, relevant to a large number of stakeholders in the governmental, scientific and business domains alike. A statistical data set comprises a collection of observations made at some points across a logical space and is often organized as what is called a data cube. The proper definition of the data cubes, especially of their dimensions, helps process the observations and, more importantly, helps combine observations from different data cubes. In this context, the Linked Data principles can be profitably applied to the definition of data cubes, in the sense that the principles offer a strategy to provide the missing semantics of the dimensions, including their values. In this thesis we describe the process and the implementation of a mediation architecture, called OLAP2DataCube On Demand, which helps describe and consume statistical data exposed as RDF triples but stored in relational databases. The tool features a catalogue of Linked Data Cube descriptions, created according to the Linked Data principles. The catalogue has a standardized description for each data cube actually stored in each statistical (relational) database known to the tool. The tool offers an interface to browse the Linked Data Cube descriptions and to export the data cubes as RDF triples, generated on demand from the underlying data sources. We also discuss the implementation of sophisticated metadata search operations, OLAP data cube operations such as slice and dice, and data cube mashup operations that create new cubes by combining other cubes.
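A minimal sketch of the on-demand transformation idea described above, using rdflib to emit W3C RDF Data Cube (qb:Observation) triples from relational rows; the example vocabulary, dimension names, and table layout are hypothetical and are not the framework's actual catalogue or mappings.

```python
# Hypothetical sketch: expose rows of a relational fact table as RDF Data Cube
# observations, generated on demand instead of being materialized up front.
from rdflib import Graph, Namespace, Literal, RDF
from rdflib.namespace import XSD

QB = Namespace("http://purl.org/linked-data/cube#")
EX = Namespace("http://example.org/cube/")   # placeholder vocabulary for this sketch

def rows_to_rdf(rows):
    """rows: iterable of (year, region, value) tuples read from the underlying database."""
    g = Graph()
    g.bind("qb", QB)
    for i, (year, region, value) in enumerate(rows):
        obs = EX[f"obs/{i}"]
        g.add((obs, RDF.type, QB.Observation))
        g.add((obs, EX.refPeriod, Literal(year, datatype=XSD.gYear)))
        g.add((obs, EX.refArea, Literal(region)))
        g.add((obs, EX.measure, Literal(value, datatype=XSD.decimal)))
    return g

print(rows_to_rdf([(2014, "BR", 12.3), (2015, "BR", 13.1)]).serialize(format="turtle"))
```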
268
Metodika vývoje a nasazování Business Intelligence v malých a středních podnicích / Methodology of development and deployment of Business Intelligence solutions in Small and Medium Sized Enterprises. Rydzi, Daniel, January 2005.
This dissertation deals with the development and implementation of Business Intelligence (BI) solutions for small and medium-sized enterprises (SMEs) in the Czech Republic. It represents the culmination of the author's effort to complete a methodological model for developing this kind of application for SMEs using in-house skills and a minimum of external resources and costs. The thesis can be divided into five major parts. The first part, which describes the technologies used, consists of two chapters: the first describes the contemporary state of the Business Intelligence concept and contains an original taxonomy of BI solutions; the second describes two Knowledge Discovery in Databases (KDD) techniques that were used to build the BI solutions introduced in the case studies. The second part describes the environment of Czech SMEs, to which the thesis is meant to contribute: a single chapter defines how SMEs differ from large corporations and explains the author's reasons for focusing on this area. The third part presents the results of a survey conducted among Czech SMEs with the support of the Department of Information Technologies of the Faculty of Informatics and Statistics, University of Economics in Prague. The survey had three objectives: to map the readiness of Czech SMEs for BI development and deployment, to determine the major problems and consequent decisions of Czech SMEs that could be supported by BI solutions, and to identify the top factors preventing SMEs from developing and deploying BI solutions. The fourth part is the core of the thesis: two chapters describe the original methodology for the development and deployment of BI solutions by SMEs, which is partly based on the well-known CRISP-DM methodology, as well as the other methodologies that were studied. Finally, the last part describes the particular company that became a testing ground for the author's theories and supports his research; it presents case studies of the development and deployment of BI solutions in this company, built with contemporary BI and KDD techniques following the original methodology, and in that sense the case studies verified the theoretical methodology in real use.
269
A data management and analytic model for business intelligence applications. Banda, Misheck, 05 1900.
Most organisations use several data management and business intelligence solutions, on-premise and/or cloud-based, to manage and analyse their constantly growing business data. Challenges faced by organisations nowadays include, but are not limited to, growth limitations, big data, and inadequate analytics, computing, and data-storage capabilities. Although these organisations are able to generate reports and dashboards for decision-making in most cases, effective use of their business data and an appropriate business intelligence solution could achieve and retain informed decision-making and allow competitive reaction to the dynamic external environment. A data management and analytic model is proposed on which organisations could rely for decisive guidance when planning to procure and implement a unified business intelligence solution. To achieve a sound model, the literature was reviewed by studying business intelligence extensively and by exploring and developing various deployment models and architectures (naïve, on-premise, and cloud-based), which revealed their benefits and challenges. The outcome of the literature review was the development of a hybrid business intelligence model and its accompanying architecture as the main contribution of the study. In order to assess the state of business intelligence utilisation, and to validate and improve the proposed architecture, two case studies targeting users and experts were conducted using quantitative and qualitative approaches. The case studies established that a decision to procure and implement a successful business intelligence solution is based on a number of crucial elements, such as applications, devices, tools, business intelligence services, data management, and infrastructure. The findings further recognised that the proposed hybrid architecture is the solution for managing complex organisations with serious data challenges. / M. Sc. (Computing)
270
Istar: um esquema estrela otimizado para Image Data Warehouses baseado em similaridade / Istar: an optimized star schema for similarity-based Image Data Warehouses. Anibal, Luana Peixoto, 26 August 2011.
A data warehousing environment supports the decision-making process through the investigation and analysis of data in an organized and agile way. However, current data warehousing technologies do not allow the decision-making process to be carried out based on the pictorial (intrinsic) features of images. Such analysis cannot be performed in a conventional data warehouse because it requires managing data related to the intrinsic features of the images in order to carry out similarity comparisons. In this work, we propose a new data warehousing environment, called iCube, to enable the processing of OLAP perceptual similarity queries over images based on their pictorial (intrinsic) features. Our approach deals with and extends the three main phases of the traditional data warehousing process to allow the use of images as data. For the data integration (ETL) phase, we propose a process to represent each image by its intrinsic content (such as color or texture numerical descriptors) and to integrate this data with conventional data in the DW. For the dimensional modeling phase, we propose a star schema, called iStar, that stores both the intrinsic and the conventional image data; at this stage, our approach also models the schema to represent and support the use of different user-defined perceptual layers. For the data analysis phase, we propose an environment in which the OLAP engine uses image similarity as a query predicate and employs a filter mechanism to speed up query execution. The iStar was validated through performance tests evaluating both the building cost and the cost of processing IOLAP queries. The results showed an impressive performance improvement in IOLAP query processing: the performance gain of the iCube over the best related work (SingleOnion) was up to 98.21%.
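The similarity predicate at the core of an IOLAP query can be sketched as follows, assuming images are already represented by fixed-length feature vectors (for example, color-histogram descriptors stored in the iStar schema); the descriptors, radius, and helper names below are invented for illustration.

```python
# Hypothetical sketch: a range (similarity) predicate over intrinsic image
# descriptors, used to filter facts before a conventional OLAP aggregation.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similar_facts(facts, query_descriptor, radius):
    """facts: iterable of (fact_id, descriptor) pairs; keep those within `radius`
    of the query image's descriptor (a range query on the feature space)."""
    return [fid for fid, desc in facts if euclidean(desc, query_descriptor) <= radius]

facts = [(1, [0.9, 0.1, 0.0]), (2, [0.2, 0.7, 0.1]), (3, [0.85, 0.15, 0.0])]
print(similar_facts(facts, query_descriptor=[0.9, 0.1, 0.0], radius=0.2))  # -> [1, 3]
```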