1 |
[en] A DATA REFERENCE ARCHITECTURE FOR BRAZILIAN ELECTRICAL COMPANIES / [pt] UMA ARQUITETURA DE REFERÊNCIA PARA DADOS DE EMPRESAS DO SETOR ELÉTRICO BRASILEIRO
MARCELO DE CARVALHO, 03 June 2024
[en] During the 1990s, the Brazilian electricity sector underwent profound changes in its operational model. The Brazilian state assumed a less developmental and more regulatory role, leading to the creation of the National Electric Energy Agency (ANEEL). One of ANEEL's roles is to ensure the quality of service provided by sector agents (energy generation, transmission, and distribution companies); when an agent does not meet established standards, ANEEL can apply penalties. Improving maintenance processes therefore plays a crucial role in ensuring the reliability and efficiency of electrical systems and, consequently, in reducing penalties. Predictive maintenance is being adopted alongside the more traditional methodologies (reactive and preventive). This methodology represents a fundamental shift from its predecessors, as it seeks to anticipate failures based on data and analysis. Incorporating predictive maintenance into maintenance processes thus presupposes the availability of equipment operating and maintenance data, as well as the technological resources to analyse those data. This dissertation proposes a reference technological architecture that enables the development of these analyses, considering the management, governance, and corporate compliance practices of the sector's agents.
|
2 |
Cooperative caching for object storage
Kaynar Terzioglu, Emine Ugur, 29 October 2022
Data is increasingly stored in data lakes: vast immutable object stores that can be accessed from anywhere in the data center. By providing low-cost, scalable storage, today's immutable object-storage-based data lakes serve a wide range of applications with diverse access patterns. Unfortunately, performance can suffer for applications that do not match the access patterns for which the data lake was designed. Moreover, in many of today's (non-hyperscale) data centers, limited bisection bandwidth constrains data lake performance. Many compute clusters therefore integrate caches, both to address the mismatch between application performance requirements and the capabilities of the shared data lake and to reduce demand on the data center network. However, per-cluster caching: (i) means the expensive cache resources cannot be shifted between clusters based on demand; (ii) makes sharing expensive, because data accessed by multiple clusters is independently cached by each of them; and (iii) makes it difficult for clusters to grow and shrink if their servers are being used to cache storage.
In this dissertation, we present two novel data-center-wide cooperative cache architectures, Datacenter-Data-Delivery Network (D3N) and Directory-Based Datacenter-Data-Delivery Network (D4N), that are designed to be part of the data lake itself rather than part of the compute clusters that use it. D3N and D4N distribute caches across the data center to enable data sharing and elasticity of cache resources, with requests transparently directed to nearby cache nodes. They dynamically adapt to changes in access patterns and accelerate workloads while providing the same consistency, trust, availability, and resilience guarantees as the underlying data lake. We find that exploiting the immutability of object stores significantly reduces complexity and provides opportunities for cache management strategies that were not feasible in previous cooperative cache systems for file- or block-based storage.
D3N is a multi-layer cooperative cache that targets workloads with large read-only datasets, such as big data analytics. It is designed to be easily integrated into existing data lakes, with only limited support for write caching of intermediate data, and it avoids any global state by, for example, using consistent hashing to locate blocks and making all caching decisions based purely on local information. Our prototype is performant enough to fully exploit the SSDs (5 GB/s read) and NICs (40 Gbit/s) in our system and improves the runtime of realistic workloads by up to 3x. The simplicity of D3N has enabled us, in collaboration with industry partners, to upstream the two-layer version of D3N into the existing code base of the Ceph object store as a new experimental feature, making it available to the many data lakes around the world based on Ceph.
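The abstract above notes that D3N avoids global state by using consistent hashing to locate blocks: every client can compute a block's cache node independently, with no shared directory. As an illustration only (the node names, virtual-node count, and hash function below are assumptions for the sketch, not details taken from the dissertation), such a placement scheme might look like:

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Map block keys to cache nodes with no shared state.

    Each node is placed at several points on a hash ring (virtual
    nodes) so load spreads evenly; a block is owned by the first
    node at or after its hash, wrapping around the ring.
    """

    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        # Any stable hash works; md5 is used here purely for illustration.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, block_key):
        h = self._hash(block_key)
        idx = bisect_right(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

# Every client builds the same ring from the cluster membership and
# therefore computes the same owner for a block, with no coordination.
ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
owner = ring.node_for("bucket/object-42:block-7")
```

A useful property of this scheme is that adding or removing a node only remaps the blocks adjacent to that node's ring positions, rather than reshuffling the whole cache.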
D4N is a directory-based cooperative cache that provides a reliable write tier and a distributed directory that maintains global state. It explores the use of global state to implement more sophisticated cache management policies and enables application-specific tuning of caching policies to support a wider range of applications than D3N. In contrast to previous cache systems that implement their own mechanism for maintaining dirty data redundantly, D4N reuses the existing data lake (Ceph) software to implement the write tier and exploits the semantics of immutable objects to move aged objects to the shared data lake. This design greatly reduces the barrier to adoption and enables D4N to take advantage of sophisticated data lake features such as erasure coding. We demonstrate that D4N is performant enough to saturate the bandwidth of the SSDs, automatically adapts replication to the demands of the working set, and outperforms the state-of-the-art cluster cache Alluxio. While it will be substantially more complicated to integrate the D4N prototype into production-quality code that can be adopted by the community, these results are compelling enough that our partners are starting that effort.
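Unlike D3N's stateless hashing, D4N's distributed directory tracks which nodes hold each cached block, enabling policies such as demand-driven replication. A toy, single-process sketch of the read path such a directory enables (all names, the dict-backed store, and the fallback logic here are illustrative assumptions, not D4N's actual implementation, which would keep this state in a shared fault-tolerant store):

```python
class CacheDirectory:
    """Toy global directory: block key -> set of cache nodes holding a copy."""

    def __init__(self):
        self._entries = {}

    def locate(self, key):
        # Nodes currently believed to hold this block (may be empty).
        return set(self._entries.get(key, ()))

    def record(self, key, node):
        self._entries.setdefault(key, set()).add(node)

    def evict(self, key, node):
        nodes = self._entries.get(key, set())
        nodes.discard(node)
        if not nodes:
            self._entries.pop(key, None)

def read_block(key, directory, fetch_from_node, fetch_from_lake, local_node):
    """Try cached copies first; fall back to the immutable data lake and
    record the freshly cached copy in the directory."""
    for node in directory.locate(key):
        data = fetch_from_node(node, key)
        if data is not None:
            return data
    data = fetch_from_lake(key)        # authoritative immutable copy
    directory.record(key, local_node)  # this node now caches the block
    return data

# Demo: first read misses the (empty) cache and falls through to the lake.
directory = CacheDirectory()
lake = {"bucket/obj:blk-0": b"payload"}
data = read_block("bucket/obj:blk-0", directory,
                  lambda node, key: None,  # no peer has a copy yet
                  lake.get, local_node="node-1")
```

Because objects are immutable, a directory entry can never point at a stale version of a block, which is one reason such global state is simpler to maintain here than in cooperative caches for mutable file or block storage.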
D3N and D4N demonstrate that cooperative caching techniques, originally designed for file systems, can be employed to integrate caching into today's immutable object-based data lakes. We find that the properties of immutable object storage greatly simplify the adoption of these techniques and enable integration of caching in a fashion that reuses existing battle-tested software, greatly reducing the barrier to adoption. By integrating caching into the data lake, rather than the compute cluster, this research opens the door to efficient data-center-wide sharing of data and resources.
|
3 |
Arkitektonisk utformning av en lagringsplattform för Business Intelligence : En litteratur- och fallstudie riktad mot små och medelstora företag [Architectural design of a storage platform for Business Intelligence: a literature and case study aimed at small and medium-sized enterprises]
Lundström, Adam, January 2018
BI, business intelligence, the practice of collecting and analysing data to inform business decisions, has grown into a significant part of business development. In most cases, a storage platform is necessary to provide data from a company's various data sources to the BI tools. There are different ways of doing this, including a data lake, a data warehouse, or a combination of both. With this in mind, the purpose of this study is to create an architectural design of a storage platform for small and medium-sized enterprises (SMEs). To produce a result with as high validity and reliability as possible, this study combines a literature study with a case study. The case study took place at an IT service company that classifies as an SME, and the working methodology was an agile approach with Scrum as a reference; this method was chosen to follow the customer's demands efficiently. The proposed architecture combines a data hub, which acts as a data lake, with a data warehouse. The data hub differs from a data lake by harmonizing and indexing data, which makes the data easier to handle. The intention of the data warehouse is to yield relevant, processed data to the BI tools. The architectural design developed in this study cannot be said to be applicable to all companies; instead, it can serve as a basis for companies considering creating a data platform.
|
4 |
L’évolution des systèmes et architectures d’information sous l’influence des données massives : les lacs de données / The information architecture evolution under the big data influence: the data lakes
Madera, Cedrine, 22 November 2018
Data is at the heart of organizations' digital transformation. The consequence is an acceleration of the evolution of the information system, which must adapt; the big data phenomenon plays the role of catalyst in this evolution. Under its influence a new component of the information system appears: the data lake. Far from replacing the decision support systems that make up the information system, data lakes complement the information system's architecture. First, we focus on the factors that influence the evolution of information systems, such as new software and middleware and new infrastructure technologies, but also the use of decision support systems themselves. Under the influence of big data we study the resulting impact, notably the appearance of new technologies such as Apache Hadoop, as well as the current limits of decision support systems. The limits encountered by current decision support systems force a change in the information system, which must adapt, giving birth to a new component: the data lake. Second, we study this new component in detail, formalize our definition, and give our point of view on its positioning in the information system as well as with regard to decision support systems. In addition, we highlight a factor influencing the architecture of data lakes: data gravity, drawing an analogy with the law of gravity and focusing on the factors that may influence the data-processing relationship. We show, through a use case, that taking data gravity into account can influence the design of a data lake. We complete this work by adapting the software product line approach to bootstrap a method for formalizing and modeling data lakes. This method allows us:
- to establish a minimum list of components to put in place to operate a data lake without it turning into a data swamp,
- to evaluate the maturity of an existing data lake,
- to quickly diagnose the missing components of an existing data lake that has become a data swamp,
- to conceptualize the creation of data lakes while remaining "software agnostic".
|
5 |
Multi-Model Snowflake Schema Creation
Gruenberg, Rebecca, 25 April 2022
No description available.
|
6 |
Data Governance : A conceptual framework in order to prevent your Data Lake from becoming a Data Swamp
Paschalidi, Charikleia, January 2015
Information Security is nowadays becoming a very popular subject of discussion among both academics and organizations, and proper Data Governance is the first step toward an effective Information Security policy. As a consequence, more and more organizations are switching their approach to data, treating data as assets in order to extract as much value as possible. Living in an IT-driven world leads many researchers to approach Data Governance by borrowing IT Governance frameworks. The aim of this thesis is to contribute to this research through action research at a large financial institution in the Netherlands that is currently releasing a Data Lake where all data will be gathered and stored in a secure way. This research introduces a framework for implementing proper Data Governance in the Data Lake. The results were promising and indicate that, under specific circumstances, this framework could be very beneficial not only for this specific institution but for any organization that would like to avoid confusion and apply Data Governance to its tasks.
|