  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Kylo Data Lakes Configuration deployed in Public Cloud environments in Single Node Mode

Peng, Rong January 2019 (has links)
This master thesis introduces the Kylo data lake deployed in a public cloud environment, offering a perspective on data lake configuration together with a data ingestion experiment. The paper also examines the underlying architecture of the Kylo data lake.
2

Phenotypic expansion in KIF1A-related dominant disorders: A description of novel variants and review of published cases

Montenegro-Garreaud, Ximena, Hansen, Adam W., Khayat, Michael M., Chander, Varuna, Grochowski, Christopher M., Jiang, Yunyun, Li, He, Mitani, Tadahiro, Kessler, Elena, Jayaseelan, Joy, Shen, Hua, Gezdirici, Alper, Pehlivan, Davut, Meng, Qingchang, Rosenfeld, Jill A., Jhangiani, Shalini N., Madan-Khetarpal, Suneeta, Scott, Daryl A., Abarca-Barriga, Hugo, Trubnykova, Milana, Gingras, Marie Claude, Muzny, Donna M., Posey, Jennifer E., Liu, Pengfei, Lupski, James R., Gibbs, Richard A. 01 December 2020 (has links)
KIF1A is a molecular motor for membrane-bound cargo important to the development and survival of sensory neurons. KIF1A dysfunction has been associated with several Mendelian disorders with a spectrum of overlapping phenotypes, ranging from spastic paraplegia to intellectual disability. We present a novel pathogenic in-frame deletion in the KIF1A molecular motor domain inherited by two affected siblings from an unaffected mother with apparent germline mosaicism. We identified eight additional cases with heterozygous, pathogenic KIF1A variants ascertained from a local data lake. Our data provide evidence for the expansion of KIF1A-associated phenotypes to include hip subluxation and dystonia as well as phenotypes observed in only a single case: gelastic cataplexy, coxa valga, and double collecting system. We review the literature and suggest that KIF1A dysfunction is better understood as a single neuromuscular disorder with variable involvement of other organ systems than a set of discrete disorders converging at a single locus. / National Institutes of Health / Peer reviewed
3

Cooperative caching for object storage

Kaynar Terzioglu, Emine Ugur 29 October 2022 (has links)
Data is increasingly stored in data lakes, vast immutable object stores that can be accessed from anywhere in the data center. By providing low-cost, scalable storage, immutable object-storage-based data lakes are today used by a wide range of applications with diverse access patterns. Unfortunately, performance can suffer for applications that do not match the access patterns for which the data lake was designed. Moreover, in many of today's (non-hyperscale) data centers, limited bisection bandwidth constrains data lake performance. Many compute clusters today integrate caches both to address the mismatch between application performance requirements and the capabilities of the shared data lake, and to reduce the demand on the data center network. However, per-cluster caching: i) means the expensive cache resources cannot be shifted between clusters based on demand, ii) makes sharing expensive because data accessed by multiple clusters is independently cached by each of them, and iii) makes it difficult for clusters to grow and shrink if their servers are being used to cache storage. In this dissertation, we present two novel data-center-wide cooperative cache architectures, Datacenter-Data-Delivery Network (D3N) and Directory-Based Datacenter-Data-Delivery Network (D4N), that are designed to be part of the data lake itself rather than part of the compute clusters that use it. D3N and D4N distribute caches across the data center to enable data sharing and elasticity of cache resources, with requests transparently directed to nearby cache nodes. They dynamically adapt to changes in access patterns and accelerate workloads while providing the same consistency, trust, availability, and resilience guarantees as the underlying data lake.
We find that exploiting the immutability of object stores significantly reduces complexity and opens up cache management strategies that were not feasible in previous cooperative cache systems for file- or block-based storage. D3N is a multi-layer cooperative cache that targets workloads with large read-only datasets, such as big data analytics. It is designed to be easily integrated into existing data lakes, with only limited support for write caching of intermediate data, and it avoids any global state by, for example, using consistent hashing to locate blocks and making all caching decisions based purely on local information. Our prototype is performant enough to fully exploit the SSDs (5 GB/s read) and NICs (40 Gbit/s) in our system and improves the runtime of realistic workloads by up to 3x. The simplicity of D3N has enabled us, in collaboration with industry partners, to upstream the two-layer version of D3N into the existing code base of the Ceph object store as a new experimental feature, making it available to the many Ceph-based data lakes around the world. D4N is a directory-based cooperative cache that provides a reliable write tier and a distributed directory that maintains global state. It explores the use of global state to implement more sophisticated cache management policies and enables application-specific tuning of caching policies, supporting a wider range of applications than D3N. In contrast to previous cache systems that implement their own mechanism for maintaining dirty data redundantly, D4N re-uses the existing data lake (Ceph) software to implement the write tier and exploits the semantics of immutable objects to move aged objects to the shared data lake. This design greatly reduces the barrier to adoption and enables D4N to take advantage of sophisticated data lake features such as erasure coding.
We demonstrate that D4N is performant enough to saturate the bandwidth of the SSDs, automatically adapts replication to the demands of the working set, and outperforms the state-of-the-art cluster cache Alluxio. While it will be substantially more complicated to integrate the D4N prototype into production-quality code that can be adopted by the community, these results are compelling enough that our partners are starting that effort. D3N and D4N demonstrate that cooperative caching techniques, originally designed for file systems, can be employed to integrate caching into today's immutable object-based data lakes. We find that the properties of immutable object storage greatly simplify the adoption of these techniques and enable integration of caching in a fashion that re-uses existing battle-tested software, greatly reducing the barrier to adoption. By integrating caching into the data lake rather than the compute cluster, this research opens the door to efficient data-center-wide sharing of data and resources.
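The abstract notes that D3N avoids global state by using consistent hashing to locate cached blocks. A minimal sketch of that general technique is shown below — this is an illustration of consistent hashing, not the dissertation's actual implementation; the node names, virtual-node count, and use of MD5 are assumptions for the example:

```python
import hashlib
from bisect import bisect_right


class ConsistentHashRing:
    """Map block IDs to cache nodes without any shared directory.

    Each node is placed at several points ("virtual nodes") on a hash
    ring; a block is served by the first node clockwise from its hash.
    Adding or removing a node only remaps the keys adjacent to its
    points, so every cache can compute placement locally.
    """

    def __init__(self, nodes, vnodes=64):
        # One ring entry per (node, virtual node) pair, sorted by hash.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, block_id):
        # First ring point at or past the block's hash, wrapping around.
        i = bisect_right(self.keys, self._hash(block_id)) % len(self.ring)
        return self.ring[i][1]
```

Because placement is a pure function of the node set and the block ID, any client can direct a request to the right cache node with no lookup traffic, and growing the ring by one node moves only a proportional fraction of the blocks.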
4

Arkitektonisk utformning av en lagringsplattform för Business Intelligence : En litteratur- och fallstudie riktad mot små och medelstora företag

Lundström, Adam January 2018 (has links)
BI, business intelligence — collecting and analysing data to inform business decisions — is a concept that has grown into a significant part of business development. In most cases, a storage platform is necessary to provide data from a company's different data sources to the BI tools. There are different ways of doing this, some of them with the help of a data lake, a data warehouse, or a combination of both. With this in mind, the purpose of this study is to create an architectural design of a storage platform for small and medium-sized enterprises (SMEs). To formulate a result with as high validity and reliability as possible, this study combines a literature study and a case study. The case study took place at an IT service company that classifies as an SME, and the working methodology was an agile approach with scrum as a reference; this method was chosen in order to follow customer demands efficiently. The architecture provided consists of a combination of a data hub, which acts as a data lake, and a data warehouse. The data hub differs from a data lake by harmonizing and indexing data, which makes the data easier to handle. The intention of the data warehouse is to yield relevant, processed data to BI tools. The architectural design of the platform developed in this study cannot be said to be applicable to all companies; instead, it can serve as a basis for companies that are considering creating a data platform.
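The distinction the abstract draws — a data hub harmonizes and indexes incoming data, where a raw data lake stores it as-is — can be made concrete with a toy sketch. The field names, source schemas, and choice of business key below are invented for the example and are not from the thesis:

```python
def harmonize(record, mapping):
    """Rename source-specific fields to the hub's canonical schema."""
    return {canonical: record[source]
            for canonical, source in mapping.items()
            if source in record}


class DataHub:
    """Toy data hub: on ingest, records from heterogeneous sources are
    harmonized into one schema and indexed by a business key, which is
    what makes them easier to serve onward to a data warehouse."""

    def __init__(self, key="customer_id"):
        self.key = key
        self.index = {}

    def ingest(self, record, mapping):
        row = harmonize(record, mapping)
        self.index[row[self.key]] = row
        return row

    def lookup(self, key_value):
        return self.index.get(key_value)
```

For instance, records arriving with `custId`/`fullName` from one system and `kund_id`/`namn` from another both land in the index under the same canonical fields, so the downstream warehouse only ever sees one schema.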
5

L’évolution des systèmes et architectures d’information sous l’influence des données massives : les lacs de données / The information architecture evolution under the big data influence : the data lakes

Madera, Cedrine 22 November 2018 (has links)
Data is at the heart of the digital transformation. The consequence is an acceleration of the evolution of the information system, which must adapt; the big data phenomenon plays the role of catalyst in this evolution. Under its influence appears a new component of the information system: the data lake. Far from replacing the decision support systems that make up the information system, data lakes complement the information system's architecture. First, we focus on the factors that influence the evolution of information systems, such as new software and middleware and new infrastructure technologies, but also the usage of the decision support systems themselves. Under the influence of big data, we study the impact this entails, in particular the appearance of new technologies such as Apache Hadoop, as well as the current limits of decision support systems. The limits encountered by current decision support systems force a change to the information system, which must adapt and which gives birth to a new component: the data lake. In a second step, we study this new component in detail, formalize our definition, and give our point of view on its positioning in the information system as well as with regard to decision support systems. In addition, we highlight a factor influencing the architecture of data lakes: data gravity, drawing an analogy with the law of gravity and focusing on the factors that may influence the data-processing relationship. We show, through a use case, that taking data gravity into account can influence the design of a data lake. We complete this work by adapting the software product line approach to bootstrap a method for formalizing and modeling data lakes. This method allows us: to establish a minimum list of components to put in place so that a data lake can operate without turning into a data swamp; to evaluate the maturity of an existing data lake; to quickly diagnose the missing components of an existing data lake that has become a data swamp; and to conceptualize the creation of data lakes while remaining "software agnostic".
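The data-gravity analogy with Newton's law can be given a back-of-the-envelope form. The score below is an illustrative sketch only — the thesis does not prescribe this formula, and the choice of gigabytes as the data "mass", an abstract processing "mass", and network latency as the "distance" are assumptions for the example:

```python
def data_gravity(data_mass_gb, processing_mass, latency_ms):
    """Attraction between a dataset and its processing, by analogy
    with Newton's law: proportional to the product of the two
    'masses', falling off with the square of the 'distance'
    (here, network latency between data and compute)."""
    return (data_mass_gb * processing_mass) / (latency_ms ** 2)


# Co-locating heavy processing with a large dataset (low latency)
# yields far more "pull" than running it across the data center,
# which is the design pressure a data lake architect must weigh.
local = data_gravity(data_mass_gb=10_000, processing_mass=50, latency_ms=1)
remote = data_gravity(data_mass_gb=10_000, processing_mass=50, latency_ms=20)
```

Under this toy model, the quadratic penalty on distance means even modest extra latency collapses the attraction score, which matches the intuition that large datasets pull applications and services toward themselves rather than the other way around.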
6

Multi-Model Snowflake Schema Creation

Gruenberg, Rebecca 25 April 2022 (has links)
No description available.
7

Data Governance : A conceptual framework in order to prevent your Data Lake from becoming a Data Swamp

Paschalidi, Charikleia January 2015 (has links)
Information Security is nowadays becoming a very popular subject of discussion among both academics and organizations, and proper Data Governance is the first step towards an effective Information Security policy. As a consequence, more and more organizations are switching their approach to data, treating it as an asset in order to extract as much value as possible from it. Living in an IT-driven world leads many researchers to approach Data Governance by borrowing IT Governance frameworks. The aim of this thesis is to contribute to this research through action research at a large financial institution in the Netherlands that is currently releasing a Data Lake in which all data will be gathered and stored in a secure way. During this research, a framework for implementing proper Data Governance in the Data Lake is introduced. The results were promising and indicate that, under specific circumstances, this framework could be very beneficial not only for this specific institution but for every organisation that would like to avoid confusion and apply Data Governance to its tasks. / Validated 2015-12-22 (global_studentproject_submitter)
