541

Design and Implementation of Key Exchange Mechanisms for Software Artifacts using Ocean Protocol.

Myadam, Nishkal Gupta, Patnam, Bhavith January 2020 (has links)
In modern times, innovators and researchers have developed a key technology known as the Artificial Intelligence (AI) marketplace, which leverages the power of AI to efficiently utilize the data generated by millions of devices to create new and better services and software products. H2020 Bonseyes is one such project: it provides a collaborative, cloud-based AI marketplace for users who generally do not have access to large data sets, algorithms, etc., by allowing them to collaborate with each other and exchange software artifacts. Collaboration leads to issues related to authentication and authorization, which are addressed by a Public Key Infrastructure (PKI). The main component of a PKI is the Certificate Authority (CA), which acts as the anchor of trust and whose architecture is designed to be centralized. A centralized architecture is prone to many attacks and failures, which makes it vulnerable and weak. The adverse effects of CA-based PKI can be avoided by implementing a distributed PKI. This thesis follows a hybrid methodology combining qualitative and quantitative analysis, performing a literature review to accumulate knowledge about the Ocean Protocol, a decentralized AI marketplace. The thesis aims to design and implement the framework used in the Ocean Protocol and to evaluate its performance. It also aims to develop a reference framework compatible with the Bonseyes project. Moreover, our research provides the reader with the concepts and technologies used in other implementations of distributed PKI.
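To make the underlying mechanism concrete, the following minimal sketch shows an ephemeral Diffie-Hellman style key exchange in Python using the `cryptography` library. The choice of curve (X25519), the HKDF parameters, and the `artifact-exchange` label are illustrative assumptions for this sketch, not the scheme implemented in the thesis or by Ocean Protocol itself.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_key(shared_secret: bytes) -> bytes:
    # Stretch the raw shared secret into a 256-bit symmetric key.
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"artifact-exchange").derive(shared_secret)

# Each collaborator generates an ephemeral key pair. In a decentralized PKI the
# public keys would be bound to identities via distributed records rather than
# certificates issued by a single CA.
provider_private = X25519PrivateKey.generate()
consumer_private = X25519PrivateKey.generate()

provider_public = provider_private.public_key()
consumer_public = consumer_private.public_key()

# Both parties combine their own private key with the peer's public key and
# arrive at the same symmetric key, which can then protect the exchanged artifact.
provider_key = derive_key(provider_private.exchange(consumer_public))
consumer_key = derive_key(consumer_private.exchange(provider_public))
assert provider_key == consumer_key
```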
542

Fysiska klädbutikers kamp mot e-handel : En granskning av de små fysiska butikernas konkurrensmedel gentemot e-handel / Physical clothing store's fight against e-commerce : A review of the small physical store's competitive advantages in relation to e-commerce

Vrsajko, Milos, Fridsén, Maja January 2020 (has links)
E-commerce grows every year, and it has never been easier to buy products such as clothes than it is today. For physical stores it is therefore important to use their competitive tools, marketing strategies, and store environment in the right way to survive in the market and withstand the competition from the growing e-commerce sector. Physical stores need to focus on how they can remain competitive so that they are not outcompeted by e-commerce. The purpose of this study is to examine how small local clothing stores without e-commerce survive and respond to the competition from e-commerce. We used a qualitative method to answer our research question and fulfil our purpose. We conducted qualitative interviews with the owners of four small stores that have no e-commerce, and then analyzed their answers by comparing them with the theoretical framework chosen for the study and with previous research on the subject. We also made observations at these four stores. The conclusion drawn from the study is that smaller stores survive in the market by offering good service and by having loyal customers. Furthermore, it is not worthwhile for smaller stores to start an e-commerce operation, as this results in more work and higher costs.
543

Jaiguawa: regalos personalizados para bebes / Jaiguawa: personalized gifts for babies

Arroyo Huere, Sayumi Rosa, Suarez Huaman, Yener Hanley, Vasquez Tong, Rafael Nicolás, Yachas Palomares, Evelyn, Zapana Nina, Yesenia 06 July 2019 (has links)
Population growth is important for the development of a country. Most families organize a celebration for the arrival of a newborn, either in a big way with a baby shower or with a small family gathering. For such events, one has to choose a special gift that is both useful and thoughtful. Guests and family members at these events have different options for the present, such as a set of clothes, a stroller, diapers, or toys. Although the first options are the most common gifts for such occasions, there is a more personalized alternative that turns an ordinary present into the best possible gift for the occasion. "Jaiguawa" was born with the idea of providing the best gift for the mother and the newborn: an arrangement of baby clothes in different presentations, using various quality accessories, which can be personalized so that we can create exactly the gift our customer wants. People who work and do not have time to find a good present tend to buy the gift at the last minute, sometimes even delaying their arrival at the event. Because Jaiguawa is an online store, it delivers orders, which helps customers avoid such situations and saves them the time spent looking for gifts in person. / Research project
544

Resiliency Mechanisms for In-Memory Column Stores

Kolditz, Till 15 February 2019 (has links)
The key objective of database systems is to reliably manage data, while high query throughput and low query latency are core requirements. To date, database research activities have mostly concentrated on the latter. However, due to the constant shrinking of transistor feature sizes, integrated circuits become more and more unreliable, and transient hardware errors in the form of multi-bit flips become more and more prominent. A recent study (2013) measured a failure rate of 40 FIT per DRAM device in a large high-performance cluster with around 8,500 nodes. For that system, this means that a single- or multi-bit flip occurs every 10 hours, which is unacceptably high for enterprise and HPC scenarios. Causes include cosmic rays, heat, and electrical crosstalk, with the latter being actively exploited through the RowHammer attack. It has been shown that memory cells are more prone to bit flips than logic gates, and several surveys have found multi-bit flip events in main memory modules of today's data centers. Due to the shift towards in-memory data management systems, where all business-related data and query intermediate results are kept solely in fast main memory, such systems are in great danger of delivering corrupt results to their users. Hardware techniques cannot be scaled to compensate for the exponentially increasing error rates. In other domains there is increasing interest in software-based solutions to this problem, but the proposed methods come with huge runtime and/or storage overheads, which are unacceptable for in-memory data management systems. In this thesis, we investigate how to integrate bit flip detection mechanisms into in-memory data management systems. To achieve this goal, we first build an understanding of bit flip detection techniques and select two error codes, AN codes and XOR checksums, suited to the requirements of in-memory data management systems. The most important requirement is the effectiveness of the codes in detecting bit flips. We meet this goal through AN codes, which exhibit better and adaptable error detection capabilities compared with those found in today's hardware. The second most important goal is efficiency in terms of coding latency. We meet this by introducing fundamental performance improvements to AN codes and by vectorizing the operations of both chosen codes. We integrate bit flip detection mechanisms into the lowest storage layer and the query processing layer in such a way that the rest of the data management system and the user can stay oblivious of any error detection. This includes both base columns and pointer-heavy index structures such as the ubiquitous B-Tree. Additionally, our approach allows adaptable, on-the-fly bit flip detection during query processing, with only very little impact on query latency. AN coding allows intermediate results to be recoded with virtually no performance penalty. We support our claims with exhaustive runtime and throughput measurements throughout the thesis and with an end-to-end evaluation using the Star Schema Benchmark. To the best of our knowledge, we are the first to present such holistic and fast bit flip detection in a large software infrastructure such as an in-memory data management system.
Finally, most of the source code fragments used to obtain the results in this thesis are open source and freely available.

Table of contents:
1 INTRODUCTION 1.1 Contributions of this Thesis 1.2 Outline
2 PROBLEM DESCRIPTION AND RELATED WORK 2.1 Reliable Data Management on Reliable Hardware 2.2 The Shift Towards Unreliable Hardware 2.3 Hardware-Based Mitigation of Bit Flips 2.4 Data Management System Requirements 2.5 Software-Based Techniques For Handling Bit Flips 2.5.1 Operating System-Level Techniques 2.5.2 Compiler-Level Techniques 2.5.3 Application-Level Techniques 2.6 Summary and Conclusions
3 ANALYSIS OF CODING TECHNIQUES 3.1 Selection of Error Codes 3.1.1 Hamming Coding 3.1.2 XOR Checksums 3.1.3 AN Coding 3.1.4 Summary and Conclusions 3.2 Probabilities of Silent Data Corruption 3.2.1 Probabilities of Hamming Codes 3.2.2 Probabilities of XOR Checksums 3.2.3 Probabilities of AN Codes 3.2.4 Concrete Error Models 3.2.5 Summary and Conclusions 3.3 Throughput Considerations 3.3.1 Test Systems Descriptions 3.3.2 Vectorizing Hamming Coding 3.3.3 Vectorizing XOR Checksums 3.3.4 Vectorizing AN Coding 3.3.5 Summary and Conclusions 3.4 Comparison of Error Codes 3.4.1 Effectiveness 3.4.2 Efficiency 3.4.3 Runtime Adaptability 3.5 Performance Optimizations for AN Coding 3.5.1 The Modular Multiplicative Inverse 3.5.2 Faster Softening 3.5.3 Faster Error Detection 3.5.4 Comparison to Original AN Coding 3.5.5 The Multiplicative Inverse Anomaly 3.6 Summary
4 BIT FLIP DETECTING STORAGE 4.1 Column Store Architecture 4.1.1 Logical Data Types 4.1.2 Storage Model 4.1.3 Data Representation 4.1.4 Data Layout 4.1.5 Tree Index Structures 4.1.6 Summary 4.2 Hardened Data Storage 4.2.1 Hardened Physical Data Types 4.2.2 Hardened Lightweight Compression 4.2.3 Hardened Data Layout 4.2.4 UDI Operations 4.2.5 Summary and Conclusions 4.3 Hardened Tree Index Structures 4.3.1 B-Tree Verification Techniques 4.3.2 Justification For Further Techniques 4.3.3 The Error Detecting B-Tree 4.4 Summary
5 BIT FLIP DETECTING QUERY PROCESSING 5.1 Column Store Query Processing 5.2 Bit Flip Detection Opportunities 5.2.1 Early Onetime Detection 5.2.2 Late Onetime Detection 5.2.3 Continuous Detection 5.2.4 Miscellaneous Processing Aspects 5.2.5 Summary and Conclusions 5.3 Hardened Intermediate Results 5.3.1 Materialization of Hardened Intermediates 5.3.2 Hardened Bitmaps 5.4 Summary
6 END-TO-END EVALUATION 6.1 Prototype Implementation 6.1.1 AHEAD Architecture 6.1.2 Diversity of Physical Operators 6.1.3 One Concrete Operator Realization 6.1.4 Summary and Conclusions 6.2 Performance of Individual Operators 6.2.1 Selection on One Predicate 6.2.2 Selection on Two Predicates 6.2.3 Join Operators 6.2.4 Grouping and Aggregation 6.2.5 Delta Operator 6.2.6 Summary and Conclusions 6.3 Star Schema Benchmark Queries 6.3.1 Query Runtimes 6.3.2 Improvements Through Vectorization 6.3.3 Storage Overhead 6.3.4 Summary and Conclusions 6.4 Error Detecting B-Tree 6.4.1 Single Key Lookup 6.4.2 Key Value-Pair Insertion 6.5 Summary
7 SUMMARY AND CONCLUSIONS 7.1 Future Work
A APPENDIX A.1 List of Golden As A.2 More on Hamming Coding A.2.1 Code examples A.2.2 Vectorization
BIBLIOGRAPHY, LIST OF FIGURES, LIST OF TABLES, LIST OF LISTINGS, LIST OF ACRONYMS, LIST OF SYMBOLS, LIST OF DEFINITIONS
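As a rough illustration of the two error codes mentioned in the abstract, the Python sketch below shows AN coding (store v*A and detect flips via divisibility) next to a plain XOR checksum. The constant A and the column values are illustrative assumptions, not the "golden As" or data layouts evaluated in the thesis.

```python
A = 641  # illustrative constant; the thesis selects specific "golden As"

def an_encode(value: int) -> int:
    # AN coding stores value * A instead of value.
    return value * A

def an_check_and_decode(code_word: int) -> int:
    # A bit flip almost always turns the code word into a non-multiple of A.
    if code_word % A != 0:
        raise ValueError("bit flip detected in AN-coded word")
    return code_word // A

def xor_checksum(words: list[int]) -> int:
    # XOR checksum over a block of words; recomputed and compared on every read.
    checksum = 0
    for word in words:
        checksum ^= word
    return checksum

# Harden a tiny column, simulate a transient single-bit flip, and detect it.
column = [an_encode(v) for v in (3, 17, 255)]
column[1] ^= 1 << 5
for code_word in column:
    try:
        an_check_and_decode(code_word)
    except ValueError:
        print(f"corrupted code word: {code_word}")
```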
545

Det är Lugnt, vi tar det Klarna! : A Qualitative Study of Gen Z’s Purchase Intentions for Fashion Using BNPL in an online and in-store context.

Persson, Amanda, Millner, Alexandra January 2023 (has links)
Background: The evolution of technology has transformed the way we shop, with BNPL services like Klarna and Qliro gaining popularity among consumers. This form of short-term financing offers flexibility by allowing customers either to pay later or to divide their costs into interest-free installments. While BNPL was initially associated with online shopping, it has expanded to physical stores, enabling customers to choose from even more payment options. The fashion industry has especially benefited from the evolving BNPL landscape, as it facilitates easier exploration of new styles and product comparison from the comfort of one's home. Furthermore, BNPL users are more likely to make purchases, spend more, and exhibit higher customer loyalty.  Purpose: The purpose of this study is to explore the factors affecting the intention to use BNPL technology and how they differ between an online and an in-store context.  Method: To accomplish the purpose of this study, a qualitative research strategy was applied. The empirical data was obtained through semi-structured interviews with Gen Z participants residing in Jönköping who had previous experience using BNPL in-store, online, or both. The data was then analyzed and interpreted through thematic analysis, following an abductive approach.  Conclusion: The research findings indicate that multiple factors influence purchase intention when using BNPL in both online and in-store contexts. A theoretical model, previous research, and the empirical findings were incorporated into the study's revised research framework, which includes perceived usefulness, perceived ease of use, perceived risk, trust and security, pain of payment, and attitudes. For the online context, all factors in the revised research framework were perceived by Gen Z to have a noteworthy influence on purchase intentions when using BNPL in the fashion industry. Moreover, the study identified both differences and similarities between the online and in-store contexts. For the in-store context, five of the six factors in the revised research framework were perceived as important for Gen Z when purchasing fashion. Further, the study suggests that there may be relational patterns between the factors; however, the study did not examine the relationships or degrees of association between the factors, leaving room for future investigation.
546

Contributions to Performance Modeling and Management of Data Centers

Yanggratoke, Rerngvit January 2013 (has links)
Over the last decade, Internet-based services, such as electronic mail, music-on-demand, and social-network services, have changed the ways we communicate and access information. Usually, the key functionality of such a service is in backend components, which are located in a data center, a facility for hosting computing systems and related equipment. This thesis focuses on two fundamental problems related to the management, dimensioning, and provisioning of such backend components. The first problem centers around resource allocation for a large-scale cloud environment. Data centers have become very large; they often contain hundreds of thousands of machines and applications. In such a data center, resource allocation cannot be efficiently achieved through a traditional management system that is centralized in nature. Therefore, a more scalable solution is needed. To address this problem, we have developed and evaluated a scalable and generic protocol for resource allocation. The protocol is generic in the sense that it can be instantiated for different management objectives through objective functions. The protocol jointly allocates CPU, memory, and network resources to applications that are hosted by the cloud. We prove that the protocol converges to a solution if the objective function satisfies a certain property. We perform a simulation study of the protocol for realistic scenarios. Simulation results suggest that the quality of the allocation is independent of the system size, up to 100,000 machines and applications, for the management objectives considered. The second problem is related to performance modeling of a distributed key-value store. The specific distributed key-value store we focus on in this thesis is the Spotify storage system. Understanding the performance of the Spotify storage system is essential for achieving a key quality of service objective, namely that the playback latency of a song is sufficiently low. To address this problem, we have developed and evaluated models for predicting the performance of a distributed key-value store for a lightly loaded system. First, we developed a model that allows us to predict the response time distribution of requests. Second, we modeled the capacity of the distributed key-value store for two different object allocation policies. We evaluate the models by comparing model predictions with measurements from two different environments: our lab testbed and a Spotify operational environment. We found that the models are accurate in the sense that the prediction error, i.e., the difference between the model predictions and the measurements from the real systems, is at most 11%. / QC 20131001
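To give a flavour of what an objective-driven, decentralized allocation protocol can look like, here is a heavily simplified Python sketch in which randomly paired machines hand over applications only when the move improves a load-balancing objective. The machine and application names, the single-resource model, and the objective function are assumptions for illustration, not the protocol developed in the thesis.

```python
import random

def imbalance(machines: dict[str, dict[str, float]]) -> float:
    # Management objective (illustrative): minimize the spread of CPU load.
    loads = [sum(apps.values()) for apps in machines.values()]
    return max(loads) - min(loads)

def gossip_round(machines: dict[str, dict[str, float]]) -> None:
    # Two randomly chosen machines interact; the heavier one tries to hand over
    # applications, keeping only moves that improve the objective.
    a, b = random.sample(sorted(machines), 2)
    donor, receiver = (a, b) if sum(machines[a].values()) >= sum(machines[b].values()) else (b, a)
    for app in list(machines[donor]):
        before = imbalance(machines)
        machines[receiver][app] = machines[donor].pop(app)
        if imbalance(machines) >= before:   # revert moves that do not help
            machines[donor][app] = machines[receiver].pop(app)

machines = {"m1": {"app1": 8.0, "app2": 3.0}, "m2": {"app3": 1.0}, "m3": {}}
for _ in range(50):
    gossip_round(machines)
print(machines)  # loads end up roughly balanced across m1, m2, m3
```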
547

Determining the optimal location for a large organic food store in Montreal

Li, Beibei, 1980- January 2007 (has links)
No description available.
548

Efficient Transaction Processing in SAP HANA Database: The End of a Column Store Myth

Sikka, Vishal, Färber, Franz, Lehner, Wolfgang, Cha, Sang Kyun, Peh, Thomas, Bornhövd, Christof 11 August 2022 (has links)
The SAP HANA database is the core of SAP's new data management platform. The overall goal of the SAP HANA database is to provide a generic but powerful system for different query scenarios, both transactional and analytical, on the same data representation within a highly scalable execution environment. Within this paper, we highlight the main features that differentiate the SAP HANA database from classical relational database engines. To this end, we first outline the general architecture and design criteria of the SAP HANA database. In a second step, we challenge the common belief that column store data structures are only superior for analytical workloads and not well suited for transactional workloads. We outline the concept of record life cycle management, which uses different storage formats for the different stages of a record. We not only discuss the general concept but also dive into some of the details of how to efficiently propagate records through their life cycle and move database entries from write-optimized to read-optimized storage formats. In summary, the paper aims to illustrate how the SAP HANA database is able to work efficiently in analytical as well as transactional workload environments.
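The life cycle idea can be sketched in a few lines: inserts land in a write-optimized delta structure, and a merge step periodically rebuilds a dictionary-encoded, read-optimized main store. The toy Python sketch below only illustrates the general concept; the class and method names are assumptions and do not reflect SAP HANA's actual storage formats.

```python
class HybridColumn:
    def __init__(self):
        self.delta = []                    # append-only, write-optimized buffer
        self.main_dictionary = []          # sorted distinct values
        self.main_codes = []               # dictionary-encoded value ids

    def insert(self, value):
        self.delta.append(value)           # cheap transactional insert

    def merge(self):
        # Move delta entries into the read-optimized, dictionary-encoded main store.
        values = [self.main_dictionary[c] for c in self.main_codes] + self.delta
        self.main_dictionary = sorted(set(values))
        index = {v: i for i, v in enumerate(self.main_dictionary)}
        self.main_codes = [index[v] for v in values]
        self.delta = []

    def scan_equals(self, value):
        # Analytical scan hits the compressed main store plus the small delta.
        hits = sum(1 for c in self.main_codes if self.main_dictionary[c] == value)
        return hits + self.delta.count(value)

col = HybridColumn()
for v in ["red", "blue", "red"]:
    col.insert(v)
col.merge()
col.insert("red")
print(col.scan_equals("red"))   # 3
```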
549

Exploring Designs for Enhancing the In-store Customer Experience through Digital Product Information in Fashion Retail / Undersökning av designförslag för att förstärka kundupplevelsen i fysiska butiker genom digital produktinformation i modedetaljhandeln

Jonsson, Martina January 2018 (has links)
The ongoing consumer transition from offline to online shopping in the fashion retail industry requires retailers to take action. Not only do consumers shop more online, they also go online to research retail products. Forecasts suggest that bringing the online experience into offline stores may bridge the gap between the two channels. The online experience provides high-quality digital content and places demands on the product information offered offline, as this was found to be crucial for the customer experience. The in-store marketing possibilities were found to be an advantage for bricks-and-mortar retailers. Thus, this study aims to investigate how the customer experience in bricks-and-mortar retail stores can be enhanced through digital product information. A survey was conducted to identify user requirements in terms of product information. An augmented reality prototype was designed to satisfy the identified user requirements. The prototype was tested in two user studies that evaluated its content, visualization, interaction, and satisfaction, and it was iterated between the two studies. The most crucial parameters of fashion retail product information were established, together with implications for the visual representation and interaction. It was found that existing service options leave user needs unfulfilled, and that these needs were satisfied by an augmented reality prototype for product information retrieval. The use of AR for this purpose also proved able to contribute to an omnichannel solution for multi-channel retailers. The conclusion was thus that the in-store customer experience could be enhanced by introducing an augmented reality prototype for product information retrieval, taking into account the implications for content, visualization, and interaction provided in this study.
550

Faster Reading with DuckDB and Arrow Flight on Hopsworks : Benchmark and Performance Evaluation of Offline Feature Stores / Snabbare läsning med DuckDB och Arrow Flight på Hopsworks : Benchmark och prestandautvärdering av offline Feature Stores

Khazanchi, Ayushman January 2023 (has links)
Over the last few years, Machine Learning has become a huge field, with "Big Tech" companies sharing their experiences building machine learning infrastructure. Feature stores, used as centralized data repositories for machine learning features, are seen as a central component of operational and scalable machine learning. With the growth in machine learning there is, naturally, tremendous growth in the data used for training. Most of this data tends to sit in Parquet files in cloud object stores or data lakes and is used either directly from files or in memory, where it serves exploratory data analysis and small training batches. A majority of the data science involved in machine learning is done in Python, but the infrastructure surrounding it is not always directly compatible with Python. Often, query processing engines and feature stores end up having their own domain-specific language or require data scientists to write SQL code, leading to some level of 'transpilation' overhead across the system. This overhead can not only introduce errors but can also add up to significant time and productivity costs down the line. In this thesis, we conduct systems research on the performance of offline feature stores and identify ways to read data from feature stores quickly and efficiently. We conduct a model evaluation based on benchmark tests that address common exploratory data analysis and training use cases. We find that in the Hopsworks feature store, using a state-of-the-art, storage-optimized, format-aware, vectorized query processing engine together with the Arrow protocol from start to finish, we are able to achieve significant improvements in both creating batch training data (feature value reads) and creating Point-In-Time Correct training data. For batch training data created in-memory, Hopsworks shows an average speedup of 27x over Databricks (5M and 10M scale factors), 18x over Vertex, and 8x over Sagemaker across all scale factors. For batch training data as Parquet files, Hopsworks shows a speedup of 5x over Databricks (5M, 10M, and 20M scale factors), 13x over Vertex, and 6x over Sagemaker across all scale factors. For creating in-memory Point-In-Time Correct training data, Hopsworks shows an average speedup of 8x over Databricks, 6x over Vertex, and 3x over Sagemaker across all scale factors. Similarly, for Point-In-Time Correct training data created as files, Hopsworks shows an average speedup of 9x over Databricks, 8x over Vertex, and 6x over Sagemaker across all scale factors. Through the analysis of these experimental results and the underlying infrastructure, we identify the reasons for this performance gap and examine the strengths and limitations of the design.
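As a hedged illustration of the kind of read path discussed here, the Python sketch below queries Parquet feature files with DuckDB and keeps the result in Arrow format, and then shows an Arrow Flight fetch. The file paths, column names, Flight endpoint, and ticket are assumptions for this sketch, not Hopsworks' actual API.

```python
import duckdb
import pyarrow.flight as flight

# Query Parquet feature files directly with DuckDB and keep the result as an
# Arrow table, avoiding format conversions along the read path.
con = duckdb.connect()
training_batch = con.execute("""
    SELECT customer_id, avg_basket_value, last_purchase_ts
    FROM read_parquet('features/transactions/*.parquet')
    WHERE last_purchase_ts >= DATE '2023-01-01'
""").fetch_arrow_table()

# Alternatively, fetch a prepared result from an Arrow Flight service so the
# data stays in Arrow record batches over the wire (endpoint and ticket are
# illustrative placeholders).
client = flight.connect("grpc://feature-query-service:5005")
reader = client.do_get(flight.Ticket(b"training_dataset_42"))
training_table = reader.read_all()
```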
