Global ETD Search

131	Big Data Analytics-enabled Sensing Capability and Organizational Outcomes: Assessing the Mediating Effects of Business Analytics Culture Fosso Wamba, S., Queiroz, M.M., Wu, L., Sivarajah, Uthayasankar 14 October 2020 (has links) Yes / With the emergence of information and communication technologies, organizations worldwide have been making meaningful efforts to develop and gain business insights by combining technology capability, management capability and personnel capability to explore the potential of data, a combination known as big data analytics (BDA) capability. In this context, variables such as sensing capability (the organization’s ability to explore the market and develop opportunities) and analytics culture (the organization’s practices and behavior patterns around its analytical principles) play a fundamental role in BDA initiatives. However, there is a considerable literature gap concerning the effects of BDA-enabled sensing capability and analytics culture on organizational outcomes (i.e., customer linking capability, financial performance, market performance, and strategic business value) and concerning how important the organization’s analytics culture is as a mediator in the relationship between BDA-enabled sensing capability and organizational outcomes. This study therefore investigates these relationships. To that end, we developed a conceptual model grounded in dynamic capabilities, BDA, and analytics culture. We then validated the model by applying partial least squares structural equation modeling. The findings showed not only the positive effect of BDA-enabled sensing capability and analytics culture on organizational outcomes but also the mediation effect of analytics culture. These results offer valuable theoretical implications as well as contributions for managers and practitioners. Data analytics Dynamic capabilities Data-driven culture Organisational outcomes Sensing capabilities
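The mediation effect reported in the record above can be illustrated with the classic product-of-coefficients decomposition. This is only a simplified sketch: the study itself uses PLS-SEM on latent constructs, whereas the code below assumes observed numeric scores and ordinary least squares. The function name `mediation_effects` and the variable roles (x = sensing capability, m = analytics culture, y = an organizational outcome) are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Sequence


def _center(v: Sequence[float]) -> List[float]:
    """Subtract the mean so that intercepts drop out of the regressions."""
    mean = sum(v) / len(v)
    return [value - mean for value in v]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def mediation_effects(x: Sequence[float], m: Sequence[float],
                      y: Sequence[float]) -> Dict[str, float]:
    """Decompose the effect of x on y into a direct part and a part
    that runs through the mediator m (hypothetical helper, OLS only)."""
    xc, mc, yc = _center(x), _center(m), _center(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)

    a = sxm / sxx                      # path a: regress m on x
    det = sxx * smm - sxm ** 2         # determinant of the 2x2 normal equations
    if det == 0:
        raise ValueError("x and m are perfectly collinear")
    direct = (smm * sxy - sxm * smy) / det  # coefficient of x in y ~ x + m
    b = (sxx * smy - sxm * sxy) / det       # path b: coefficient of m in y ~ x + m
    total = sxy / sxx                  # regress y on x alone
    return {"a": a, "b": b, "direct": direct,
            "indirect": a * b, "total": total}


if __name__ == "__main__":
    # Toy data constructed so that y = x + 3 * m exactly.
    effects = mediation_effects([1, 2, 3, 4, 5],
                                [2, 5, 5, 9, 10],
                                [7, 17, 18, 31, 35])
    print(effects)
```

For OLS the total effect equals the direct effect plus the indirect effect (a times b), which is why a non-zero indirect term is read as mediation. Significance testing (for example by bootstrapping the indirect effect) is deliberately left out of this sketch.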
132	Assessing the impact of big data analytics on decision-making processes, forecasting, and performance of a firm Chatterjee, S., Chaudhuri, R., Gupta, S., Sivarajah, Uthayasankar, Bag, S. 03 September 2023 (has links) Yes / Firms apply BDA in various ways. Few studies, however, examine the impact of BDA on forecasting, decision-making, and firm performance simultaneously, which leaves a gap in the research. Against this background, this study examines the impact of BDA on the decision-making process, forecasting, and firm performance. Drawing on the resource-based view (RBV), the dynamic capability view (DCV), and related research, a conceptual research model was proposed. The model was validated using the PLS-SEM approach with 366 respondents from Indian firms. The study highlights that smart decision-making and accurate forecasting can be achieved by using BDA, and it demonstrates that BDA adoption has a considerable influence on the decision-making process, the forecasting process, and overall firm performance. However, the results depend on cross-sectional data, which limits causal inference and could introduce endogeneity bias. The study also found that the control variables have no impact on firm performance. Big data analytics Decision-making Forecasting Financial performance Operational performance Dynamic capability
133	Challenges in using a Mixed-Method approach to explore the relationship between big data analytics capabilities and market performance Olabode, Oluwaseun E., Boso, N., Hultman, M., Leonidou, C.N. 19 September 2023 (has links) No / This case study is based on a research study that examined the relationship between big data analytics capability and market performance. The study investigated the intervening role of disruptive business models and the contingency role of competitive intensity on the relationship between big data analytics capability and market performance using both qualitative and quantitative methods. This case-study will focus on the qualitative and quantitative methods utilised including NVivo and IBM SPSS to conduct qualitative analysis and quantitative analysis. You will learn the factors to consider when conducting a mixed-methods study and develop the ability to apply similar analytical techniques to your research context. Big data analytics Market performance Business models Mediation Organisations Scale Surveying
134	A study on big data analytics and innovation: From technological and business cycle perspectives Sivarajah, Uthayasankar, Kumar, S., Kumar, V., Chatterjee, S., Li, Jing 10 March 2024 (has links) Yes / In today’s rapidly changing business landscape, organizations increasingly invest in different technologies to enhance their innovation capabilities. Among these technological investments, a notable development is the application of big data analytics (BDA), which plays a pivotal role in supporting firms’ decision-making processes. Big data technologies are important factors that could support both exploratory and exploitative innovation, which in turn could affect efforts to combat climate change and ease the shift to green energy. However, studies that comprehensively examine BDA’s impact on innovation capability and technological cycle remain scarce. This study therefore investigates the impact of BDA on innovation capability, technological cycle, and firm performance. It develops a conceptual model, validated using CB-SEM on responses from 356 firms. The results show that big data technology significantly influences both innovation capability and firm performance. This study highlights that BDA helps to address the pressing challenges of climate change mitigation and the transition to cleaner and more sustainable energy sources. However, our results are based on managerial perceptions in a single country. To enhance generalizability, future studies could employ a more objective approach and explore different contexts. Multidimensional constructs, moderating factors, and rival models could also be considered in future studies. Big data analytics Technological cycle Technological innovation Firm performance Business cycle Incremental change
135	Integrated Predictive Modeling and Analytics for Crisis Management Alhamadani, Abdulaziz Abdulrhman 15 May 2024 (has links) The surge in the application of big data and predictive analytics in fields of crisis management, such as pandemics and epidemics, highlights the vital need for advanced research in these areas, particularly in the wake of the COVID-19 pandemic. Traditional methods, which typically rely on historical data to forecast future trends, fall short in addressing the complex and ever-changing nature of challenges like pandemics and public health crises. This inadequacy is further underscored by the pandemic's significant impact on various sectors, notably healthcare, government, and the hotel industry. Current models often overlook key factors such as static spatial elements, socioeconomic conditions, and the wealth of data available from social media, which are crucial for a comprehensive understanding and effective response to these multifaceted crises. This thesis employs spatial forecasting and predictive analytics to address crisis management in several distinct but interrelated contexts: the COVID-19 pandemic, the opioid crisis, and the impact of the pandemic on the hotel industry. The first part of the study focuses on using big data analytics to explore the relationship between socioeconomic factors and the spread of COVID-19 at the zip code level, aiming to predict high-risk areas for infection. The second part delves into the opioid crisis, utilizing semi-supervised deep learning techniques to monitor and categorize drug-related discussions on Reddit. The third part concentrates on developing spatial forecasting and providing explanations of the rising epidemic of drug overdose fatalities. The fourth part of the study extends to the realm of the hotel industry, aiming to optimize customer experience by analyzing online reviews and employing a localized Large Language Model to generate future customer trends and scenarios. 
Across these studies, the thesis aims to provide actionable insights and comprehensive solutions for effectively managing these major crises. For the first work, the majority of current research in pandemic modeling primarily relies on historical data to predict dynamic trends such as COVID-19. This work makes the following contributions in spatial COVID-19 pandemic forecasting: 1) the development of a unique model solely employing a wide range of socioeconomic indicators to forecast areas most susceptible to COVID-19, using detailed static spatial analysis, 2) identification of the most and least influential socioeconomic variables affecting COVID-19 transmission within communities, 3) construction of a comprehensive dataset that merges state-level COVID-19 statistics with corresponding socioeconomic attributes, organized by zip code. For the second work, we make the following contributions in detecting drug Abuse crisis via social media: 1) enhancing the Dynamic Query Expansion (DQE) algorithm to dynamically detect and extract evolving drug names in Reddit comments, utilizing a list curated from government and healthcare agencies, 2) constructing a textual Graph Convolutional Network combined with word embeddings to achieve fine-grained drug abuse classification in Reddit comments, identifying seven specific drug classes for the first time, 3) conducting extensive experiments to validate the framework, outperforming six baseline models in drug abuse classification and demonstrating effectiveness across multiple types of embeddings. The third study focuses on developing spatial forecasting and providing explanations of the escalating epidemic of drug overdose fatalities. Current research in this field has shown a deficiency in comprehensive explanations of the crisis, spatial analyses, and predictions of high-risk zones for drug overdoses. 
Addressing these gaps, this study contributes in several key areas: 1) Establishing a framework for spatially forecasting drug overdose fatalities predominantly affecting U.S. counties, 2) Proposing solutions for dealing with scarce and heterogeneous data sets, 3) Developing an algorithm that offers clear and actionable insights into the crisis, and 4) Conducting extensive experiments to validate the effectiveness of our proposed framework. In the fourth study, we address the profound impact of the pandemic on the hotel industry, focusing on the optimization of customer experience. Traditional methodologies in this realm have predominantly relied on survey data and limited segments of social media analytics. Those methods are informative but fall short of providing a full picture due to their inability to include diverse perspectives and broader customer feedback. Our study aims to make the following contributions: 1) the development of an integrated platform that distinguishes and extracts positive and negative Memorable Experiences (MEs) from online customer reviews within the hotel industry, 2) The incorporation of an advanced analytical module that performs temporal trend analysis of MEs, utilizing sophisticated data mining algorithms to dissect customer feedback on a monthly and yearly scale, 3) the implementation of an advanced tool that generates prospective and unexplored Memorable Experiences (MEs) by utilizing a localized Large Language Model (LLM) with keywords extracted from authentic customer experiences to aid hotel management in preparing for future customer trends and scenarios. Building on the integrated predictive modeling approaches developed in the earlier parts of this dissertation, this final section explores the significant impacts of the COVID-19 pandemic on the airline industry. The pandemic has precipitated substantial financial losses and operational disruptions, necessitating innovative crisis management strategies within this sector. 
This study introduces a novel analytical framework, EAGLE (Enhancing Airline Groundtruth Labels and Review rating prediction), which utilizes Large Language Models (LLMs) to improve the accuracy and objectivity of customer sentiment analysis in strategic airline route planning. EAGLE leverages LLMs for zero-shot pseudo-labeling and zero-shot text classification, to enhance the processing of customer reviews without the biases of manual labeling. This approach streamlines data analysis, and refines decision-making processes which allows airlines to align route expansions with nuanced customer preferences and sentiments effectively. The comprehensive application of LLMs in this context underscores the potential of predictive analytics to transform traditional crisis management strategies by providing deeper, more actionable insights. / Doctor of Philosophy / In today's digital age, where vast amounts of data are generated every second, understanding and managing crises like pandemics or economic disruptions has become increasingly crucial. This dissertation explores the use of advanced predictive modeling and analytics to manage various crises, significantly enhancing how predictions and responses to these challenges are developed. The first part of the research uses data analysis to identify areas at higher risk during the COVID-19 pandemic, focusing on how different socioeconomic factors can affect virus spread at a local level. This approach moves beyond traditional methods that rely on past data, providing a more dynamic way to forecast and manage public health crises. The study then examines the opioid crisis by analyzing social media platforms like Reddit. Here, a method was developed to automatically detect and categorize discussions about drug abuse. This technique aids in understanding how drug-related conversations evolve online, providing insights that could guide public health responses and policy-making. 
In the hospitality sector, customer reviews were analyzed to improve service quality in hotels. By using advanced data analysis tools, key trends in customer experiences were identified, which can help businesses adapt and refine their services in real-time, enhancing guest satisfaction. Finally, the study extends to the airline industry, where a model was developed that uses customer feedback to improve airline services and route planning. This part of the research shows how sophisticated analytics can help airlines better understand and meet traveler needs, especially during disruptions like the pandemic. Overall, the dissertation provides methods to better manage crises and illustrates the vast potential of predictive analytics in making informed decisions that can significantly mitigate the impacts of future crises. This research is vital for anyone—from government officials to business leaders—looking to harness the power of data for crisis management and decision-making. Spatial forecasting Big data analytics social-media data mining pandemic forecasting crisis mitigation
136	Centralized and distributed learning methods for predictive health analytics Brisimi, Theodora S. 02 November 2017 (has links) The U.S. health care system is considered costly and highly inefficient, devoting substantial resources to the treatment of acute conditions in a hospital setting rather than focusing on prevention and keeping patients out of the hospital. The potential for cost savings is large; in the U.S. more than $30 billion are spent each year on hospitalizations deemed preventable, 31% of which is attributed to heart diseases and 20% to diabetes. Motivated by this, our work focuses on developing centralized and distributed learning methods to predict future heart- or diabetes- related hospitalizations based on patient Electronic Health Records (EHRs). We explore a variety of supervised classification methods and we present a novel likelihood ratio based method (K-LRT) that predicts hospitalizations and offers interpretability by identifying the K most significant features that lead to a positive prediction for each patient. Next, assuming that the positive class consists of multiple clusters (hospitalized patients due to different reasons), while the negative class is drawn from a single cluster (non-hospitalized patients healthy in every aspect), we present an alternating optimization approach, which jointly discovers the clusters in the positive class and optimizes the classifiers that separate each positive cluster from the negative samples. We establish the convergence of the method and characterize its VC dimension. Last, we develop a decentralized cluster Primal-Dual Splitting (cPDS) method for large-scale problems, that is computationally efficient and privacy-aware. Such a distributed learning scheme is relevant for multi-institutional collaborations or peer-to-peer applications, allowing the agents to collaborate, while keeping every participant's data private. 
cPDS is proved to have an improved convergence rate compared to existing centralized and decentralized methods. We test all methods on real EHR data from the Boston Medical Center and compare results in terms of prediction accuracy and interpretability. Computer science Centralized and distributed methods Data analytics Diabetes hospitalizations Heart hospitalizations Machine learning Predictive health analytics
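The idea of explaining a positive prediction through the K most significant features can be sketched with per-feature likelihood ratios. The code below is a simplified illustration and not the K-LRT method of the thesis: it assumes binary features, independent Bernoulli models per class, and Laplace smoothing, and the function names `fit_bernoulli`, `top_k_log_ratios`, and `predict` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_bernoulli(rows: Sequence[Sequence[int]], labels: Sequence[int],
                  smoothing: float = 1.0) -> Tuple[List[float], List[float]]:
    """Estimate P(x_j = 1 | y = 1) and P(x_j = 1 | y = 0) for each feature j,
    with Laplace smoothing so that no probability is exactly 0 or 1."""
    d = len(rows[0])
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]

    def rates(group: List[Sequence[int]]) -> List[float]:
        return [(sum(r[j] for r in group) + smoothing)
                / (len(group) + 2 * smoothing) for j in range(d)]

    return rates(pos), rates(neg)


def top_k_log_ratios(x: Sequence[int], p_pos: Sequence[float],
                     p_neg: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (feature index, log-likelihood ratio) for the k features that
    most strongly favour the positive class for this sample."""
    scores = []
    for j, xj in enumerate(x):
        num = p_pos[j] if xj else 1.0 - p_pos[j]
        den = p_neg[j] if xj else 1.0 - p_neg[j]
        scores.append((math.log(num / den), j))
    scores.sort(reverse=True)  # largest log-ratio first
    return [(j, s) for s, j in scores[:k]]


def predict(x: Sequence[int], p_pos: Sequence[float], p_neg: Sequence[float],
            k: int, threshold: float = 0.0) -> Tuple[bool, List[int]]:
    """Predict positive when the summed top-k log-ratios exceed the threshold,
    and report which features drove the decision."""
    top = top_k_log_ratios(x, p_pos, p_neg, k)
    return sum(s for _, s in top) > threshold, [j for j, _ in top]


if __name__ == "__main__":
    rows = [[1, 0], [1, 1], [0, 0], [0, 1]]   # toy binary records
    labels = [1, 1, 0, 0]                     # 1 = hospitalized (toy labels)
    p_pos, p_neg = fit_bernoulli(rows, labels)
    print(predict([1, 0], p_pos, p_neg, k=1))
    print(predict([0, 1], p_pos, p_neg, k=1))
```

Returning the selected feature indices alongside the decision is what gives this kind of rule its interpretability: each positive prediction comes with the handful of features that justified it.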
137	Forecasting Large-scale Time Series Data Hartmann, Claudio 03 December 2018 (has links) The forecasting of time series data is an integral component for management, planning, and decision making in many domains. The prediction of electricity demand and supply in the energy domain or sales figures in market research are just two of the many application scenarios that require thorough predictions. Many of these domains have in common that they are influenced by the Big Data trend which also affects the time series forecasting. Data sets consist of thousands of temporal fine grained time series and have to be predicted in reasonable time. The time series may suffer from noisy behavior and missing values which makes modeling these time series especially hard, nonetheless accurate predictions are required. Furthermore, data sets from different domains exhibit various characteristics. Therefore, forecast techniques have to be flexible and adaptable to these characteristics. Long-established forecast techniques like ARIMA and Exponential Smoothing do not fulfill these new requirements. Most of the traditional models only represent one individual time series. This makes the prediction of thousands of time series very time consuming, as an equally large number of models has to be created. Furthermore, these models do not incorporate additional data sources and are, therefore, not capable of compensating missing measurements or noisy behavior of individual time series. In this thesis, we introduce CSAR (Cross-Sectional AutoRegression Model), a new forecast technique which is designed to address the new requirements on forecasting large-scale time series data. It is based on the novel concept of cross-sectional forecasting that assumes that time series from the same domain follow a similar behavior and represents many time series with one common model. 
CSAR combines this new approach with the modeling concept of ARIMA to make the model adaptable to the various properties of data sets from different domains. Furthermore, we introduce auto.CSAR, that helps to configure the model and to choose the right model components for a specific data set and forecast task. With CSAR, we present a new forecast technique that is suited for the prediction of large-scale time series data. By representing many time series with one model, large data sets can be predicted in short time. Furthermore, using data from many time series in one model helps to compensate missing values and noisy behavior of individual series. The evaluation on three real world data sets shows that CSAR outperforms long-established forecast techniques in accuracy and execution time. Finally, with auto.CSAR, we create a way to apply CSAR to new data sets without requiring the user to have extensive knowledge about our new forecast technique and its configuration. info:eu-repo/classification/ddc/004 ddc:004
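The cross-sectional idea in the record above, one common model fitted across many time series, can be sketched as a pooled first-order autoregression. This is a deliberately reduced illustration under stated assumptions and not the CSAR model itself: it uses a single lag with an intercept, plain least squares, and `None` to mark missing measurements. The names `fit_pooled_ar1` and `forecast_next` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Series = Sequence[Optional[float]]  # None marks a missing measurement


def fit_pooled_ar1(series_set: Sequence[Series]) -> Tuple[float, float]:
    """Fit y_t = a + b * y_(t-1) by ordinary least squares, pooling the
    (previous, current) pairs of every series into one regression."""
    xs: List[float] = []
    ys: List[float] = []
    for s in series_set:
        for prev, cur in zip(s, s[1:]):
            # A gap only removes the pairs it touches; the other series
            # still contribute training data to the shared model.
            if prev is not None and cur is not None:
                xs.append(prev)
                ys.append(cur)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("lagged values are constant; slope is undefined")
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = mean_y - b * mean_x
    return a, b


def forecast_next(series: Series, a: float, b: float) -> float:
    """One-step forecast from the last observed value of a single series."""
    last = next(v for v in reversed(series) if v is not None)
    return a + b * last


if __name__ == "__main__":
    data = [
        [0.0, 1.0, 1.5, 1.75],
        [10.0, 6.0, 4.0, 3.0],
        [8.0, None, 3.5],      # too gappy to model on its own
    ]
    a, b = fit_pooled_ar1(data)
    print(a, b, forecast_next(data[2], a, b))
```

Because all series share the coefficients, only one model is estimated regardless of how many series there are, and a series with gaps can still be forecast from parameters learned on the others.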
138	Navigating the Data Stream - Enhancing Inbound Logistics Processes through Big Data Analytics : A Study of Information Processing Capabilities facilitating Information Utilisation in Warehouse Resource Planning Zuber, Johannes, Hahnewald, Anton January 2024 (has links) Background: An ever-increasing amount of data is generated today, so companies face the challenge of extracting valuable information from these data streams. Enhanced Information Utilisation carries the opportunity for improved decision-making. This could address challenges that arise from delayed trucks in inbound logistics and the associated warehouse resource planning. Purpose: This study aims to deepen the understanding of Big Data Analytics capabilities that foster Information Integration and decision support to facilitate Information Utilisation. We apply this to the context of warehouse resource replanning in inbound logistics in case of unexpected short-term deviations. Method: We conducted a qualitative research study, comprising a Grounded Theory approach combined with abductive reasoning. We adapted a framework derived from a literature review and, after conducting and analysing 14 semi-structured interviews with inbound logistics practitioners and experts, proposed our own conceptual framework. Conclusion: We identified four interconnected capabilities that facilitate Information Utilisation. Data Generation Capabilities and Data Integration & Management Capabilities contribute to improved Information Integration, establishing a base for subsequent data analytics. In turn, Data Analytics Capabilities and Data Interpretation Capabilities lead to enhanced decision support, facilitating Information Utilisation. Big Data Analytics Inbound Logistics Information Integration Information Utilisation Business Administration Företagsekonomi
139	Development of systemic methods to improve management techniques based on Balanced Scorecard in Manufacturing Environment / Desarrollo de métodos sistémicos para la mejora de las técnicas de gestión basadas en el cuadro integral de mando en entornos de fabricación Sánchez Márquez, Rafael 07 January 2020 (has links) Tesis por compendio / [ES] El "Balanced Scorecard" (BSC) como "Performance Management System" (PMS) se ha difundido por todo el mundo desde que Kaplan y Norton (1992) establecieron sus fundamentos teóricos. Kaplan (2009) afirmó que el uso del BSC y, especialmente, la conversión de estrategias en acciones era más un arte que una ciencia. La falta de evidencia de la existencia de relaciones de causa-efecto entre Key Performance Indicators (KPIs) de diferentes perspectivas y de métodos sólidos y científicos para su uso, eran algunas de las causas de sus problemas. Kaplan emplazó a la comunidad científica a confirmar los fundamentos del BSC y a desarrollar métodos científicos. Varios trabajos han intentado mejorar el uso del BSC. Algunos utilizan herramientas heurísticas, que tratan con variables cualitativas. Otros, métodos estadísticos y datos reales de KPI, pero aplicados a un período específico, que es una visión estática y que requiere muestras a largo plazo y recursos muy especializados cada vez que los ejecutivos necesitan evaluar el impacto de las estrategias. Esta tesis también aborda el retraso entre variables de "entrada" y de "salida", además de la falta de trabajos centrados en el entorno de fabricación, que constituye su objetivo principal. El primer objetivo de este trabajo es desarrollar una metodología para evaluar y seleccionar los principales KPI de salida, que explican el desempeño de toda la compañía. Usa las relaciones entre variables de diferentes dimensiones descritas por Kaplan. Este método también considera el retraso entre las variables. 
El resultado es un conjunto de KPI principales de salida, que resume todo el BSC, lo que reduce drásticamente su complejidad. El segundo objetivo es desarrollar una metodología gráfica que utilice ese conjunto de KPI principales de salida para evaluar la efectividad de las estrategias. Actualmente, los gráficos son comunes entre los profesionales, pero solo Breyfogle (2003) ha intentado distinguir entre un cambio real significativo y un cambio debido a la incertidumbre de usar muestras. Este trabajo desarrolla aún más el método de Breyfogle para abordar sus limitaciones. El tercer objetivo es desarrollar un método que, una vez demostrada gráficamente la efectividad de las estrategias, cuantifique su impacto en el conjunto de KPI principales de salida. El cuarto y último método desarrollado se centra en el diagnóstico del sistema de gestión de la calidad para revelar cómo funciona en términos de las relaciones entre los KPI internos (dentro de la empresa) y externos (relacionados con el cliente) para mejorar la satisfacción del cliente. La aplicación de los cuatro métodos en la secuencia correcta constituye una metodología completa que se puede aplicar en cualquier empresa de fabricación para mejorar el uso del cuadro de mando integral como herramienta científica. Sin embargo, los profesionales pueden optar por aplicar solo uno de los cuatro métodos o una combinación de ellos, ya que la aplicación de cada uno de ellos es independiente y tiene sus propios objetivos y resultados. / [CA] El "Balanced Scorecard" (BSC) com "Performance Management System" (PMS) s'ha difós per tot el món des que Kaplan i Norton (1992) van establir els seus fonaments teòrics. Kaplan (2009) va afirmar que l'ús del BSC i, especialment, la conversió d'estratègies en accions era més un art que una ciència. 
La manca d'evidència de l'existència de relacions de causa-efecte entre Key Performance Indicators (KPIs) de diferents perspectives i de mètodes sòlids i científics pel seu ús, eren algunes de les causes dels seus problemes. Kaplan va emplaçar a la comunitat científica a confirmar els fonaments del BSC i a desenvolupar mètodes científics. Diversos treballs han intentat millorar l'ús del BSC. Alguns utilitzen eines heurístiques, que tracten amb variables qualitatives. D'altres, mètodes estadístics i dades reals de KPI, però aplicats a un període específic, que és una visió estàtica i que requereix mostres a llarg termini i recursos molt especialitzats cada vegada que els executius necessiten avaluar l'impacte de les estratègies. Aquesta tesi també aborda el retard entre variables d'"entrada" i d'"eixida", a més de la manca de treballs centrats en l'entorn de fabricació, que és el seu objectiu principal. El primer objectiu d'aquest treball és desenvolupar una metodologia per avaluar i seleccionar els principals KPI d'eixida, que expliquen l'acompliment de tota la companyia. Es fa servir les relacions entre variables de diferents dimensions descrites per Kaplan. Aquest mètode també considera el retard entre les variables. El resultat és un conjunt de KPI principals d'eixida, que resumeix tot el BSC, i que redueix dràsticament la seua complexitat. El segon objectiu és desenvolupar una metodologia gràfica que utilitze aquest conjunt de KPI principals d'eixida per avaluar l'efectivitat de les estratègies. Actualment, els gràfics són comuns entre els professionals, però només Breyfogle (2003) ha intentat distingir entre un canvi real significatiu i un a causa de la incertesa d'utilitzar mostres. Aquest treball desenvolupa encara més el mètode de Breyfogle per abordar les seues limitacions. 
El tercer objectiu és desenvolupar un mètode que, una vegada demostrada gràficament l'efectivitat de les estratègies, quantifique el seu impacte en el conjunt de KPI principals d'eixida. El quart i l'últim mètode es centra en el diagnòstic del sistema de gestió de la qualitat per a revelar com funcionen les relacions entre els KPI interns (dins de l'empresa) i externs (relacionats amb el client) per millorar la satisfacció del client. L'aplicació dels quatre mètodes en la seqüència correcta constitueix una metodologia completa que es pot aplicar en qualsevol empresa de fabricació per millorar l'ús del quadre de comandament integral com a eina científica. No obstant això, els professionals poden optar per aplicar només un dels quatre mètodes o una combinació d'ells, ja que l'aplicació de cada un d'ells és independent i té els seus propis objectius i resultats. / [EN] The Balanced Scorecard (BSC) as a Performance Management System (PMS) has spread worldwide since Kaplan and Norton (1992) established its theoretical foundations. Kaplan (2009) claimed that the use of the BSC, and especially turning strategies into actions, was more an art than a science. The lack of evidence of cause-and-effect relationships between Key Performance Indicators (KPIs) from different perspectives and the lack of robust methods to use it as a scientific tool were some of the causes of its problems. Kaplan called on the scientific community to confirm the foundations of the BSC theory and to develop methods for its use as a scientific tool. Several works have attempted to enhance the use of the balanced scorecard. Some methods use heuristic tools, which deal with qualitative variables. Others use statistical methods and actual KPI data, but applied to a specific period, which gives a static view and requires long-term samples and highly specialized resources each time executives need to assess the impact of strategies. 
This thesis also tackles the lag between "input" and "output" variables. It also addresses the lack of works focused on the manufacturing environment, which is its main objective. The first objective of this work is to develop a methodology to assess and select the main output KPIs, which explain the performance of the whole company. It takes advantage of the relationships between variables from different dimensions described by Kaplan. This method also considers the potential lag between variables. The result is a set of main output KPIs, which summarizes the whole BSC, thus dramatically reducing its complexity. The second objective is to develop a graphical methodology that uses that set of main output KPIs to assess the effectiveness of strategies. Currently, KPI charts are common among practitioners, but only Breyfogle (2003) has attempted to distinguish between a significant actual change in the metrics and a change due to the uncertainty of using samples. This work further develops Breyfogle's method to tackle its limitations. The third objective is to develop a method that, once the effectiveness of those strategies and actions has been demonstrated graphically, quantifies their impact on the set of main output KPIs. The fourth and final method, which uses data analytics, focuses on the diagnosis of the quality management system to reveal how it works in terms of the relationships between internal (within the company) and external (customer-related) KPIs to improve customer satisfaction. The application of the four methods in the right sequence makes up a comprehensive methodology that can be applied in any manufacturing company to enhance the use of the balanced scorecard as a scientific tool. However, professionals may choose to apply only one of the four methods or a combination of them, since the application of each of them is independent and has its own objectives and results. / Sánchez Márquez, R. (2019). 
Development of systemic methods to improve management techniques based on Balanced Scorecard in Manufacturing Environment [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/134022 / Compendio Balanced Scorecard Manufacturing Key Performance Indicators Data Analytics Management Engineering ORGANIZACION DE EMPRESAS
140	DataOps : Towards Understanding and Defining Data Analytics Approach Mainali, Kiran January 2020 (has links) Data collection and analysis approaches have changed drastically in the past few years. The reasons for adopting a different approach are improved data availability and continuously changing analysis requirements. Data have always been there, but data management is vital nowadays because data are generated rapidly and are available in various formats. Big data has opened the possibility of dealing with potentially infinite amounts of data in numerous formats in a short time. Data analytics is becoming complex due to data characteristics, sophisticated tools and technologies, changing business needs, varied interests among stakeholders, and the lack of a standardized process. DataOps is an emerging approach advocated by data practitioners to address the challenges in data analytics projects. Data analytics projects differ from software engineering in many aspects. DevOps has proven to be an efficient and practical approach to project delivery in the software industry. However, DataOps is still in its infancy and is only beginning to be recognized as an independent and essential task in data analytics. In this thesis, we examine DataOps as a methodology for implementing data pipelines by conducting a systematic search of research papers. As a result, we define DataOps and outline its ambiguities and challenges. We also explore how DataOps covers the different stages of the data lifecycle. We created comparison matrices of different tools and technologies, categorizing them into functional groups to demonstrate their usage in data lifecycle management. We followed DataOps implementation guidelines to implement a data pipeline using Apache Airflow as the workflow orchestrator inside Docker, and compared it with a simple manual execution of a data analytics project.
According to our evaluation, the data pipeline with DataOps provided automation in task execution, orchestration of the execution environment, testing and monitoring, and communication and collaboration, and it reduced the end-to-end product delivery cycle time as well as the pipeline execution time. DataOps Data lifecycle Data analytics DataOps pipeline Data pipeline DataOps tools and technologies Computer and Information Sciences
