241

Driving Innovation through Big Open Linked Data (BOLD): Exploring Antecedents using Interpretive Structural Modelling

Dwivedi, Y.K., Janssen, M., Slade, E.L., Rana, Nripendra P., Weerakkody, Vishanth J.P., Millard, J., Hidders, J., Snijders, D. 07 2016 (has links)
Yes / Innovation is vital for finding new solutions to problems, increasing quality, and improving profitability. Big open linked data (BOLD) is a fledgling and rapidly evolving field that creates new opportunities for innovation. However, the existing literature has not yet considered the interrelationships between the antecedents of innovation through BOLD. This research contributes to knowledge building by using interpretive structural modelling to organise nineteen factors linked to innovation through BOLD, identified by experts in the field. The findings show that almost all the variables fall within the linkage cluster, having both high driving and high dependence powers and thus demonstrating the volatility of the process. Technical infrastructure, data quality, and external pressure were found to form the fundamental foundations for innovation through BOLD. Deriving a framework to encourage and manage innovation through BOLD offers important theoretical and practical contributions.
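The core of the method is mechanical enough to sketch: in interpretive structural modelling, a factor's driving power is a row sum and its dependence power a column sum of the final reachability matrix, and the MICMAC step buckets factors into quadrants. Below is a minimal Python illustration with an invented five-factor matrix; the paper's actual nineteen factors and matrix are not reproduced here.

```python
import numpy as np

# Hypothetical 5-factor final reachability matrix (1 = factor i drives factor j).
# The paper works with nineteen factors; five are used to keep the sketch short.
R = np.array([
    [1, 1, 1, 0, 1],
    [0, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],
])

driving = R.sum(axis=1)     # row sums: how many factors each one influences
dependence = R.sum(axis=0)  # column sums: how many factors influence each one
threshold = R.shape[0] / 2  # simple midpoint split for the MICMAC quadrants

for i, (drv, dep) in enumerate(zip(driving, dependence)):
    if drv > threshold and dep > threshold:
        cluster = "linkage"       # high driving AND high dependence: volatile
    elif drv > threshold:
        cluster = "independent"   # strong drivers, e.g. infrastructure, data quality
    elif dep > threshold:
        cluster = "dependent"
    else:
        cluster = "autonomous"
    print(f"factor {i}: driving={drv}, dependence={dep} -> {cluster}")
```

A mostly "linkage" outcome, as the paper reports, means that intervening on any one factor feeds back through the others, which is why the authors describe the process as volatile.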
242

Decision analysis in Turkey

Gonul, M.S., Soyer, E., Onkal, Dilek 05 1900 (has links)
No
243

Big Data Analytics and Business Failures in Data-Rich Environments: An Organizing Framework

Amankwah-Amoah, J., Adomako, Samuel December 2018 (has links)
Yes / In view of the burgeoning scholarship on big data and big data analytical capabilities, there remains limited research on how firms' differing access to big data and differing big data analytic capabilities can generate the diverse conditions that lead to business failure. To fill this gap in the existing literature, an integrated framework was developed around two approaches to big data as an asset (i.e. threshold resource and distinctive resource) and two types of competence in big data analytics (i.e. threshold competence and distinctive/core competence). The analysis provides insights into how merely ordinary big data analytic capability and the mere possession of big data are more likely to create conditions for business failure. The study extends existing streams of research by shedding light on the decisions and processes that facilitate or hamper firms' ability to harness big data to mitigate the causes of business failure. The analysis also identifies a number of fruitful avenues for research on data-driven approaches to business failure.
244

Influencing subjective well-being for business and sustainable development using big data and predictive regression analysis

Weerakkody, Vishanth J.P., Sivarajah, Uthayasankar, Mahroof, Kamran, Maruyama, Takao, Lu, Shan 21 August 2020 (has links)
Yes / Business leaders and policymakers within service economies are placing greater emphasis on well-being, given the role of workers in such settings. While people's well-being can lead to economic growth, overlooking it can have the opposite effect. Enhancing subjective well-being (SWB) is therefore pertinent for all organisations and for the sustainable development of an economy. While health conditions were previously deemed the most reliable predictors of SWB, the availability of data on people's personal lifestyles now offers organisations a new window into well-being. Using open data from the UK's national Annual Population Survey, which measures SWB, this research found that among the several independent variables used to predict varying levels of perceived well-being, long-term health conditions, marital status, and age played a key role. The proposed model provides the key indicators for organisations measuring SWB with big data.
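As a hedged illustration of this kind of predictive regression, the sketch below fits an ordinary least-squares model on synthetic survey-like records. The column names, binary codings, and coefficients are invented stand-ins, not the Annual Population Survey's actual variables or the paper's fitted model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic stand-in for survey records; UK SWB questions use 0-10 scales.
rng = np.random.default_rng(0)
n = 5_000
df = pd.DataFrame({
    "age": rng.integers(16, 90, n),
    "married": rng.integers(0, 2, n),              # hypothetical binary coding
    "long_term_condition": rng.integers(0, 2, n),  # hypothetical binary coding
})
# Toy outcome loosely echoing the finding that these three variables matter.
df["life_satisfaction"] = (
    7.0 + 0.5 * df["married"] - 1.2 * df["long_term_condition"]
    + 0.01 * df["age"] + rng.normal(0, 1.0, n)
).clip(0, 10)

X_train, X_test, y_train, y_test = train_test_split(
    df[["age", "married", "long_term_condition"]],
    df["life_satisfaction"], random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("R^2:", r2_score(y_test, model.predict(X_test)))
print(dict(zip(X_train.columns, model.coef_.round(3))))
```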
245

Ranking online consumer reviews

Saumya, S., Singh, J.P., Baabdullah, A.M., Rana, Nripendra P., Dwivedi, Y.K. 26 September 2020 (has links)
Yes / Product reviews of popular products are posted online in the hundreds of thousands. Handling such a large volume of continuously generated online content is a challenging task for buyers, sellers, and researchers. The purpose of this study is to rank this overwhelming number of reviews by their predicted helpfulness scores. Helpfulness is predicted from features extracted from the review text, the product description, and the customer question-answer data of a product, using a random-forest classifier and a gradient-boosting regressor. The system first classifies reviews as low or high quality with the random-forest classifier; helpfulness scores are then predicted with the gradient-boosting regressor for the high-quality reviews only. Scores are not calculated for low-quality reviews, which can never reach the top k and are simply appended to the end of the review list on the review-listing website. The proposed system provides fair review placement on review-listing pages and makes all high-quality reviews visible to customers at the top. Experimental results on data from two popular Indian e-commerce websites validate our claim: 3–4 newer high-quality reviews are placed in the top ten alongside 5–6 older ones, based on review helpfulness. Our findings indicate that including features from product description and customer question-answer data improves the prediction accuracy of the helpfulness score. / Ministry of Electronics and Information Technology (MeitY), Government of India for financial support during research work through “Visvesvaraya PhD Scheme for Electronics and IT”.
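The two-stage pipeline is straightforward to sketch. The Python example below, on synthetic features, classifies reviews with a random forest and scores only the predicted high-quality ones with a gradient-boosting regressor; the paper's feature engineering and train/test protocol are omitted, and all names and data here are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingRegressor

rng = np.random.default_rng(42)
n = 1_000
# Hypothetical features drawn from review text, product description and Q&A data.
X = rng.normal(size=(n, 6))
quality = (X[:, 0] + rng.normal(0, 0.5, n) > 0).astype(int)   # 1 = high quality
helpfulness = np.where(quality == 1, X[:, 1] + X[:, 2], 0.0)  # toy helpfulness signal

# Stage 1: classify reviews as low or high quality.
clf = RandomForestClassifier(random_state=0).fit(X, quality)
pred_quality = clf.predict(X)

# Stage 2: predict helpfulness scores for the high-quality reviews only.
high = pred_quality == 1
reg = GradientBoostingRegressor(random_state=0).fit(X[high], helpfulness[high])
scores = np.full(n, -np.inf)           # low-quality reviews sink to the bottom
scores[high] = reg.predict(X[high])

top_k = np.argsort(scores)[::-1][:10]  # indices of the ten reviews shown first
print(top_k)
```

Skipping the regressor for low-quality reviews is the design point: it saves computation on items that, by construction, will never surface in the top k.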
246

An integrated artificial intelligence framework for knowledge creation and B2B marketing rational decision making for improving firm performance

Bag, S., Gupta, S., Kumar, A., Sivarajah, Uthayasankar 23 December 2020 (has links)
Yes / This study examines the effect of big data powered artificial intelligence on customer knowledge creation, user knowledge creation, and external market knowledge creation, to better understand its impact on rational decision making in B2B marketing and, in turn, on firm performance. The theoretical model is grounded in Knowledge Management Theory (KMT), and primary data were collected from B2B companies operating in the South African mining industry. The findings show that the paths from big data powered artificial intelligence to customer knowledge creation, to user knowledge creation, and to external market knowledge creation are all significant. Customer knowledge creation, user knowledge creation, and external market knowledge creation in turn have a significant effect on B2B marketing rational decision making, which itself has a significant effect on firm performance.
247

Co-creating social licence for sharing health and care data

Fylan, F., Fylan, Beth 25 March 2021 (has links)
Yes / Optimising the use of patient data has the potential to produce a transformational change in healthcare planning, treatment, condition prevention and understanding disease progression. Establishing how people's trust could be secured and a social licence to share data could be achieved is of paramount importance. The study took place across Yorkshire and the Humber, in the North of England, using a sequential mixed methods approach comprising focus groups, surveys and co-design groups. Twelve focus groups explored people's response to how their health and social care data is, could, and should be used. A survey examined who should be able to see health and care records, acceptable uses of anonymous health and care records, and trust in different organisations. Case study cards addressed willingness for data to be used for different purposes. Co-creation workshops produced a set of guidelines for how data should be used. Focus group participants (n = 80) supported sharing health and care data for direct care and were surprised that this is not already happening. They discussed concerns about the currency and accuracy of their records and possible stigma associated with certain diagnoses, such as mental health conditions. They were less supportive of social care access to their records. They discussed three main concerns about their data being used for research or service planning: being identified; security limitations; and the potential rationing of care on the basis of information in their record such as their lifestyle choices. Survey respondents (n = 1031) agreed that their GP (98 %) and hospital doctors and nurses (93 %) should be able to see their health and care records. There was more limited support for pharmacists (37 %), care staff (36 %), social workers (24 %) and researchers (24 %). Respondents thought their health and social care records should be used to help plan services (88 %), to help people stay healthy (67 %), to help find cures for diseases (67 %), for research for the public good (58 %), but only 16 % for commercial research. Co-creation groups developed a set of principles for a social licence for data sharing based around good governance, effective processes, the type of organisation, and the ability to opt in and out. People support their data being shared for a range of purposes and co-designed a set of principles that would secure their trust and consent to data sharing. / This work was supported by Humber Teaching NHS Foundation Trust and the National Institute for Health Research (NIHR) Yorkshire and Humber Patient Safety Translational Research Centre (NIHR Yorkshire and Humber PSTRC).
248

Traitement et raisonnement distribués des flux RDF / Distributed RDF stream processing and reasoning

Ren, Xiangnan 19 November 2018 (has links)
Real-time processing of data streams emanating from sensors is becoming a common task in industrial scenarios. In an Internet of Things (IoT) context, data are emitted from heterogeneous stream sources, i.e., coming from different domains and data models. This requires that IoT applications efficiently handle data integration mechanisms. The processing of RDF data streams has hence become an important research field. This trend enables a wide range of innovative applications where the real-time and reasoning aspects are pervasive. The key implementation goal of such applications is to efficiently handle massive incoming data streams and to support advanced data analytics services such as anomaly detection. However, a modern RDF Stream Processing (RSP) engine has to address the volume and velocity characteristics encountered in the Big Data era. In an ongoing industrial project, we found that a stream processing engine available 24/7 usually faces massive data volumes, dynamically changing data structures and shifting workload characteristics, all of which impact the engine's performance and reliability. To address these issues, we propose Strider, a hybrid adaptive distributed RDF Stream Processing engine that optimizes the logical query plan according to the state of the data streams. Strider has been designed to guarantee important industrial properties such as scalability, high availability, fault tolerance, high throughput and acceptable latency. These guarantees are obtained by building the engine's architecture on state-of-the-art Big Data components: Apache Spark and Apache Kafka. Moreover, an increasing number of processing jobs executed over RSP engines require reasoning mechanisms, which usually come at the cost of a trade-off between data throughput, latency and the computational cost of expressive inferences. Therefore, we extend Strider to support real-time reasoning with RDFS+ expressivity (i.e., RDFS + owl:sameAs). We combine Strider with a query-rewriting approach for SPARQL that benefits from an intelligent encoding of the knowledge base. The system is evaluated along different dimensions and over multiple datasets to demonstrate its performance. Finally, we explore RDF stream reasoning over ontologies expressed in a fragment of Answer Set Programming (ASP). This part of our research is mainly motivated by the fact that more and more streaming applications require more expressive and complex reasoning tasks; the main challenge is to cope with the volume and velocity dimensions in a scalable, inference-enabled manner, and recent efforts in this area still miss the aspect of system scalability for stream reasoning. We therefore explore the ability of modern distributed computing frameworks to process highly expressive knowledge-inference queries over Big Data streams. To do so, we consider queries expressed in a positive fragment of LARS (a temporal logic framework based on Answer Set Programming) and propose solutions to process such queries based on the two main execution models adopted by major parallel and distributed execution frameworks: Bulk Synchronous Parallel (BSP) and Record-at-A-Time (RAT). We implement our solution, named BigSR, and conduct a series of evaluations. Our experiments show that BigSR achieves high throughput, beyond a million triples per second, using a rather small cluster of machines.
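Strider's code is not shown in the abstract, but its stack is standard, so the same pattern can be sketched: a Spark Structured Streaming job that consumes triples from Kafka and runs a windowed continuous query of the kind an RSP engine compiles SPARQL into. The topic name, triple encoding, and predicate IRI below are assumptions, and running the sketch requires the spark-sql-kafka connector package on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

# Not Strider itself: a minimal job on the same stack (Kafka in, Spark processing).
spark = SparkSession.builder.appName("rdf-stream-sketch").getOrCreate()

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "rdf-triples")   # hypothetical topic name
          .load())

# Assume each Kafka value is a whitespace-separated triple "subject predicate object".
triples = stream.selectExpr("CAST(value AS STRING) AS triple", "timestamp")
parts = triples.selectExpr("split(triple, ' ')[0] AS s",
                           "split(triple, ' ')[1] AS p",
                           "split(triple, ' ')[2] AS o",
                           "timestamp")

# Windowed continuous query: count observations per subject (e.g. per sensor)
# over 10-second tumbling windows, filtered on a hypothetical predicate IRI.
counts = (parts.filter(col("p") == "<hasValue>")
               .groupBy(window(col("timestamp"), "10 seconds"), col("s"))
               .count())

query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```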
249

Critérios de seleção de sistemas de gerenciamento de banco de dados não relacionais em organizações privadas / Selection criteria for non-relational database management systems in private organizations

Souza, Alexandre Morais de 31 October 2013 (has links)
Non-relational Database Management Systems (NoSQL DBMSs) are software packages for managing data with a non-relational model. Given the current growth in data generation and organizations' need to collect large amounts of customer, scientific-research, sales and other information for later analysis, it is important to rethink how a suitable DBMS is chosen, taking into account the organization's economic, technical and strategic factors. This research studies this new database management model, known as NoSQL, and contributes selection criteria to help consumers of database services in private organizations select a NoSQL DBMS. To this end, a literature review was conducted on software and DBMS selection processes, identifying the criteria used for this purpose. The research method was then defined as a Delphi panel in the ranking-form mode. Over two rounds with a mixed group of experts (managers, DBMS vendors, academics, developers, DBAs and DAs), the panel determined the most relevant criteria for choosing a NoSQL DBMS, ordered by the score obtained for each criterion. Data were collected through a questionnaire. From the identified criteria, the main selection criteria for NoSQL DBMSs were analysed, and the conclusions discuss the results obtained with the Delphi panel. As its main result, this study offers a realistic view of the non-relational model for data management and presents the most important criteria that make the adoption of NoSQL DBMSs plausible.
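The ranking-form aggregation at the heart of such a Delphi panel reduces to scoring criteria per expert and ordering by total score. A toy sketch, with invented criteria and scores standing in for the thesis's questionnaire data:

```python
import pandas as pd

# Hypothetical round-two scores (1 = not important .. 5 = essential) from five
# experts for a handful of candidate criteria; the thesis elicits many more.
scores = pd.DataFrame(
    {
        "scalability":       [5, 4, 5, 5, 4],
        "consistency_model": [4, 4, 3, 5, 4],
        "licensing_cost":    [3, 4, 4, 3, 3],
        "vendor_support":    [2, 3, 3, 2, 4],
    },
    index=[f"expert_{i}" for i in range(1, 6)],
)

# Rank criteria by total (equivalently mean) score, as a ranking-form Delphi does;
# between rounds, experts would see this ranking and revise their scores.
ranking = scores.sum().sort_values(ascending=False)
print(ranking)
```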
250

Data Warehouses na era do Big Data: processamento eficiente de Junções Estrela no Hadoop / Data Warehouses in the Big Data era: efficient processing of Star Joins in Hadoop

Brito, Jaqueline Joice 12 December 2017 (has links)
The era of Big Data is here: the combination of unprecedented amounts of data collected every day with the rise of open-source solutions for massively parallel processing has shifted the industry in the direction of data-driven solutions. From recommendation systems that help you find your next significant other to the dawn of self-driving cars, Cloud Computing has enabled companies of all sizes and areas to achieve their full potential with minimal overhead. In particular, the use of these technologies for Data Warehousing applications has greatly decreased costs and provided remarkable scalability, empowering business-oriented applications such as Online Analytical Processing (OLAP). One of the most essential primitives in Data Warehouses is the Star Join, i.e. the join of a central fact table with its satellite dimension tables. As the volume of the database grows, Star Joins become impractical and may seriously limit applications. In this thesis, we propose specialized solutions to optimize the processing of Star Joins. To achieve this, we used the Hadoop software family on a cluster of 21 nodes. We show that the primary bottleneck in the computation of Star Joins on Hadoop lies in excessive disk spill and the overhead of network communication. To mitigate these negative effects, we propose two solutions that combine the Spark framework with either Bloom filters or the Broadcast technique, reducing computation time by at least 38%. Furthermore, we show that full scans may significantly hinder the performance of queries with low selectivity. We therefore propose a distributed Bitmap Join Index that can be processed as a loosely bound secondary index and used with random access in the Hadoop Distributed File System (HDFS). We also implemented three versions (one in MapReduce and two in Spark) of our processing algorithm based on the distributed index, which reduced total computation time by up to 88% for low-selectivity Star Joins from the Star Schema Benchmark (SSB). Because, ideally, the system should be able to perform both random access and full scans, our solution relies on a framework-agnostic two-layer architecture that enables a query optimizer to select which approach should be used as a function of the query. Given the ubiquity of joins as primitive queries, our solutions are likely to fit a broad range of applications. Our contributions not only leverage the strengths of massively parallel frameworks but also exploit more efficient access methods to provide scalable and robust solutions to Star Joins, with a significant drop in total computation time.
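Of the two Spark-based techniques, the broadcast approach is easy to sketch: shipping the small dimension tables to every executor lets the large fact table be joined without a shuffle, which is exactly where the disk spill and network overhead arise. The example below uses toy data with loosely SSB-flavoured table and column names, not the thesis's actual benchmark setup.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast, col

spark = SparkSession.builder.appName("star-join-sketch").getOrCreate()

# Toy fact and dimension tables; real SSB tables are far larger and wider.
fact = spark.createDataFrame(
    [(1, 10, 100, 5.0), (2, 20, 100, 7.5), (3, 10, 200, 1.25)],
    ["orderkey", "custkey", "datekey", "revenue"])
customer = spark.createDataFrame(
    [(10, "BRAZIL"), (20, "INDIA")], ["custkey", "nation"])
date_dim = spark.createDataFrame(
    [(100, 1997), (200, 1998)], ["datekey", "year"])

# Broadcasting the small dimensions avoids shuffling the large fact table.
result = (fact
          .join(broadcast(customer), "custkey")
          .join(broadcast(date_dim), "datekey")
          .filter(col("year") == 1997)
          .groupBy("nation")
          .sum("revenue"))

result.show()
```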
