  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Measuring metadata quality

Király, Péter 24 June 2019 (has links)
No description available.
32

Is operational research in UK universities fit-for-purpose for the growing field of analytics?

Mortenson, Michael J. January 2018 (has links)
Over the last decade considerable interest has been generated in the use of analytical methods in organisations. Along with this, many have reported a significant gap between organisational demand for analytically trained staff and the number of potential recruits qualified for such roles. This interest is of high relevance to the operational research discipline, both in terms of raising the profile of the field and in the teaching and training of graduates to fill these roles. However, what is less clear is the extent to which operational research teaching in universities, or indeed teaching on the various courses labelled as 'analytics', offers a curriculum that can prepare graduates for these roles. It is within this space that this research is positioned, specifically seeking to analyse the suitability of current provision, limited to master's education in UK universities, and to make recommendations on how curricula may be developed. To do so, a mixed-methods research design, in the pragmatic tradition, is presented. This includes a variety of research instruments. Firstly, a computational literature review of analytics is presented, assessing (amongst other things) the amount of research into analytics from a range of disciplines. Secondly, a historical analysis is performed of the literature on elements that can be seen as precursors of analytics, such as management information systems, decision support systems and business intelligence. Thirdly, an analysis of job adverts is included, utilising an online topic model and correlation analyses. Fourthly, online materials from UK universities concerning relevant degrees are analysed using a bagged support vector classifier and a bespoke module analysis algorithm. Finally, interviews with both potential employers of graduates and academics involved in analytics courses are presented. The results of these separate analyses are synthesised and contrasted.
The outcome of this is an assessment of the current state of the market, some reflections on the role operational research may have, and a framework for the development of analytics curricula. The principal contribution of this work is practical: providing tangible recommendations on curricula design and development, as well as guidance to the operational research community in general on how it may react to the growth of analytics. Additional contributions are made in respect of methodology, with a novel mixed-methods approach employed, and of theory, with insights into how trends develop in both the jobs market and in academia. It is hoped that the insights here may be of value to course designers seeking to react to similar trends in a wide range of disciplines and fields.
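The fourth instrument above, a bagged support vector classifier, can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data; the dataset and parameters are assumptions for demonstration, not the thesis's actual model of university course materials:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for feature vectors derived from course pages.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging trains several linear SVMs on bootstrap resamples and
# aggregates their votes, reducing the variance of a single SVM.
clf = BaggingClassifier(LinearSVC(), n_estimators=10, random_state=0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

In the thesis the inputs would be text features extracted from degree-programme web pages rather than synthetic vectors.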
33

A scalable data store and analytic platform for real-time monitoring of data-intensive scientific infrastructure

Suthakar, Uthayanath January 2017 (has links)
Monitoring data-intensive scientific infrastructures in real time, covering jobs, data transfers, and hardware failures, is vital for efficient operation. Due to the high volume and velocity of the events produced, traditional methods are no longer optimal. Several techniques, as well as enabling architectures, are available to address the Big Data problem. In this respect, this thesis complements existing survey work by contributing an extensive literature review of both traditional and emerging Big Data architectures. Scalability, low latency, fault tolerance, and intelligence are key challenges for the traditional architecture, whereas Big Data technologies and approaches have become increasingly popular for use cases that demand scalable, data-intensive (parallel) processing, fault tolerance (data replication), and support for low-latency computations. In the context of a scalable data store and analytics platform for monitoring data-intensive scientific infrastructure, the Lambda Architecture was adapted and evaluated on the Worldwide LHC Computing Grid, where it proved effective, especially for computationally and data-intensive use cases. In this thesis, an efficient strategy for the collection and storage of large volumes of data for computation is presented. Moving the transformation logic out of the data pipeline and into the analytics layers simplifies the architecture and overall process: processing time is reduced, untampered raw data are kept at the storage level for fault tolerance, and the required transformation can be performed when needed. An optimised Lambda Architecture (OLA) is presented, which models an efficient way of joining the batch layer and streaming layer with minimal code duplication in order to support scalability, low latency, and fault tolerance. Several models were evaluated: a pure streaming layer, a pure batch layer, and the combination of both.
Experimental results demonstrate that the OLA performed better than both the traditional architecture and the standard Lambda Architecture. The OLA was further enhanced by adding an intelligence layer for predicting data access patterns. The intelligence layer actively adapts and updates the model built by the batch layer, which eliminates re-training time while providing a high level of accuracy using Deep Learning techniques. The fundamental contribution to knowledge is a scalable, low-latency, fault-tolerant, intelligent, and heterogeneous architecture for monitoring a data-intensive scientific infrastructure that can benefit from Big Data technologies and approaches.
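The batch/streaming join that the Lambda Architecture (and the OLA it optimises) is built around can be illustrated with a toy serving-layer query. All names and numbers below are hypothetical, not from the thesis:

```python
# A serving-layer query merges a precomputed batch view with a
# real-time speed (streaming) view, so results stay complete
# despite batch-processing latency.
batch_view = {"site_a": 1000, "site_b": 400}  # aggregated up to the last batch run
speed_view = {"site_a": 12, "site_c": 3}      # events seen since the last batch run

def query(key: str) -> int:
    """Combine both views into one complete answer."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)
```

A pure batch layer would miss the 12 recent events for `site_a`, and a pure streaming layer would miss its history; maintaining both views with minimal code duplication is exactly the trade-off the abstract describes.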
34

Data-driven modelling for demand response from large consumer energy assets

Krishnadas, Gautham January 2018 (has links)
Demand response (DR) is one of the integral mechanisms of today's smart grids. It enables consumer energy assets such as flexible loads, standby generators and storage systems to add value to the grid by providing cost-effective flexibility. With increasing renewable generation and impending electric vehicle deployment, there is a critical need for large volumes of reliable and responsive flexibility through DR. This poses a new challenge for the electricity sector. Smart grid development has resulted in the availability of large amounts of data from different physical segments of the grid such as generation, transmission, distribution and consumption. For instance, smart meter data carrying valuable information is increasingly available from consumers. In parallel, the domain of data analytics and machine learning (ML) is making immense progress. Data-driven modelling based on ML algorithms offers new opportunities to utilise smart grid data and address the DR challenge. The thesis demonstrates the use of data-driven models for enhancing DR from large consumers such as commercial and industrial (C&I) buildings. A reliable, computationally efficient, cost-effective and deployable data-driven model is developed for large consumer building load estimation. The selection of data pre-processing and model development methods is guided by these design criteria. Based on this model, DR operational tasks such as capacity scheduling, performance evaluation and reliable operation are demonstrated for consumer energy assets such as flexible loads, standby generators and storage systems. Case studies are designed based on the frameworks of ongoing DR programs in different electricity markets. In these contexts, data-driven modelling shows substantial improvement over the conventional models and promises more automation in DR operations.
The thesis also conceptualises an emissions-based DR program based on emissions intensity data and consumer load flexibility to demonstrate the use of smart grid data in encouraging renewable energy consumption. Going forward, the thesis advocates data-informed thinking for utilising smart grid data towards solving problems faced by the electricity sector.
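As a rough illustration of the kind of data-driven building-load model described above, the sketch below fits a regression of load on temperature and time-of-day features. The data is synthetic and the feature set is an assumption, far simpler than the deployable model the thesis develops:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic half-hourly observations: ambient temperature and hour of day.
temp = rng.uniform(5, 30, 500)
hour = rng.integers(0, 24, 500)
# Assumed load shape: a temperature-dependent base plus a daily cycle.
load = 200 + 4.0 * temp + 10 * np.sin(2 * np.pi * hour / 24) + rng.normal(0, 2, 500)

# Encode the daily cycle with sine/cosine terms so a linear model can fit it.
X = np.column_stack([temp,
                     np.sin(2 * np.pi * hour / 24),
                     np.cos(2 * np.pi * hour / 24)])
model = LinearRegression().fit(X, load)
r2 = model.score(X, load)
```

A baseline-load estimate of this kind is what DR tasks such as capacity scheduling and performance evaluation are measured against.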
35

Analyzing collaboration with large-scale scholarly data

Zuo, Zhiya 01 August 2019 (has links)
We have never stopped in the pursuit of science. Standing on the shoulders of giants, we gradually make our path to build a systematic and testable body of knowledge to explain and predict the universe. Emerging from researchers' interactions and self-organizing behaviors, scientific communities feature intensive collaborative practice. Indeed, the era of the lone genius has long gone. Teams now dominate the production and diffusion of scientific ideas. In order to understand how collaboration shapes and evolves organizations as well as individuals' careers, this dissertation conducts analyses at both macroscopic and microscopic levels utilizing large-scale scholarly data. As self-organizing behaviors, collaborations boil down to the interactions among researchers. Understanding collaboration at the individual level is therefore a preliminary and crucial step toward understanding the collective outcome at the group and organization levels. To start, I investigate the role of research collaboration in researchers' careers by leveraging person-organization fit theory. Specifically, I propose prospective social ties based on faculty candidates' future collaboration potential with future colleagues, which manifest diminishing returns on placement quality. Moving forward, I address the question of how individual success can be better understood and accurately predicted utilizing collaboration experience data. Findings reveal potential regularities in career trajectories for early-stage, mid-career, and senior researchers, highlighting the importance of various aspects of social capital. With large-scale scholarly data, I propose a data-driven analytics approach that leads to a deeper understanding of collaboration for both organizations and individuals.
Managerial and policy implications are discussed for organizations to stimulate interdisciplinary research and for individuals to achieve better placement as well as short- and long-term scientific impact. Additionally, while analyzed in the context of academia, the proposed methods and implications can be generalized to knowledge-intensive industries, where collaboration is a key factor in performance outcomes such as innovation and creativity.
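The premise that collaborations boil down to pairwise interactions among researchers can be sketched as a co-authorship edge list built from paper author lists. The papers and author names below are made up for illustration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical corpus: each paper is a list of author identifiers.
papers = [
    ["zuo", "srinivasan"],
    ["zuo", "lee", "srinivasan"],
    ["lee", "kim"],
]

# Every pair of co-authors on a paper contributes one collaboration tie;
# repeated pairs accumulate edge weight.
edges = Counter()
for authors in papers:
    for a, b in combinations(sorted(authors), 2):
        edges[(a, b)] += 1

# Degree = number of distinct collaborators, a basic social-capital measure.
degree = Counter()
for (a, b), _w in edges.items():
    degree[a] += 1
    degree[b] += 1
```

At scale, aggregating such ties over millions of records is what allows the macroscopic (community) and microscopic (career) analyses the abstract describes.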
36

Tools and theory to improve data analysis

Grolemund, Garrett 24 July 2013 (has links)
This thesis proposes a scientific model to explain the data analysis process. I argue that data analysis is primarily a procedure to build understanding and as such, it dovetails with the cognitive processes of the human mind. Data analysis tasks closely resemble the cognitive process known as sensemaking. I demonstrate how data analysis is a sensemaking task adapted to use quantitative data. This identification highlights a universal structure within data analysis activities and provides a foundation for a theory of data analysis. The model identifies two competing challenges within data analysis: the need to make sense of information that we cannot know and the need to make sense of information that we cannot attend to. Classical statistics provides solutions to the first challenge, but has little to say about the second. However, managing attention is the primary obstacle when analyzing big data. I introduce three tools for managing attention during data analysis. Each tool is built upon a different method for managing attention. ggsubplot creates embedded plots, which transform data into a format that can be easily processed by the human mind. lubridate helps users automate sensemaking outside of the mind by improving the way computers handle date-time data. Visual Inference Tools develop expertise in young statisticians that can later be used to efficiently direct attention. The insights of this thesis are especially helpful for consultants, applied statisticians, and teachers of data analysis.
37

Smart Urban Metabolism : Toward a New Understanding of Causalities in Cities

Shahrokni, Hossein January 2015 (has links)
For half a century, urban metabolism has been used to provide insights to support transitions to sustainable urban development (SUD). Information and Communication Technology (ICT) has recently been recognized as a potential enabler of this transition. This thesis explored the potential for an ICT-enabled urban metabolism framework aimed at improving resource efficiency in urban areas by supporting decision-making processes. Three research objectives were identified: i) investigation of how the urban metabolism framework, aided by ICT, could be utilized to support decision-making processes; ii) development of an ICT platform that manages real-time, high spatial and temporal resolution urban metabolism data and evaluation of its implementation; and iii) identification of the potential for efficiency improvements through the use of the resulting high spatial and temporal resolution urban metabolism data. The work to achieve these objectives was based on literature reviews, single-case study research in Stockholm, software engineering research, and big data analytics of the resulting data. The evolved framework, Smart Urban Metabolism (SUM), enabled by the emerging context of smart cities, operates at higher temporal (up to real-time) and spatial (up to household/individual) data resolution. A key finding was that the new framework overcomes some of the barriers identified for the conventional urban metabolism framework. The results confirm that there are hidden urban patterns that may be uncovered by analyzing structured big urban data. Some of those patterns may lead to the identification of appropriate intervention measures for SUD.
38

k-Nearest Neighbour Classification of Datasets with a Family of Distances

Hatko, Stan January 2015 (has links)
The k-nearest neighbour (k-NN) classifier is one of the oldest and most important supervised learning algorithms for classifying datasets. Traditionally the Euclidean norm is used as the distance for the k-NN classifier. In this thesis we investigate the use of alternative distances for the k-NN classifier. We start by introducing some background notions in statistical machine learning. We define the k-NN classifier and discuss Stone's theorem and the proof that k-NN is universally consistent on the normed space R^d. We then prove that k-NN is universally consistent if we take a sequence of random norms (that are independent of the sample and the query) from a family of norms that satisfies a particular boundedness condition. We extend this result by replacing norms with distances based on uniformly locally Lipschitz functions that satisfy certain conditions. We discuss the limitations of Stone's lemma and Stone's theorem, particularly with respect to quasinorms and adaptively choosing a distance for k-NN based on the labelled sample. We show the universal consistency of a two stage k-NN type classifier where we select the distance adaptively based on a split labelled sample and the query. We conclude by giving some examples of improvements of the accuracy of classifying various datasets using the above techniques.
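A minimal sketch of the core idea, k-NN classification under a distance other than the Euclidean norm, is shown below using scikit-learn's support for callable metrics. The weighted L1 distance and the synthetic data are illustrative assumptions, not the family of distances studied in the thesis:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # a simple linear boundary

w = np.array([1.0, 2.0])  # assumed per-feature weights

def weighted_l1(a, b):
    """One member of a 'family of distances': a weighted L1 norm."""
    return float(np.sum(w * np.abs(a - b)))

# Any metric satisfying the relevant conditions can be plugged in here;
# the thesis asks when such choices preserve universal consistency.
clf = KNeighborsClassifier(n_neighbors=5, metric=weighted_l1)
clf.fit(X[:150], y[:150])
acc = clf.score(X[150:], y[150:])
```

Choosing the distance adaptively from the labelled sample, as in the two-stage classifier the abstract describes, would correspond to selecting `w` from a held-out split rather than fixing it in advance.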
39

Enabling statistical analysis of the main ionospheric trough with computer vision

Starr, Gregory Walter Sidor 25 September 2021 (has links)
The main ionospheric trough (MIT) is a key density feature in the mid-latitude ionosphere, and characterizing its structure is important for understanding GPS radio signal scintillation and HF wave propagation. While a number of previous studies have statistically investigated the properties of the trough, they have only examined its latitudinal cross sections and have not considered its instantaneous two-dimensional structure. In this work, we developed an automatic optimization-based method for identifying the trough in Total Electron Content (TEC) maps and quantified its agreement with the algorithm developed by Aa et al. (2020). Using the newly developed method, we created a labeled dataset and statistically examined the two-dimensional structure of the trough. Specifically, we investigated how Kp affects the trough's occurrence probability at different local times. At low Kp, the trough tends to form in the postmidnight sector; with increasing Kp, the trough occurrence probability increases and shifts premidnight. We explore the possibility that this is due to increased occurrence of troughs formed by subauroral polarization streams (SAPS). Additionally, using SuperDARN convection maps and solar wind data, we characterized the MIT's dependence on the interplanetary magnetic field (IMF) clock angle.
40

Reliability of Commercially Relevant Photovoltaic Cell and Packaging Combinations in Accelerated and Outdoor Environments

Curran, Alan J. 30 August 2021 (has links)
No description available.
