551 |
An open health platform for the early detection of complex diseases: the case of breast cancer
MOHAMMADHASSAN MOHAMMADI, MAX January 2015
Complex diseases such as cancer, cardiovascular diseases and diabetes are often diagnosed too late, which significantly impairs treatment options and, in turn, drastically lowers patients' survival rates and significantly increases costs. Moreover, medical data are growing faster than the ability of healthcare systems to utilize them. Almost 80% of medical data are unstructured, yet clinically relevant. On the other hand, technological advancements have made it possible to create different digital health solutions where healthcare and ICT meet. Also, some individuals have already started to measure their body function parameters, track their health status, research their symptoms and even intervene in treatment options, which means a great deal of data is being produced and also indicates that patient-driven health care models are transforming how health care functions. These models include quantified self-tracking, consumer-personalized medicine and health social networks. This research aims to present an open innovation digital health platform which creates value by using the overlaps between healthcare, information technology and artificial intelligence. This platform could potentially be utilized for early detection of complex diseases by leveraging Big Data technology, which could improve awareness by recognizing pooled symptoms of a specific disease. This would enable individuals to effortlessly and quantitatively track and become aware of changes in their health, and, through a dialog with a doctor, achieve diagnosis at a significantly earlier stage. This thesis focuses on a case study of the platform for detecting breast cancer at a significantly earlier stage. A qualitative research method is implemented through reviewing the literature, determining the knowledge gap, evaluating the need, performing market research, developing a conceptual prototype and presenting the open innovation platform. Finally, the value creation, applications and challenges of such a platform are investigated, analysed and discussed based on the data collected from interviews and surveys. This study combines an explanatory and an analytical research approach, as it aims not only to describe the case, but also to explain the value creation for different stakeholders in the value chain. The findings indicate that there is an urgent need for early diagnosis of complex diseases (such as breast cancer) and also for handling the direct and indirect consequences of late diagnosis. A significant outcome of this research is the conceptual prototype, which was developed based on the general proposed concept through a customer development process. According to the conducted surveys, 95% of the cancer patients and 84% of the healthy individuals are willing to use the proposed platform. The results indicate that it can create significant value for patients, doctors, academic institutions, hospitals and even healthy individuals.
|
552 |
The democratisation of decision-makers in data-driven decision-making in a Big Data environment: The case of a financial services organisation in South Africa
Hassa, Ishmael January 2020
Big Data refers to large unstructured datasets from multiple dissimilar sources. Using Big Data Analytics (BDA), insights can be gained that cannot be obtained by other means, allowing better decision-making. Big Data is disruptive, and because it is vast and complex, it is difficult to manage from technological, regulatory, and social perspectives. Big Data can provide decision-makers (knowledge workers) with bottom-up access to information for decision-making, thus providing potential benefits due to the democratisation of decision-makers in data-driven decision-making (DDD). The workforce is enabled to make better decisions, thereby improving participation and productivity. Enterprises that enable DDD are more successful than firms that are solely dependent on management's perception and intuition. Understanding the links between the key concepts (Big Data, democratisation, and DDD) and decision-makers is important, because the use of Big Data is growing, the workforce is continually evolving, and effective decision-making based on Big Data insights is critical to a firm's competitiveness. This research investigates the influence of Big Data on the democratisation of decision-makers in data-driven decision-making. A Grounded Theory Method (GTM) was adopted due to the scarcity of literature on the interrelationships between the key concepts. An empirical study was undertaken, based on a case study of a large and leading financial services organisation in South Africa. The case study participants were diverse and represented three different departments. GTM facilitates the emergence of novel theory that is grounded in empirical data. Theoretical elaboration of new concepts with existing literature permits the comparison of the emergent or substantive theory for similarities, differences, and uniqueness. By applying the GTM principles of constant comparison, theoretical sampling and emergence, decision-makers (people, knowledge workers) became the focal point of study rather than organisational decision-making processes or decision support systems. The focus of the thesis is therefore on the democratisation of decision-makers in a Big Data environment. The findings suggest that the influence of Big Data on the democratisation of the decision-maker in relation to DDD is dependent on the completeness and quality of the Information Systems (IS) artefact. The IS artefact results from, and is comprised of, information that is extracted from Big Data through Big Data Analytics (BDA) and decision-making indicators (DMI). DMI are contributions of valuable decision-making parameters by actors that include Big Data, People, The Organisation, and Organisational Structures. DMI is an aspect of knowledge management as it contains both the story behind the decision and the knowledge that was used to decide. The IS artefact is intended to provide a better and more complete picture of the decision-making landscape, which adds to the confidence of decision-makers and promotes participation in DDD which, in turn, exemplifies democratisation of the decision-maker. Therefore, the main theoretical contribution is that the democratisation of the decision-maker in DDD is based on the completeness of the IS artefact, which is assessed within the democratisation inflection point (DIP). The DIP is the point at which the decision-maker evaluates the IS artefact.
When the IS artefact is complete, meaning that all the parameters pertinent to a decision for specific information are available, then democratisation of the decision-maker is realised. When the IS artefact is incomplete, meaning that not all the parameters pertinent to a decision for specific information are available, then democratisation of the decision-maker breaks down. The research contributes new knowledge in the form of a substantive theory, grounded in empirical findings, to the academic field of IS. The IS artefact constitutes a contribution to practice: it highlights the importance of interrelationships and contributions of DMI by actors within an organisation, based on information extracted through BDA, that promote decision-maker confidence and participation in DDD. DMI, within the IS artefact, are critical to decision-making, the lack of which has implications for the democratisation of the decision-maker in DDD. The study has uncovered the need to further investigate the extent of each actor's contribution (agency) to DMI, the implications of generational characteristics on the adoption and use of Big Data, and an in-depth understanding of the relationships between individual differences, Big Data and decision-making. Research is also recommended to better explain democratisation as it relates to data-driven decision-making processes.
|
553 |
Communicating big data in the healthcare industry
Castaño Martínez, María; Johnson, Elizabeth January 2020
In recent years, nearly every aspect of how we function as a society has transformed from analogue to digital. This has spurred extraordinary change and acted as a catalyst for technology innovation, as well as big data generation. Big data is characterized by its constantly growing volume, wide variety, high velocity, and powerful veracity. The emergence of the COVID-19 global pandemic has demonstrated the profound impact, and often dangerous consequences, of communicating health information derived from data. Healthcare companies have access to enormous data assets, yet communicating information from their data sources is complex, as they also operate in one of the most highly regulated business environments, where data privacy and legal requirements vary significantly from one country to another. The purpose of this study is to understand how global healthcare companies communicate information derived from data to their internal and external audiences. The research proposes a model for how marketing communications, public relations, and internal communications practitioners can address the challenges of utilizing data in communications in order to advance organizational priorities and achieve business goals. The conceptual framework is based on a closed-loop communication flow and includes an encoding process specialized for incorporating big data into communications. The findings reveal tactical communication strategies, as well as organizational and managerial practices, that can best position practitioners for communicating big data. The study concludes by proposing recommendations for future research, particularly from interdisciplinary scholars, to address the research gaps.
|
554 |
Three Essays in Factor Analysis of Asset Pricing
Wang, Wenzhi January 2018
Thesis advisor: Robert Taggart / My dissertation is comprised of three chapters. The first chapter is motivated by the many low-frequency sources of systemic risk in the economy. We propose a two-stage learning procedure to construct a high-frequency (i.e., daily) systemic risk factor from a cross-section of low-frequency (i.e., monthly) risk sources. In the first stage, we use a Kalman-filter approach to synthesize the information about systemic risk contained in 19 different proxies for systemic risk. The low-frequency (i.e., monthly) Bayesian factor can predict the cross-section of stock returns out of sample. In particular, a strategy that goes long the quintile portfolio with the highest exposure to the Bayesian factor and short the quintile portfolio with the lowest exposure to the Bayesian factor yields a Fama–French–Carhart alpha of 1.7% per month (20.4% annualized). The second stage converts this low-frequency Bayesian factor into a high-frequency factor. We use the textual analysis tool Word2Vec, which reads the headlines and abstracts of all daily articles from the business section of the New York Times from 1980 to 2016, to collect distributional information on a per-word basis and store it in high-dimensional vectors. These vectors are then used in a LASSO model to predict the Bayesian factor. The result is a series of coefficients that can then be used to produce a high-frequency estimate of the Bayesian factor of systemic risk. This high-frequency indicator is validated in several ways, including by showing how well it captures the 2008 crisis. We also find that the high-frequency factor is priced in the cross-section of stock returns and is able to predict large swings in the VIX using a quantile regression approach, which sheds some light on the puzzling relation between the macro-economy and stock market volatility. The second chapter of my dissertation provides a basic quantitative description of a compendium of macroeconomic variables based on their ability to predict bond returns and stock returns. We use three methods (asymptotic PCA, LASSO and Support Vector Machine) to construct factors out of 133 monthly time series of economic activity spanning the period from 1996:1 to 2015:12 and classify these factors into two groups: bond demand factors and bond supply factors. In the PCA regression, we find that both demand factors and supply factors are unspanned by bond yields and have stronger predictive power for future bond excess returns than CP factors. This predictability finding is confirmed and enhanced by the machine learning techniques LASSO and Support Vector Machine. More interestingly, LASSO can be used to identify the 15 most important economic variables and give direct economic explanations of the predictors of bond returns. Regarding stock predictability, we find that both demand and supply PC factors are priced by the cross-section of stock returns. In particular, portfolios with the highest exposure to the aggregate supply factor outperform portfolios with the lowest exposure to the aggregate supply factor by 1.8% per month, while portfolios with the lowest exposure to the aggregate demand factor outperform portfolios with the highest exposure to the aggregate demand factor by 2.1% per month. The finding is consistent with a "flight to safety" explanation. Furthermore, variance decomposition from a VAR shows that demand factors are much more important than supply factors in explaining asset returns.
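The second stage of the first chapter's construction lends itself to a compact illustration. The sketch below shows that stage in simplified form: a cross-validated LASSO maps monthly aggregates of text-derived features onto the low-frequency factor, and the fitted coefficients are then applied at daily frequency. It is a hedged sketch only; the variable names, the monthly aggregation by simple averaging, and the use of scikit-learn's LassoCV are assumptions for illustration, not the dissertation's actual pipeline, which builds its features from Word2Vec representations of New York Times articles.

```python
# Hedged sketch: map text-derived daily features to the monthly systemic-risk
# factor with a LASSO, then reuse the fitted coefficients at daily frequency.
# Inputs are assumed to be indexed by a pandas DatetimeIndex.
import pandas as pd
from sklearn.linear_model import LassoCV

def fit_high_frequency_factor(daily_text_features: pd.DataFrame,
                              monthly_factor: pd.Series) -> pd.Series:
    # Aggregate daily features to monthly frequency to align with the factor.
    monthly_features = daily_text_features.resample("M").mean()
    aligned = monthly_features.join(monthly_factor.rename("factor"), how="inner")

    X = aligned.drop(columns="factor").values
    y = aligned["factor"].values
    model = LassoCV(cv=5).fit(X, y)  # sparse coefficients chosen by cross-validation

    # Apply the monthly-fitted coefficients to the daily feature matrix.
    daily_estimate = model.predict(daily_text_features.values)
    return pd.Series(daily_estimate, index=daily_text_features.index,
                     name="daily_systemic_risk")
```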
Finally, we incorporate demand factors and supply factors into macro-finance affine term structure models (MTSMs) to estimate the market price of risk of the factors and find that demand factors affect level risk while supply factors affect slope risk. Moreover, MTSMs enable us to decompose bond yields into an expectation component and a yield risk premium component, and we find that MTSMs without macro factors under-estimate the yield risk premium. The third chapter, coauthored with Dmitriy Muravyev and Aurelio Vasquez, is motivated by the fact that a typical stock has hundreds of listed options. We use principal component analysis (PCA) to preserve their rich information content while reducing dimensionality. Applying PCA to implied volatility surfaces across all US stocks, we find that the first five components capture most of the variation. The aggregate PC factor that combines only the first three components predicts future stock returns up to six months ahead with a monthly alpha of about 1%; results are similar out-of-sample. In joint regressions, the aggregate PC factor drives out all of the popular option-based predictors of stock returns. Perhaps the aggregate factor better aggregates option price information. However, shorting costs in the underlying drive out the aggregate factor's predictive ability. This result is consistent with the hypothesis that option prices predict future stock returns primarily because they reflect short sale constraints. / Thesis (PhD) — Boston College, 2018. / Submitted to: Boston College. Carroll School of Management. / Discipline: Finance.
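Several of the return-predictability results above rely on the same portfolio-sort mechanic: estimate each stock's exposure to a factor, sort stocks into quintiles by that exposure each month, and track the return spread between the top and bottom quintiles. The following is a minimal sketch of that mechanic with hypothetical column names; it is not the dissertation's code and it omits the Fama–French–Carhart risk adjustment used to compute the reported alphas.

```python
# Minimal sketch of a monthly quintile long-short sort on factor exposure.
# Assumed (hypothetical) columns: 'month', 'beta' (estimated factor exposure),
# 'ret_next' (the stock's return over the following month).
import pandas as pd

def quintile_long_short(panel: pd.DataFrame) -> pd.Series:
    def one_month(group: pd.DataFrame) -> float:
        # Quintile labels 0..4 by factor exposure within the month.
        q = pd.qcut(group["beta"], 5, labels=False, duplicates="drop")
        high = group.loc[q == q.max(), "ret_next"].mean()  # highest-exposure quintile
        low = group.loc[q == q.min(), "ret_next"].mean()   # lowest-exposure quintile
        return high - low
    # One long-short return per month; its average is the raw, unadjusted spread.
    return panel.groupby("month").apply(one_month)
```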
|
555 |
Scalable sparse machine learning methods for big data
Zeng, Yaohui 15 December 2017
Sparse machine learning models have become increasingly popular in analyzing high-dimensional data. With the evolving era of Big Data, ultrahigh-dimensional, large-scale data sets are constantly collected in many areas such as genetics, genomics, biomedical imaging, social media analysis, and high-frequency finance. Mining valuable information efficiently from these massive data sets requires not only novel statistical models but also advanced computational techniques. This thesis focuses on the development of scalable sparse machine learning methods to facilitate Big Data analytics.
Built upon the feature screening technique, the first part of this thesis proposes a family of hybrid safe-strong rules (HSSR) that incorporate safe screening rules into the sequential strong rule to remove unnecessary computational burden when solving lasso-type models. We present two instances of HSSR, namely SSR-Dome and SSR-BEDPP, for the standard lasso problem. We further extend SSR-BEDPP to the elastic net and group lasso problems to demonstrate the generalizability of the hybrid screening idea. In the second part, we design and implement an R package called biglasso to extend lasso model fitting to Big Data in R. Our package biglasso utilizes memory-mapped files to store the massive data on disk, reading data into memory only when necessary during model fitting, and is thus able to handle data-larger-than-RAM cases seamlessly. Moreover, it is built upon our redesigned algorithm, which incorporates the proposed HSSR screening, making it much more memory- and computation-efficient than existing R packages. Extensive numerical experiments with synthetic and real data sets are conducted in both parts to show the effectiveness of the proposed methods.
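To make the screening idea concrete, the sketch below implements only the plain sequential strong rule of Tibshirani et al. (2012) for the lasso objective (1/2)*||y - Xb||^2 + lambda*||b||_1: when the path moves from lambda_prev to a smaller lambda_new, features whose absolute correlation with the current residual falls below 2*lambda_new - lambda_prev are provisionally discarded and must later be checked against the KKT conditions. The safe (dome and BEDPP) components of HSSR and the memory-mapped storage that let biglasso scale are not shown; this is a conceptual sketch, not the package's implementation.

```python
import numpy as np

def strong_rule_screen(X: np.ndarray, residual: np.ndarray,
                       lam_prev: float, lam_new: float) -> np.ndarray:
    """Sequential strong rule for the lasso (1/2)*||y - X b||^2 + lam*||b||_1.

    Given the residual y - X @ b_hat(lam_prev) from the previous path point,
    return a boolean mask of features to keep in the solver at lam_new.
    The rule is heuristic: discarded features must still satisfy the KKT
    condition |X[:, j].T @ r| <= lam_new after fitting, and any violators
    are added back. A safe rule (e.g. BEDPP) would add a guaranteed filter.
    """
    corr = np.abs(X.T @ residual)            # |x_j' r| for every feature j
    return corr >= 2.0 * lam_new - lam_prev  # keep features that fail the discard test
```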
In the third part, we consider a novel statistical model, namely the overlapping group logistic regression model, that allows for selecting important groups of features that are associated with binary outcomes in the setting where the features belong to overlapping groups. We conduct systematic simulations and real-data studies to show its advantages in the application of genetic pathway selection. We implement an R package called grpregOverlap that has HSSR screening built in for fitting overlapping group lasso models.
|
556 |
Open for whose benefit? Exploring assumptions, power relations and development paradigms framing the GIZ Open Resources Incubator (ORI) pilot for open voice data in Rwanda
Brumund, Daniel January 2020
Since February 2019, the Kigali-based start-up Digital Umuganda has been coordinating the crowdsourcing of the first openly available voice dataset for Rwanda's official language Kinyarwanda. This process originated from a pilot project of the Open Resources Incubator (ORI), an emergent service designed by GIZ staff and the author as consultant. ORI aims to facilitate the collective provision of open content, thereby affording previously inaccessible opportunities for local innovation and value creation. It promotes the community-based stewardship of open resources by (inter-)national actors who share responsibilities for their production, distribution and use. ORI's pilot project cooperates with Mozilla's team behind Common Voice, a platform to crowdsource open voice data, and has attracted Rwandan public and private actors' interest in voice technology to improve their products and services. Informed by research on ICTs, datafication and big data in development discourses, and using the ICT4D approach 'open development' as its analytical lens, this thesis examines inherent conceptual aspects and socio-technical dynamics of the ORI pilot project. An in-depth analysis of qualitative data gathered through participant observation, interviews and focus groups explores the assumed developmental benefits which international and Rwandan actors involved in the project associate with open voice data, the power relations manifesting between these actors, as well as the underlying development paradigms. The analysis shows how the project established a global-local datafication infrastructure sourcing voice data from Rwandan volunteers via technically, legally and socially formalised mechanisms. By placing the dataset in the public domain, the decision as to how it will be used is left to the discretion of intermediaries such as data scientists, IT developers and funders. This arrangement calls into question the basic assumption that the open Kinyarwanda dataset will yield social impact, because its open access is insufficient to direct its usage towards socially beneficial, rather than solely profit-oriented, purposes. In view of this, the thesis proposes the joint negotiation of a 'stewardship agreement' to define how value created from the open voice data will benefit its community and Rwanda at large.
|
557 |
Raw Data for Peace and Security - The Extraction and Mining of People's Behaviour
Deller, Yannick January 2020
In 2015, the United Nations Global Pulse launched an experimentation process assessing the viability of big data and artificial intelligence analysis to support peace and security. The proposition of using such analysis, and thereby creating early warning systems based on real-time monitoring, warrants a critical assessment. This thesis engages in an explanatory critique of the discursive (re-)definitions of peace and security as well as big data and artificial intelligence in the United Nations Global Pulse Lab Kampala report Experimenting with Big Data and Artificial Intelligence to Support Peace and Security. The paper follows a qualitative design and utilises critical discourse analysis as its methodology while using instrumentarian violence as a theoretical lens. The study argues that the use of big data and artificial intelligence analysis, in conjunction with data mining on social media and radio broadcasts for the purposes of early warning systems, creates and manifests social relations marked by asymmetric power and knowledge dynamics. The analysis suggests that the report’s discursive and social practices indicate a conceptualisation of peace and security rooted in the notion of social control through prediction. The study reflects on the consequences for social identities, social relations, and the social world itself and suggests potential areas for future research.
|
558 |
Diseño de un curso teórico y práctico sobre: Big Data
Laverde Salazar, María Fernanda January 2015
Magíster en Ingeniería de Redes de Comunicaciones / The rapid expansion of technology use creates a set of challenges for managing and analysing the large amounts of data generated at great speed, since organisations must deal with issues related to the data themselves, to software and hardware, and to the relationships between clients and service providers.
Big Data is a stage in the digital era rather than an isolated concept: to take proper advantage of it, it must be integrated with the data analysis methods that make it possible to extract value from the information collected. The ability to make decisions, and then carry out useful actions based on the results obtained through data analysis tools, is what constitutes the core of Big Data Analytics.
As Michael Minelli (co-author of the book Big Data, Big Analytics) puts it:
"Big Data is not just a process for storing enormous amounts of data in a data warehouse (...). It is the ability to make better decisions and take useful actions at the right moment."
The thesis work developed here corresponds to the design and implementation of a theoretical and practical course on Big Data. The course is aimed at undergraduate students of the Universidad de Chile and is based on a curricular design that follows a specific teaching methodology structured in planning and development blocks. The programme is divided into learning modules defined by topics and objectives, and lasts forty (40) hours in total, split between theory classes (20 hours divided into 10 lectures) and practical sessions (20 hours divided into 5 laboratories).
The general objective of the course is to integrate knowledge about how to store, manage and, through specific tools, take advantage of the substantial increase in the volume of data handled every day, and even every second, by technology and communication companies, for most of which we ourselves are, day after day, the main generators of data.
|
559 |
Self-Service Business Intelligence success factors that create value for business
Sinaj, Jonida January 2020
Business Intelligence and Analytics have changed business needs, but the market requires an even more data-driven decision-making environment. Self-Service Business Intelligence (SSBI) initiatives currently provide additional competitive advantages. The role of the users and their freedom of access are among the essential advantages that SSBI holds. Despite this, analysis is still needed on how businesses can gain more value from SSBI in terms of technological, operational and organizational aspects. The work in this thesis analyses the SSBI requirements that bring value to business. The paper is organized starting from building knowledge on the existing literature and exploring the domain. Data were collected by interviewing experts within the BI, SSBI and IT fields. The main findings of the study show that, on the technological side, data become better governed and their quality improves when SSBI is implemented. Visualization is one of the features of SSBI that boosts quality and governance. On the digital capability side, end-users need training, and the study identifies the degree to which SSBI affects the main departments in an organization. It is also discussed how SSBI implementation affects companies that do not have a BI solution. The final conclusions show that, in order for SSBI to be successful, a solid BI environment is necessary. This research will provide suggestions for future work related to the topic, and the results will serve both companies that have already implemented SSBI and those that are considering it for the future.
|
560 |
System Support for Large-scale Geospatial Data Analytics
January 2020
The volume of available spatial data has increased tremendously. Such data includes but is not limited to: weather maps, socioeconomic data, vegetation indices, geotagged social media, and more. These applications need a powerful data management platform to support scalable and interactive analytics on big spatial data. Even though existing single-node spatial database systems (DBMSs) provide support for spatial data, they suffer from performance issues when dealing with big spatial data. Challenges to building large-scale spatial data systems are as follows: (1) System Scalability: The massive scale of available spatial data hinders making sense of it using traditional spatial database management systems. Moreover, large-scale spatial data, besides its tremendous storage footprint, may be extremely difficult to manage and maintain due to the heterogeneous shapes, skewed data distribution and complex spatial relationships. (2) Fast analytics: When the user runs spatial data analytics applications using graphical analytics tools, she does not tolerate delays introduced by the underlying spatial database system. Instead, the user needs to see useful information quickly.
In this dissertation, I focus on designing efficient data systems and data indexing mechanisms to bolster scalable and interactive analytics on large-scale geospatial data. I first propose a cluster computing system, GeoSpark, which extends the core engine of Apache Spark and Spark SQL to support spatial data types, indexes, and geometrical operations at scale. In order to reduce the indexing overhead, I propose Hippo, a fast, yet scalable, sparse database indexing approach. In contrast to existing tree index structures, Hippo stores disk page ranges (each of which works as a pointer to one or many pages) instead of tuple pointers in the indexed table, to reduce the storage space occupied by the index. Moreover, I present Tabula, a middleware framework that sits between a SQL data system and a spatial visualization dashboard to make the user experience with the dashboard more seamless and interactive. Tabula adopts a materialized sampling cube approach, which pre-materializes samples, not for the entire table as in the SampleFirst approach, but for the results of potentially unforeseen queries (represented by an OLAP cube cell). / Dissertation/Thesis / Doctoral Dissertation Computer Science 2020
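The core idea behind Hippo, indexing ranges of disk pages rather than individual tuples, can be illustrated with a much-simplified, in-memory stand-in that summarizes each contiguous page range by the minimum and maximum of the indexed attribute and, at query time, returns only the ranges whose summary could overlap the predicate. This is an assumption-laden sketch of the concept (closer to a block-range index); Hippo itself maintains partial histograms per page range, and none of the names below come from the dissertation's code.

```python
# Simplified page-range index: one (min, max) summary per run of data pages.
from dataclasses import dataclass

@dataclass
class PageRangeSummary:
    first_page: int
    last_page: int
    lo: float  # minimum of the indexed attribute within the page range
    hi: float  # maximum of the indexed attribute within the page range

def build_index(pages, pages_per_range=16):
    """pages: list of data pages, each page a list of attribute values."""
    index = []
    for start in range(0, len(pages), pages_per_range):
        chunk = pages[start:start + pages_per_range]
        values = [v for page in chunk for v in page]
        index.append(PageRangeSummary(start, start + len(chunk) - 1,
                                      min(values), max(values)))
    return index

def candidate_ranges(index, low, high):
    # Only page ranges whose summary overlaps [low, high] need to be scanned;
    # everything else is skipped, which keeps the index itself very small.
    return [s for s in index if s.hi >= low and s.lo <= high]
```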
|