1. Learning from Task Heterogeneity in Social Media (January 2019)
Abstract: In recent years, the rise in social media usage, both vertically in terms of the number of users per platform and horizontally in terms of the number of platforms per user, has led to a data explosion.
User-generated social media content provides an excellent opportunity to mine data of interest and to build resourceful applications. The rise in the number of healthcare-related social media platforms and the volume of healthcare knowledge available online in the last decade has resulted in increased social media usage for personal healthcare. In the United States, nearly ninety percent of adults in the 50-75 age group have used social media to seek and share health information. Motivated by this growth in social media usage, this thesis focuses on healthcare-related applications, studies various challenges posed by social media data, and addresses them through novel and effective machine learning algorithms.
The major challenges in effectively and efficiently mining social media data to build functional applications include: (1) Data reliability and acceptance: most social media data (especially in the context of healthcare-related social media) is not regulated, and the benefits of healthcare-specific social media have received little study; (2) Data heterogeneity: social media data is generated by users with both demographic and geographic diversity; (3) Model transparency and trustworthiness: most existing machine learning models for addressing heterogeneity are black-box models, and few provide explanations for their predictions that would let users trust them.
In response to these challenges, three main research directions have been investigated in this thesis: (1) Analyzing social media influence on healthcare: studying the real-world impact of social media as a source for offering or seeking support for patients with chronic health conditions; (2) Learning from task heterogeneity: proposing models and algorithms that are adaptable to new social media platforms and robust to dynamic social media data, specifically for modeling user behaviors, identifying similar actors across platforms, and adapting black-box models to a specific learning scenario; (3) Explaining heterogeneous models: interpreting predictive models in the presence of task heterogeneity. In this thesis, novel algorithms with theoretical analysis from various aspects (e.g., time complexity, convergence properties) have been proposed. The effectiveness and efficiency of the proposed algorithms are demonstrated through comparison with state-of-the-art methods and relevant case studies. / Doctoral Dissertation, Computer Science, 2019
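As a toy illustration of one subtask named in direction (2), identifying similar actors across platforms, here is a minimal sketch assuming each user is summarized by a behavior feature vector and matched across platforms by cosine similarity; the features and data are invented for illustration and do not reproduce the thesis's actual algorithms.

```python
import numpy as np

# Hypothetical behavior features (e.g., posting rate, reply rate, ...):
# 4 users on platform A and 5 users on platform B, 6 features each.
rng = np.random.default_rng(1)
platform_a = rng.random((4, 6))
platform_b = rng.random((5, 6))

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two behavior vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Match each platform-A user to their most similar platform-B user.
for i, ua in enumerate(platform_a):
    scores = [cosine(ua, ub) for ub in platform_b]
    j = int(np.argmax(scores))
    print(f"user A{i} best matches user B{j} (similarity {scores[j]:.2f})")
```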
2. Efficient Decentralized Learning Methods for Deep Neural Networks. Sai Aparna Aketi (18258529), 26 March 2024
Decentralized learning is the key to training deep neural networks (DNNs) over large distributed datasets generated at different devices and locations, without the need for a central server. It enables next-generation applications that require DNNs to interact with and learn from their environment continuously. The practical implementation of decentralized algorithms brings its own set of challenges. In particular, these algorithms should be (a) compatible with time-varying graph structures, (b) compute- and communication-efficient, and (c) resilient to heterogeneous data distributions. The objective of this thesis is to enable efficient decentralized learning of deep neural networks while addressing these challenges. Towards this, first, a communication-efficient decentralized algorithm (Sparse-Push) that supports directed and time-varying graphs with error-compensated communication compression is proposed. Second, a low-precision decentralized training method that aims to reduce memory requirements and computational complexity is proposed; here, "Range-EvoNorm" is designed as a normalization activation layer better suited to low-precision decentralized training. Finally, addressing the problem of data heterogeneity, three advancements are proposed: Neighborhood Gradient Mean (NGM), Global Update Tracking (GUT), and Cross-feature Contrastive Loss (CCL). NGM utilizes extra communication rounds to obtain cross-agent gradient information, whereas GUT tracks global update information with no communication overhead, improving performance on heterogeneous data. CCL explores an orthogonal direction, using a data-free knowledge distillation approach to handle heterogeneous data in decentralized setups. All the algorithms are evaluated on computer vision tasks using standard image-classification datasets. The dissertation concludes with a summary of the proposed decentralized methods and their trade-offs for heterogeneous data distributions. Overall, the methods proposed in this thesis address critical limitations of training deep neural networks in a decentralized setup and advance the state of the art in this domain.
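To make the setting concrete, the following is a minimal sketch of the base loop that methods like those above build on: gossip-style decentralized SGD over a fixed ring of agents with toy scalar objectives. The topology, objectives, and step size are illustrative assumptions rather than the thesis's experimental setup; Sparse-Push, NGM, GUT, and CCL layer compression, cross-agent gradient sharing, update tracking, and distillation on top of a loop of this shape.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, steps, lr = 5, 200, 0.1

# Heterogeneous (non-IID) local objectives: agent i minimizes (x - t_i)^2.
targets = rng.normal(size=n_agents)
x = np.zeros(n_agents)            # one scalar parameter per agent

# Doubly stochastic mixing matrix for a ring: each agent averages its own
# parameter with its two neighbors' parameters.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

for _ in range(steps):
    grads = 2.0 * (x - targets)   # local gradient at each agent
    x = W @ x - lr * grads        # gossip with neighbors, then a local step

print("agent parameters:", np.round(x, 3))       # near-consensus values
print("global optimum:  ", round(targets.mean(), 3))
```

Despite never sharing raw data, the agents converge near the minimizer of the sum of their local objectives, which is the core promise of decentralized learning.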
3. A unified framework for real-time streaming and processing of IoT data. Zamam, Mohamad, January 2017
The emergence of the Internet of Things (IoT) is introducing a new era to the realm of computing and technology. The proliferation of sensors and actuators embedded in things enables these devices to understand their environments and respond accordingly more than ever before. It also opens the space to unlimited possibilities for building applications that turn this sensing capability into big benefits across various domains: from smart cities to smart transportation and smart environments, the list is quite long. However, this revolutionary spread of IoT devices and technologies raises big challenges. One major challenge is the diversity of IoT vendors, which results in data heterogeneity. This research tackles that problem by developing a data management tool that normalizes IoT data. Another important challenge is the lack of practical, low-cost, low-maintenance IoT technology, which has often limited large-scale deployments and mainstream adoption. This work utilizes open-source data analytics in one unified IoT framework in order to address this challenge. What is more, billions of connected things are generating unprecedented amounts of data from which intelligence must be derived in real time, so the unified framework processes real-time streams of IoT data. A questionnaire involving participants with background knowledge in IoT was conducted in order to collect feedback about the proposed framework. The aspects of the framework were presented to the participants in a demonstration video describing the work that had been done. Finally, using the participants' feedback, the contribution of the developed framework to the IoT was discussed and presented.
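As a sketch of what vendor-level normalization could look like, the following maps two hypothetical vendor payloads onto one unified record schema; every field name, unit convention, and vendor format here is an assumption made for illustration, not the framework's actual data model.

```python
import json
from datetime import datetime, timezone

# Two hypothetical vendor payloads reporting the same kind of reading.
VENDOR_A = '{"sensorId": "A-17", "temp_c": 21.4, "ts": 1500000000}'
VENDOR_B = ('{"device": {"id": "B-03"}, '
            '"temperature": {"value": 70.5, "unit": "F"}, '
            '"time": "2017-07-14T02:40:00+00:00"}')

def normalize(raw: str, vendor: str) -> dict:
    """Map a vendor-specific payload onto one unified record schema."""
    msg = json.loads(raw)
    if vendor == "A":
        return {
            "device_id": msg["sensorId"],
            "metric": "temperature",
            "value_c": msg["temp_c"],
            "observed_at": datetime.fromtimestamp(
                msg["ts"], tz=timezone.utc).isoformat(),
        }
    if vendor == "B":
        fahrenheit = msg["temperature"]["value"]
        return {
            "device_id": msg["device"]["id"],
            "metric": "temperature",
            "value_c": round((fahrenheit - 32) * 5 / 9, 2),
            "observed_at": msg["time"],
        }
    raise ValueError(f"unknown vendor: {vendor}")

for raw, vendor in [(VENDOR_A, "A"), (VENDOR_B, "B")]:
    print(normalize(raw, vendor))
```

Downstream stream-processing and analytics components then only ever see the unified schema, regardless of which vendor produced the reading.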
4. Integrating Heterogeneous Data. Nieva, Gabriel, January 2016
Technological advances, particularly in the areas of processing and storage, have made it possible to gather an unprecedentedly vast and heterogeneous amount of data. The evolution of the internet, particularly social media, the Internet of Things, and mobile technology, together with new business trends, has precipitated us into the age of Big Data and added complexity to the integration task. The objective of this study has been to explore the question of data heterogeneity through a systematic literature review methodology. The study surveys the drivers of this data heterogeneity and its inner workings, and it explores the interrelated fields and technologies that deal with the capture, organization and mining of this data, along with their limitations. Developments such as Hadoop and its suite of components, together with new computing paradigms such as cloud computing and virtualization, help palliate the unprecedented amount of rapidly changing, heterogeneous data we see today. Despite these dramatic developments, the study shows that gaps remain to be filled in order to tackle the challenges of Web 3.0.
5. The Analysis of Big Data on Cities and Regions: Some Computational and Statistical Challenges. Schintler, Laurie A. and Fischer, Manfred M., 28 October 2018
Big Data on cities and regions bring new opportunities and challenges to data analysts and city planners. On one side, they hold great promise for combining increasingly detailed data on each citizen with critical infrastructures to plan, govern and manage cities and regions, improve their sustainability, optimize processes and maximize the provision of public and private services. On the other side, the massive sample size and high dimensionality of Big Data, together with their geo-temporal character, introduce unique computational and statistical challenges. This chapter provides an overview of the salient characteristics of Big Data and of how these features force a paradigm change in data management and analysis, as well as in the computing environment. / Series: Working Papers in Regional Science
6. Automatisation du raisonnement et décision juridiques basés sur les ontologies / Automation of legal reasoning and decision based on ontologies. El Ghosh, Mirna, 24 September 2018
The main goal of this thesis is to develop a well-founded legal ontology for use in rule-based reasoning. To this end, a middle-out, collaborative and modular approach is proposed, in which foundational and core ontologies are reused to simplify the development of the ontology. The resulting ontology is adopted in a homogeneous ontology-based approach to formalize the legal rules of the penal code using the SWRL logic language. / This thesis analyses the problem of building well-founded domain ontologies for reasoning and decision-support purposes. Specifically, it discusses the building of legal ontologies for rule-based reasoning. Building well-founded legal domain ontologies is a difficult and complex process due to the complexity of the legal domain and the lack of methodologies. For this purpose, a novel modular middle-out approach called MIROCL is proposed. MIROCL enhances the building process of well-founded domain ontologies by incorporating several support processes, such as reuse, modularization, integration and learning. By applying the modularization process, a multi-layered modular architecture of the ontology is outlined. The intended ontology is thus composed of four modules located at different abstraction levels. These modules are, from the most abstract to the most specific, UOM (Upper Ontology Module), COM (Core Ontology Module), DOM (Domain Ontology Module) and DSOM (Domain-Specific Ontology Module). The middle-out strategy combines two complementary strategies, top-down and bottom-up. The top-down strategy applies ODCM (Ontology-Driven Conceptual Modeling) and ontology reuse, starting from the most abstract categories, to build the UOM and COM. Meanwhile, the bottom-up strategy starts from textual resources and applies an ontology learning process to extract the most specific categories for building the DOM and DSOM. After the different modules are built, an integration process composes the whole ontology. The MIROCL approach is applied in the criminal domain for modeling legal norms, yielding a well-founded legal domain ontology called CriMOnto (Criminal Modular Ontology). CriMOnto is then used to model the procedural aspect of legal norms through integration with a logic rule language (SWRL). Finally, a hybrid approach is applied to build a rule-based system called CORBS, grounded on CriMOnto and the set of formalized rules.
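As an illustration of the kind of rule formalization described above, here is a minimal sketch that attaches a SWRL rule to a small OWL ontology using the owlready2 Python library. The class, property, and rule names are hypothetical stand-ins, not taken from CriMOnto or the penal code.

```python
from owlready2 import get_ontology, Thing, ObjectProperty, Imp

# A hypothetical ontology fragment; names are illustrative only.
onto = get_ontology("http://example.org/legal_sketch.owl")

with onto:
    class Person(Thing): pass
    class Offence(Thing): pass
    class Theft(Offence): pass          # a specific kind of offence
    class LiablePerson(Person): pass    # the class the rule infers

    class commits(ObjectProperty):      # links a person to an offence
        domain = [Person]
        range = [Offence]

    # A SWRL rule in owlready2's comma-separated atom syntax:
    # anyone who commits a theft is classified as liable.
    rule = Imp()
    rule.set_as_rule(
        "Person(?p), commits(?p, ?o), Theft(?o) -> LiablePerson(?p)")

print(rule)  # the parsed SWRL rule now attached to the ontology
```

Running an OWL reasoner over individuals asserted in such an ontology would then classify matching persons as LiablePerson, which is the pattern a rule-based system like CORBS builds on.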