Industrial Cyber-physical Systems (ICPSs) connect industrial equipment and manufacturing processes via ubiquitous sensors, actuators, and computer units, forming the Manufacturing Industrial Internet (MII). With the data generated from MII, Artificial Intelligence (AI) greatly advances data-driven decision making for manufacturing efficiency, quality improvement, and cost reduction. However, poor-quality data have posed significant challenges to the incubation (i.e., training, validation, and deployment) of AI models. In the offline training phase, poor-quality training data result in inaccurate AI models. In the online training and deployment phases, high-volume but information-poor data lead to discrepancies in AI modeling performance between phases, as well as heavy communication and computation workloads and high data acquisition and storage costs. In the incubation of AI models for multiple manufacturing stages or systems, exchanging and sharing datasets can significantly improve the efficiency of data collection for a single manufacturing enterprise and improve the quality of training datasets. However, inaccurate estimation of the value of datasets can cause ineffective dataset exchange and hamper the scaling up of AI systems. High-quality and high-value data not only enhance modeling performance during AI incubation, but also contribute to effective data exchange for potential synergistic intelligence in MII. Therefore, it is important to assess and ensure data quality in terms of its value for AI models. In this dissertation, our ultimate goal is to establish a data exchange paradigm that provides high-quality and high-value data for AI incubation in MII.
To achieve this goal, three research tasks are proposed for different phases in AI incubation: (1) a prediction-oriented data generation method to actively generate highly informative data in the offline training phase for high prediction performance (Chapter 2); (2) an ensemble active learning by contextual bandits framework for the acquisition and evaluation of passively collected online data, supporting continuous improvement and resilient modeling performance during the online training and deployment phases (Chapter 3); and (3) a context-aware, performance-oriented, and privacy-preserving dataset-sharing framework to efficiently share and exchange small-but-high-quality datasets between trusted stakeholders to allow their on-demand usage (Chapter 4). All the proposed methodologies have been evaluated and validated through simulation studies and applications to real manufacturing case studies. In Chapter 5, the contribution of the work is summarized and future research directions are proposed. / Doctor of Philosophy / With the data collected in manufacturing processes, Artificial Intelligence (AI) methods greatly improve data-driven decision making to raise manufacturing efficiency and product quality and to reduce cost. However, the advancement of AI methods relies heavily on the quality and amount of available datasets.
In this dissertation, we focus on the impact of data in three stages of the development of AI models: (1) In the offline training phase (i.e., during design prototyping), limited data with poor quality will result in AI models with poor performance; (2) In the online training and deployment phases (i.e., during mass production), large-volume but poor-quality data will cause a discrepancy in AI modeling performance between the training phase and the deployment phase, and also result in high labelling and storage costs; (3) In the scaling-up phase of AI models across multiple manufacturing stages or systems, it takes a long time and intensive effort for a single manufacturing enterprise to collect sufficient data to train advanced AI models. By exchanging datasets between manufacturers, time and cost can be saved while the quality of training datasets can be improved. However, without accurately estimating the value of datasets, the exchange will be ineffective. To address these data challenges for AI models, this dissertation improves the quality and enables the exchange of data in the aforementioned three stages by: (1) a prediction-oriented data generation method to actively generate highly informative data in the offline training phase for high prediction performance (Chapter 2); (2) an ensemble active learning by contextual bandits framework for data acquisition and evaluation, supporting continuous improvement and resilient modeling performance during the online training and deployment phases (Chapter 3); and (3) a context-aware, performance-oriented, and privacy-preserving dataset-sharing framework to efficiently share and exchange small-but-high-quality datasets to allow their on-demand usage (Chapter 4). Finally, in Chapter 5, the contribution of the work is summarized and future research directions are proposed.
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/120982 |
Date | 21 August 2024 |
Creators | Zeng, Yingyan |
Contributors | Industrial and Systems Engineering, Jin, Ran, Deng, Xinwei, Johnson, Blake, Ellis, Kimberly P. |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Language | English |
Detected Language | English |
Type | Dissertation |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |