1.
ModelPred: A Framework for Predicting Trained Model from Training Data
Zeng, Yingyan, 06 June 2024
In this work, we propose ModelPred, a framework that helps to understand the impact of changes in training data on a trained model. This is critical for building trust in various stages of a machine learning pipeline: from cleaning poor-quality samples and tracking important ones to be collected during data preparation, to calibrating the uncertainty of model predictions, to interpreting why certain behaviors of a model emerge during deployment. Specifically, ModelPred learns a parameterized function that takes a dataset S as the input and predicts the model obtained by training on S. Our work differs from the recent work of Datamodels in that we aim to predict the trained model parameters directly instead of the trained model behaviors. We demonstrate that a neural network-based set function class is capable of learning the complex relationships between the training data and model parameters. We introduce novel global and local regularization techniques to prevent overfitting, and we rigorously characterize the expressive power of neural networks (NN) in approximating the end-to-end training process. Through extensive empirical investigations, we show that ModelPred enables a variety of applications that boost the interpretability and accountability of machine learning (ML), such as data valuation, data selection, memorization quantification, and model calibration. / Amazon-Virginia Tech Initiative in Efficient and Robust Machine Learning / Master of Science / Also published as Zeng, Y., Wang, J. T., Chen, S., Just, H. A., Jin, R., & Jia, R. (2023, February). ModelPred: A Framework for Predicting Trained Model from Training Data. In 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) (pp. 432-449). IEEE. https://doi.org/10.1109/SaTML54575.2023.00037
With the prevalence of large and complicated Artificial Intelligence (AI) models, it is important to build trust in the various stages of a machine learning model pipeline: from cleaning poor-quality samples and tracking important ones to be collected during training data preparation, to calibrating the uncertainty of model predictions during the inference stage, to interpreting why certain behaviors of a model emerge during deployment. In this work, we propose ModelPred, a framework that helps to understand the impact of changes in training data on a trained model. To achieve this, ModelPred learns a parameterized function that takes a dataset S as the input and predicts the model obtained by training on S, thus efficiently learning the impact of the data on the model. Our work differs from the recent work of Datamodels [28] in that we aim to predict the trained model parameters directly instead of the trained model behaviors. We demonstrate that a neural network-based set function class is capable of learning the complex relationships between the training data and model parameters. We introduce novel global and local regularization techniques to enhance generalizability and prevent overfitting. We also rigorously characterize the expressive power of neural networks (NN) in approximating the end-to-end training process. Through extensive empirical investigations, we show that ModelPred enables a variety of applications that boost the interpretability and accountability of machine learning (ML), such as data valuation, data selection, memorization quantification, and model calibration. This greatly enhances the trustworthiness of machine learning models.
2.
Towards a value theory for personal data
Spiekermann-Hoff, Sarah; Korunovska, Jana, 03 1900
Analysts, investors and entrepreneurs have recognized the value of personal data for Internet economics. Personal data is viewed as the "oil" of the digital economy. Yet, ordinary people are barely aware of this. Marketers collect personal data at minimal cost in exchange for free services. But will this be possible in the long term, especially in the face of privacy concerns? Little is known about how users really value their personal data. In this paper, we build a user-centered value theory for personal data. On the basis of a survey experiment with 1269 Facebook users, we identify core constructs that drive the value of volunteered personal data. We find that privacy concerns are less influential than expected and influence data value mainly when people become aware of data markets. In fact, the consciousness of data being a tradable asset is the single most influential factor driving willingness-to-pay for data. Furthermore, we find that people build a sense of psychological ownership of their data and hence value it more. Finally, our value theory helps to unveil market design mechanisms that will influence how personal data markets thrive. First, we observe that a majority of users become reactant if they are consciously deprived of control over their personal data; many drop out of the market. We therefore advise companies to consider user-centered data control tools so that users participate in personal data markets. Second, we find that in order to create scarcity in the market, centralized IT architectures (reducing multiple data copies) may be beneficial.
3.
Data as a production factor: A model to measure the value of big data through business process management
Zipf, Torsten, 04 July 2022
Big Data has been among the most innovative topics in the literature and among organizations for years. Even though only a few organizations have realized the significant value potential described in the contemporary literature, it is widely acknowledged that data assets can provide significant competitive benefits. Given the promises regarding value increases and competitiveness, practitioners as well as academia desire systematic approaches to transform data sets into measurable assets.
This dissertation investigates the current state of the literature, conducts an empirical investigation through structural equation modeling, and applies existing theory to develop a model that gives organizations a systematic approach to measuring the value of Big Data specifically for their organization. With Business Process Management as the foundation of the model, IT as well as business functions will be able to apply the model successfully. Based on the assumption that data is acknowledged as a production factor, the developed model helps organizations justify Big Data investment decisions and thereby contributes to competitiveness and company value. Furthermore, the findings and the model equip future researchers with a framework that can be adapted for industry-specific purposes, validated in different organizational contexts, or dismantled to investigate specific success factors.