Offline Reinforcement Learning from Imperfect Human Guidance / 不完全な人間の誘導からのオフライン強化学習

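Kyoto University / New-system doctorate by coursework / Doctor of Informatics / Degree No. Kō 24856 / Jōhaku No. 838 / Call number 新制||情||140 (Main Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / Examiners: Prof. Hisashi Kashima (chief examiner), Prof. Tatsuya Kawahara, Prof. Jun Morimoto / Conferred under Article 4, Paragraph 1 of the Degree Regulations / DFAM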

Offline Reinforcement Learning

Preference-based Reinforcement Learning

Human-in-the-loop Reinforcement Learning

Identifier	oai:union.ndltd.org:kyoto-u.ac.jp/oai:repository.kulib.kyoto-u.ac.jp:2433/284789
Date	24 July 2023
Creators	Zhang, Guoxi
Contributors	张, 国熙, チョウ, コクキ
Publisher	Kyoto University, 京都大学
Source Sets	Kyoto University
Language	English
Detected Language	English
Type	doctoral thesis, Thesis or Dissertation
Rights	Chapter 3 is based on 1 and 2. Chapter 4 is based on 3. Chapter 5 is based on 4 and 5.
1. G. Zhang and H. Kashima. Batch reinforcement learning from crowds. In Machine Learning and Knowledge Discovery in Databases, pages 38–51. Springer, Cham, 2023. https://doi.org/10.1007/978-3-031-26412-2_3
2. G. Zhang, J. Li, and H. Kashima. Improving pairwise rank aggregation via querying for rank difference. In Proceedings of the Ninth IEEE International Conference on Data Science and Advanced Analytics. IEEE, 2022. https://doi.org/10.1109/DSAA54385.2022.10032454
3. G. Zhang and H. Kashima. Learning state importance for preference-based reinforcement learning. Machine Learning, 2023. https://doi.org/10.1007/s10994-022-06295-5
4. G. Zhang and H. Kashima. Behavior estimation from multi-source data for offline reinforcement learning. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence. AAAI Press, 2023.
5. G. Zhang, X. Yao, and X. Xiao. On modeling long-term user engagement from stochastic feedback. In Companion Proceedings of the ACM Web Conference 2023. Association for Computing Machinery, 2023. https://doi.org/10.1145/3543873.3587626