
賽局理論與學習模型的實證研究 / An Empirical Study of Game Theory and Learning Models

Game theory typically assumes rational decision making. For a one-shot game, the equilibrium (or optimal strategy) can usually be derived theoretically, but when a game is played repeatedly the equilibrium need not be unique, and theoretical derivation alone may not uncover every equilibrium. The Prisoner's Dilemma is a standard example: defection is the theoretical equilibrium of the one-shot game, yet if players in the repeated game can benefit from mutual cooperation, defection is not necessarily the best choice. In recent years, economists have run game experiments, designed much like statistical experiments, to study the gap between game theory and observed behavior, and have used learning models to describe players' decisions. Learning models, however, are usually judged by the size of their estimation and prediction errors, and such error-based comparisons can be data dependent. To address this weakness in model selection, this study adapts the model-selection and residual-diagnostic ideas of regression analysis to the evaluation of learning models, using empirical data together with Monte Carlo simulation to demonstrate the evaluation process.
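As a brief illustration of the one-shot result mentioned above, the following minimal sketch (in Python) checks that defection is the best reply to either action for a standard textbook Prisoner's Dilemma payoff matrix. The payoff values and the helper name best_reply are illustrative assumptions, not taken from the thesis experiments.

    # Illustrative one-shot Prisoner's Dilemma payoffs for the row player,
    # using the usual T > R > P > S ordering (assumed values, not the
    # payoffs used in the thesis experiments).
    PAYOFF = {
        ("C", "C"): 3,  # R: mutual cooperation
        ("C", "D"): 0,  # S: cooperate while the opponent defects
        ("D", "C"): 5,  # T: defect while the opponent cooperates
        ("D", "D"): 1,  # P: mutual defection
    }

    def best_reply(opponent_action):
        """Payoff-maximizing action of the row player against a fixed opponent action."""
        return max(("C", "D"), key=lambda a: PAYOFF[(a, opponent_action)])

    # Defection is the best reply whatever the opponent does, so mutual
    # defection is the equilibrium of the one-shot game.
    assert best_reply("C") == "D" and best_reply("D") == "D"
    print("One-shot PD: defection strictly dominates cooperation.")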
The empirical data come from repeated Prisoner's Dilemma games under four different experimental settings, and the participants (the "players") were undergraduate students at National Chengchi University in Taiwan. Four learning models are compared: the Reinforcement Learning (RL) model, the Extended Reinforcement Learning (ERL) model, the Belief Learning (BL) model, and the Experience-Weighted Attraction (EWA) model. Both the empirical and the simulation analyses indicate that the RL model is the most appropriate description of the Prisoner's Dilemma data, yielding smaller errors and better goodness-of-fit results. We also find that players do not respond consistently across the experimental settings, and that classifying the players into different groups further reduces the estimation errors.
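To make the kind of model compared above more concrete, the sketch below implements a generic reinforcement-learning rule of the sort the thesis evaluates: each action carries an attraction that decays over time and is reinforced by the realized payoff, and choices follow a logistic (softmax) response to the attractions. The decay and sensitivity parameters (phi, lam), the hit-rate criterion, and the toy data are a standard textbook formulation used purely for illustration; they are not the exact specification, estimates, or data reported in the thesis.

    import math

    # Illustrative one-shot PD payoffs for the row player (assumed values, as above).
    PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
    ACTIONS = ("C", "D")

    def logit_probs(attraction, lam):
        """Logistic (softmax) choice probabilities implied by the current attractions."""
        weights = {a: math.exp(lam * attraction[a]) for a in ACTIONS}
        total = sum(weights.values())
        return {a: w / total for a, w in weights.items()}

    def reinforcement_update(attraction, chosen, payoff, phi):
        """Generic reinforcement rule: decay every attraction, reinforce the chosen action."""
        for a in ACTIONS:
            attraction[a] *= phi
        attraction[chosen] += payoff

    def hit_rate(opponent_choices, observed_choices, phi=0.9, lam=1.0):
        """One-step-ahead hit rate of the model's modal prediction against observed play."""
        attraction = {a: 0.0 for a in ACTIONS}
        hits = 0
        for opp, obs in zip(opponent_choices, observed_choices):
            predicted = max(ACTIONS, key=logit_probs(attraction, lam).get)
            hits += (predicted == obs)
            reinforcement_update(attraction, obs, PAYOFF[(obs, opp)], phi)
        return hits / len(observed_choices)

    # Toy sequence: the opponent always defects and the player gradually learns to defect.
    opponent = ["D"] * 20
    player = ["C", "C", "D", "C"] + ["D"] * 16
    print("one-step-ahead hit rate:", hit_rate(opponent, player))

Comparing such hit rates or prediction errors across the RL, ERL, BL, and EWA specifications, on both observed and simulated data, is the flavor of evaluation the abstract describes.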

Identifier: oai:union.ndltd.org:CHENGCHI/G0098354008
Creators: 陳冠儒, Chen, Kuan Lu
Publisher: 國立政治大學 (National Chengchi University)
Source Sets: National Chengchi University Libraries
Language: Chinese
Detected Language: English
Type: text
Rights: Copyright © NCCU Library on behalf of the copyright holders
