1 |
Multiple imputation for missing covariates in contingent valuation survey
費詩元, Fei, Shih Yuan, Unknown Date
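Most studies of willingness to pay (WTP) treat missing data as missing completely at random (MCAR) and delete them. However, if some important variables in a study have a high proportion of missing values, this practice can bias the analysis.
Income often plays an important role in contingent valuation (CV) surveys, and it is also one of the items respondents are most likely to leave unanswered. In this study, we use simulation to evaluate how well multiple imputation (MI) performs in imputing missing income in WTP surveys. We consider three data scenarios: the complete data remaining after missing cases are deleted, singly imputed data, and multiply imputed data. We evaluate each scenario through analyses based on a three-component mixture model. The simulation results show that MI outperforms analysis of the complete cases alone, and the advantage grows as the missing rate rises. We also find that MI results are more reliable and stable than those from single imputation. We therefore consider MI a trustworthy, well-performing approach when the missing-data mechanism is not MCAR.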
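Rubin's rules for combining multiply imputed results help explain why single imputation tends to look more certain than it should. The sketch below is a minimal illustration. It assumes each of the M imputed data sets has already been analyzed to give a point estimate and its squared standard error. The function name `pool_rubin` and the five coefficient values are hypothetical, not results from this study. The between-imputation variance B is the term a single imputation cannot estimate, so single-imputation standard errors treat imputed incomes as if they had been observed.

```python
from statistics import mean


def pool_rubin(estimates, variances):
    """Combine M completed-data results with Rubin's rules.

    estimates: point estimates q_m, one per imputed data set
    variances: within-imputation variances u_m (squared standard errors)
    Returns (pooled estimate, mean within variance, between variance, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)                                  # pooled point estimate
    u_bar = mean(variances)                                  # average within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    t = u_bar + (1 + 1 / m) * b                              # total variance of q_bar
    return q_bar, u_bar, b, t


# Hypothetical income coefficients and squared SEs from M = 5 imputed data sets
q, u, b, t = pool_rubin([0.42, 0.47, 0.39, 0.45, 0.44],
                        [0.010, 0.011, 0.009, 0.010, 0.012])
print(f"pooled estimate={q:.3f}, total variance={t:.4f}")
```

With these made-up inputs, the between-imputation part adds about 11% to the average within-imputation variance. An analysis of a single completed data set would drop that part without any warning.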
In addition, we carry out an empirical analysis using data from the Zhudong and Puzi long-term cardiovascular disease follow-up study (Cardio Vascular Disease risk FACtor Two-township Study, CVDFACTS). We demonstrate several techniques for assessing the missing-data mechanism, including comparing survival curves and fitting a logistic regression. The empirical analysis shows that the model analyses and estimates do differ before and after imputation. / Studies of willingness to pay (WTP) most often simply ignore missing values and treat them as if they were missing completely at random. It is well known that this practice can cause serious bias and lead to incorrect results.
Income is one of the most influential variables in contingent valuation (CV) studies, and it is also the variable respondents are most likely to leave unanswered. In the present study, we evaluate the performance of multiple imputation (MI) for missing income in the analysis of WTP through a series of simulation experiments. We consider and compare three approaches: complete-case analysis, single imputation, and MI. We show that MI consistently performs better than complete-case analysis, especially when the missing rate is high. We also show that MI is more stable and reliable than single imputation.
As an illustration, we use data from the Cardio Vascular Disease risk FACtor Two-township Study (CVDFACTS). We demonstrate how to assess the missing-data mechanism by comparing survival curves and by fitting a logistic regression model. The empirical study shows that discarding cases with missing income can lead to results that differ from those obtained with multiple imputation. If the discarded cases are not missing completely at random, the remaining sample will be biased, which can be a serious problem in CV research. In conclusion, MI is a useful method for dealing with missing values and is worth trying in CV studies.
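The logistic-regression check for the missing-data mechanism works as follows. Regress an indicator of missing income on an observed covariate and see whether the slope differs from zero. This is a minimal standard-library sketch on simulated data, not the CVDFACTS analysis. It assumes a single standardized covariate (a hypothetical `age`) and a plain Newton-Raphson fit. A clearly nonzero slope is evidence against MCAR. A test on observed variables cannot, by itself, distinguish MAR from missing not at random.

```python
import math
import random


def _score_and_info(b0, b1, x, y):
    """Score vector and 2x2 Fisher information of a one-covariate logistic model."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def fit_logistic(x, y, iters=25):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    Returns (b0, b1, se_b1); se_b1 uses the information at the final estimates.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1, h00, h01, h11 = _score_and_info(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += inverse(information) @ score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = _score_and_info(b0, b1, x, y)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, se_b1


# Simulated survey (hypothetical): one standardized covariate such as age
rng = random.Random(2024)
age = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# MAR: older respondents are more likely to leave income blank (true slope 1)
miss_mar = [int(rng.random() < 1.0 / (1.0 + math.exp(1.0 - a))) for a in age]
# MCAR: every respondent leaves income blank with the same 25% probability
miss_mcar = [int(rng.random() < 0.25) for _ in age]

results = {label: fit_logistic(age, miss)
           for label, miss in (("MAR", miss_mar), ("MCAR", miss_mcar))}
for label, (b0, b1, se) in results.items():
    print(f"{label}: slope={b1:.2f}, Wald z={b1 / se:.1f}")
```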
|
2 |
A Study of Refusal Imputation for Guttman Scales
左宗光, Unknown Date
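In sample survey data, respondents may refuse to answer because a question is unclear, touches on personal privacy, or concerns an overly sensitive topic. Analyzing sample data that contain refusals can bias the results, so how nonresponse is handled is often a key factor in whether the findings are credible. A common remedy is to impute the refused answers. However, there has never been a criterion for judging how good an imputation is, so analysis results are often questioned on this ground.
This study focuses on Guttman-scale data and uses the concept of "accuracy" to compare several imputation methods. These include the simple imputation methods commonly used in social science research, multiple imputation, and nearest-neighbor imputation. Computing the accuracy of each method lets us compare how well they perform and infer when each is appropriate. As an example, we use the items on sexual attitudes from Phase 4, Survey 3 of the Taiwan Social Change Survey (TSCS). The responses that fit a Guttman scale are treated as the "gold standard", and refusals are created from the gold standard following the refusal patterns observed in the data. As the refusal rate rises, the number of cases for each refusal pattern is scaled up by the same factor.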
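The accuracy criterion can be made concrete with a small sketch in four steps:

1. Keep only perfect Guttman patterns as the gold standard.
2. Blank out some cells as refusals.
3. Impute the blanks.
4. Score the share of refused cells recovered exactly.

The 4-item data and the refused cells are hypothetical illustrations. So is the item-mode rule used here as the "simple imputation". The sketch assumes the items are already ordered from most to least endorsed. The simple-imputation variants in the study itself may differ.

```python
from collections import Counter


def is_perfect_guttman(responses):
    """True if 0/1 answers, with items ordered from most- to least-endorsed,
    are a run of 1s followed only by 0s (1,1,0,0 passes; 1,0,1,0 fails)."""
    seen_zero = False
    for r in responses:
        if r == 0:
            seen_zero = True
        elif seen_zero:          # a 1 after a 0 breaks the cumulative pattern
            return False
    return True


def mask(gold, refused):
    """Create refusals by blanking the chosen (row, item) cells of the gold standard."""
    data = [row[:] for row in gold]
    for i, j in refused:
        data[i][j] = None
    return data


def mode_impute(data):
    """One simple imputation rule: fill each blank with the item's most common answer."""
    modes = []
    for j in range(len(data[0])):
        answered = [row[j] for row in data if row[j] is not None]
        modes.append(Counter(answered).most_common(1)[0][0])
    return [[modes[j] if v is None else v for j, v in enumerate(row)] for row in data]


def accuracy(imputed, gold, refused):
    """Share of refused cells whose imputed value equals the gold-standard answer."""
    return sum(imputed[i][j] == gold[i][j] for i, j in refused) / len(refused)


# Hypothetical 4-item responses, items already ordered by endorsement rate
raw = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1],
       [1, 0, 0, 0], [0, 0, 0, 0], [1, 1, 0, 0]]
gold = [r for r in raw if is_perfect_guttman(r)]   # drops [1, 0, 1, 0]
refused = [(1, 2), (4, 1)]                          # hypothetical refusal cells
imputed = mode_impute(mask(gold, refused))
print(accuracy(imputed, gold, refused))
```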
The results show that the accuracy of simple imputation can be derived by formula. For these data, no simple imputation method exceeds 32% accuracy, although the refusal rate varies considerably across refusal patterns and degrees of social openness. Multiple imputation does slightly better than simple imputation, with accuracy close to 33%, but simple imputation is more convenient to use. Nearest-neighbor imputation achieves relatively high accuracy, reaching about 47% at best. However, it takes more time to run, and its accuracy tends to fall as the refusal rate rises; both are potential drawbacks. / In a questionnaire survey, respondents may refuse to answer certain items because the questions are unclear, sensitive, or related to personal privacy. Because an analysis of a data set containing refusals may be biased, how to deal with survey refusals has recently drawn much attention. One popular approach is imputation. However, in the absence of a criterion for evaluating its performance, the usefulness of this approach remains debated.
In this study, we compare simple imputation, multiple imputation, and nearest-neighbor imputation in terms of imputation accuracy when handling refusals in a set of survey items forming a Guttman scale. Data are taken from the 2002 Taiwan Social Change Survey (TSCS), and the items of interest concern sexual attitudes. The part of the data that satisfies a perfect Guttman scale is treated as the "gold standard", and refusals are generated according to the refusal patterns that appear in the original data.
The results show that the accuracy of simple imputation can be derived theoretically. Whichever version of simple imputation is applied, its accuracy is no more than 32%. Multiple imputation performs slightly better, at about 33% accuracy, but it is less efficient in terms of computing time. Nearest-neighbor imputation performs best of the three, with accuracy reaching 47%. However, it requires much more computing time than the other two methods, and its accuracy decreases as the refusal rate rises.
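Nearest-neighbor imputation can be sketched as a hot-deck that copies missing answers from the most similar fully observed respondent. This version rests on assumed choices: Hamming distance on the answered items, first-donor tie-breaking, and hypothetical data. The abstract does not specify the distance or tie rule the study used. Every incomplete respondent is compared against every donor, which is one plausible reason the method needs more computing time.

```python
def accuracy(imputed, gold, refused):
    """Share of refused cells whose imputed value equals the gold-standard answer."""
    return sum(imputed[i][j] == gold[i][j] for i, j in refused) / len(refused)


def distance_on_answered(row, donor):
    """Hamming distance computed only over the items this respondent answered."""
    return sum(v != d for v, d in zip(row, donor) if v is not None)


def nn_impute(data):
    """Fill each respondent's refusals from the nearest fully observed respondent.
    Ties go to the first donor in row order (an arbitrary, assumed rule)."""
    donors = [row for row in data if None not in row]
    completed = []
    for row in data:
        if None in row:
            donor = min(donors, key=lambda d: distance_on_answered(row, d))
            completed.append([d if v is None else v for v, d in zip(row, donor)])
        else:
            completed.append(row[:])
    return completed


# Hypothetical gold-standard Guttman patterns and refusal cells
gold = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1],
        [1, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 0]]
refused = [(1, 2), (3, 1)]
data = [row[:] for row in gold]
for i, j in refused:
    data[i][j] = None

imputed = nn_impute(data)
print(imputed[1], imputed[3], accuracy(imputed, gold, refused))
```

In the second refused cell, the answered items [1, ?, 0, 0] match a donor with 1 in the blank slot, so the copy is wrong. Partial Guttman patterns are often consistent with more than one complete pattern.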
|
3 |
Managing online user-generated product reviews using multiple imputation methods
李岑志, Li, Cen Jhih, Unknown Date
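As internet use has spread, people increasingly shop online and rate products online, producing a very strong word-of-mouth effect. Online product reviews have become very important to both manufacturers and consumers. Consumers can judge a product's quality from other buyers' experiences, and manufacturers can use consumer ratings to improve product quality. Many e-commerce websites now collect feedback from consumers after purchase.
Some of these websites let consumers give a product an overall score and write a text review. However, different consumers usually comment on different product features. Later buyers in particular may skip a point because someone has already raised it. When each person's text is quantified as numeric scores, the features a reviewer did not mention leave many missing values in the quantified data.
Consumers may also mention unimportant features. If we can determine how strongly each feature mentioned in reviews influences consumers, manufacturers can focus on fixing the product's more important shortcomings. This study examines how the features consumers mention affect a product's overall score, and whether the imputed values come close to consumers' true opinions.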
Many earlier methods fill in all missing values at once, without considering that consumers may be influenced by earlier reviews. This study designs a multiple imputation method and validates it by simulation. It then applies the method to consumer reviews on Amazon of three generations of Canon digital cameras (SX210, SX230, and SX260). The results indicate that the method accurately estimates each feature's effect on a product's overall score. / Online user-generated product reviews have become a rich source of product quality information for both producers and customers. As a result, many e-commerce websites allow customers to rate products with scores, some together with text comments. However, people usually comment only on the features they care about and may omit those already mentioned by previous customers. Consequently, missing data arise when the comments are analyzed.
In addition, customers may comment on features that influence neither their satisfaction nor sales volume. It is therefore important to find the significant features so that manufacturers can address the main defects. Our research focuses on modeling customer reviews and their influence on predicting overall ratings. We aim to determine whether, once missing values are filled in, the critical features can be identified and the feature ratings authentically reflect customer opinion.
Many previous studies impute the whole data set at once, without considering that customer reviews may be influenced by earlier reviews. We propose a method based on multiple imputation and apply it to customer reviews of Canon digital cameras (SX210, SX230, and SX260 generations) on Amazon. A simulation study verifies the method's effectiveness, and the method performs well at identifying the critical features.
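The key idea in this entry is that a review's missing feature ratings should depend only on reviews posted before it. A time-ordered random hot-deck, repeated m times, illustrates the idea. This is a hypothetical sketch under that assumption, not the method proposed in the thesis. The feature names, ratings, and fallback rule are invented. Each missing rating is drawn from earlier reviewers' observed ratings of the same feature.

```python
import random


def impute_from_earlier(reviews, m=5, seed=0):
    """Return m completed copies of `reviews` (a list of dicts sorted by posting time).

    A missing rating (None) in review t is drawn at random from the ratings that
    reviews posted before t gave the same feature. If no earlier review rated it,
    all observed ratings of that feature are used instead. Assumes every feature
    is rated at least once; otherwise random.choice raises IndexError.
    """
    rng = random.Random(seed)
    features = list(reviews[0])
    completed = []
    for _ in range(m):
        version = []
        for t, review in enumerate(reviews):
            filled = dict(review)
            for f in features:
                if filled[f] is None:
                    pool = [r[f] for r in reviews[:t] if r[f] is not None]
                    if not pool:   # hypothetical fallback for the earliest reviews
                        pool = [r[f] for r in reviews if r[f] is not None]
                    filled[f] = rng.choice(pool)
            version.append(filled)
        completed.append(version)
    return completed


# Hypothetical 1-5 feature ratings quantified from four reviews, oldest first
reviews = [
    {"battery": 4, "zoom": 5, "video": None},
    {"battery": 2, "zoom": None, "video": 3},
    {"battery": None, "zoom": 4, "video": 4},
    {"battery": None, "zoom": None, "video": None},
]
completed = impute_from_earlier(reviews, m=5)
for k, version in enumerate(completed):
    print(k, [tuple(r.values()) for r in version])
```

Each of the m completed data sets would then be analyzed separately, for instance by regressing the overall rating on the feature ratings. The results would then be pooled across the data sets.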
|
4 |
The Relationship between Homework and Learning Achievements: An Example of Taiwan Students from TIMSS and TEPS
陳俊瑋, Unknown Date
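This study aims to understand the relationship between homework and learning achievement. To this end, we used hierarchical linear modeling to analyze three sets of Trends in International Mathematics and Science Study (TIMSS) data: 4th-grade students in 2007, 8th-grade students in 2007, and 8th-grade students in 2011. We then used a longitudinal cross-lagged panel model within structural equation modeling to analyze the panel sample of students from the Taiwan Education Panel Survey (TEPS) in 2001, 2003, and 2005. The main findings are as follows:
1. For Taiwanese 4th graders, student-level mathematics homework time had a significant negative effect on mathematics achievement. Student-level science homework time likewise had a significant negative effect on science achievement.
2. For Taiwanese 4th graders, class-level mathematics homework frequency had no significant effect on mathematics achievement. Class-level science homework frequency likewise had no significant effect on science achievement.
3. For Taiwanese 8th graders, student-level mathematics homework time had a significant positive effect on mathematics achievement. Student-level science homework time likewise had a significant positive effect on science achievement.
4. For Taiwanese 8th graders, class-level mathematics homework frequency had a significant positive effect on mathematics achievement. Class-level science homework frequency likewise had a significant positive effect on science achievement.
5. For Taiwanese students followed from 7th grade in 2001 to 11th grade in 2005, homework time and learning achievement had significant positive reciprocal effects. / This study aimed to analyze the relationship between homework and learning achievement. Hierarchical linear modeling was used to analyze three groups from the Trends in International Mathematics and Science Study (TIMSS): 4th-grade elementary school students in 2007, 8th-grade junior high school students in 2007, and 8th-grade junior high school students in 2011. In addition, structural equation modeling with a cross-lagged panel model was used to analyze the core panel sample of the Taiwan Education Panel Survey (TEPS) in 2001, 2003, and 2005. The major findings were as follows: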
1. Among Taiwanese 4th-grade elementary school students, student-level mathematics homework time significantly and negatively predicted mathematics achievement. Student-level science homework time also significantly and negatively predicted science achievement.
2. Among Taiwanese 4th-grade elementary school students, class-level mathematics homework frequency did not significantly predict mathematics achievement. Class-level science homework frequency did not significantly predict science achievement either.
3. Among Taiwanese 8th-grade junior high school students, student-level mathematics homework time significantly and positively predicted mathematics achievement. Student-level science homework time also significantly and positively predicted science achievement.
4. Among Taiwanese 8th-grade junior high school students, class-level mathematics homework frequency significantly and positively predicted mathematics achievement. Class-level science homework frequency also significantly and positively predicted science achievement.
5. For Taiwanese students followed from 7th grade (junior high school) to 11th grade (senior high school), homework time significantly and positively predicted subsequent learning achievement. Learning achievement also significantly and positively predicted subsequent homework time.
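Writing out the two model families named in this entry shows what the findings refer to. The block below gives a generic two-level random-intercept model and a cross-lagged panel model over the three TEPS waves. The variable names, the fixed student-level slope, and the absence of covariates are assumptions for illustration, because the abstract does not report the exact specifications. Under this reading, the findings map onto the coefficients as follows:

- Findings 1 and 3 concern the sign of the student-level slope γ₁₀.
- Findings 2 and 4 concern the class-level coefficient γ₀₁.
- Finding 5 corresponds to positive cross-lagged coefficients δ₁ and δ₂.

```latex
\begin{aligned}
&\text{Level 1 (student } i \text{ in class } j\text{):} && Y_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{Time}_{ij} + r_{ij}, \quad r_{ij} \sim N(0,\sigma^{2}) \\
&\text{Level 2 (class } j\text{):} && \beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{Freq}_{j} + u_{0j}, \quad u_{0j} \sim N(0,\tau_{00}) \\
& && \beta_{1j} = \gamma_{10} \\[6pt]
&\text{Cross-lagged panel (waves } w = 1, 2\text{):} && \mathrm{Ach}_{w+1} = \alpha_{1}\,\mathrm{Ach}_{w} + \delta_{1}\,\mathrm{HW}_{w} + e_{1,w+1} \\
& && \mathrm{HW}_{w+1} = \alpha_{2}\,\mathrm{HW}_{w} + \delta_{2}\,\mathrm{Ach}_{w} + e_{2,w+1}
\end{aligned}
```

Here waves w = 1, 2, 3 stand for 2001, 2003, and 2005. α₁ and α₂ are the stability (autoregressive) paths, and δ₁ and δ₂ are the cross-lagged paths.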
|