Global ETD Search

1	多重插補法在線上使用者評分之應用 / Managing online user-generated product reviews using multiple imputation methods 李岑志, Li, Cen Jhih Unknown Date (has links)
Online shopping has made user-generated product reviews a rich source of product quality information for both producers and customers: customers judge a product by other buyers' experience, and manufacturers use the feedback to improve quality. Many e-commerce websites therefore let customers give a product an overall score, and some also accept a text comment. However, people usually comment only on the features they care about, and later buyers in particular may omit points that earlier customers have already made. When the text comments are quantified as numeric feature scores, the unmentioned features become missing values. In addition, customers may comment on features that influence neither their satisfaction nor sales volume, so it is important to find the significant features so that manufacturers can fix the most important defects. This research models how the features mentioned in customer reviews influence the overall rating, and asks whether, once the missing values are filled in, the critical features can be identified and the imputed feature ratings approximate customers' true opinions. Many earlier methods impute the whole dataset at once and do not consider that a review may be influenced by earlier reviews. We propose a method based on multiple imputation, verify it through simulation, and apply it to Amazon customer reviews of three generations of Canon digital cameras (SX210, SX230, SX260). The results indicate that the method accurately estimates the influence of each feature on the overall rating.
Keywords: Opinion mining; Missing data; Multiple imputation
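As a rough illustration of the multiple imputation idea in the abstract above: each of m completed datasets yields its own estimate (for example, the coefficient of one product feature on the overall rating), and the estimates are then pooled. The sketch below uses Rubin's rules, the standard pooling formulas. The thesis's own time-ordered imputation scheme is not described in enough detail here to reproduce, so the function name `pool_rubin` and the sample numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their within-imputation variances
    using Rubin's rules. Returns (pooled_estimate, total_variance)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = w + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, total


# Hypothetical coefficients of one feature from three imputed datasets.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var)
```

The between-imputation term is what distinguishes multiple from single imputation: it carries the extra uncertainty caused by the missing values into the final variance.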
2	應用資料採礦技術於資料庫加值中的插補方法比較 / Imputation of value-added database in data mining 黃雅芳 Unknown Date (has links)
Data is a vital source of information for an organization, especially in the current era of knowledge and technology. Correctly handling the missing values in a meaningful, representative database can be a promising breakthrough for the organization. From time to time, however, we encounter an imperfect database in which some data are missing, and results obtained from such a database may be biased or misleading. The purpose of this research is therefore to add value to the database by imputing its missing values, building the imputation model according to the type of the missing data. If the missing data are continuous, a regression model and a back-propagation neural network (BPNN) are applied; if they are categorical, logistic regression, a BPNN, and a decision tree are applied. The simulation results show that for continuous missing data the regression model delivers the best imputation estimate, while for categorical missing data the C5.0 decision tree is the best choice. For rare data in the database, the BPNN delivers the best estimate for continuous missing data, and the C5.0 decision tree is again the best choice for categorical missing data.
Keywords: data mining; value-added database; rare data; missing data; imputation
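To make the continuous case in the abstract above concrete, here is a minimal sketch of regression imputation: fit a least-squares line on the complete cases, then predict the missing values from an observed covariate. It assumes a single predictor and uses `None` as the missing marker; the function name `regression_impute` and the sample data are hypothetical, and the thesis's actual models (including the BPNN and C5.0 alternatives) are not reproduced.

```python
def regression_impute(x, y):
    """Fill None entries of y with predictions from a least-squares
    line fitted on the pairs where y is observed."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete cases to fit a line")
    mean_x = sum(xi for xi, _ in pairs) / n
    mean_y = sum(yi for _, yi in pairs) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    if sxx == 0:
        raise ValueError("predictor is constant on the complete cases")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Observed values are kept; only the missing ones are replaced.
    return [yi if yi is not None else intercept + slope * xi
            for xi, yi in zip(x, y)]


# Hypothetical data: the third response is missing.
filled = regression_impute([1, 2, 3, 4], [2.0, 4.0, None, 8.0])
print(filled)
```

This is single, deterministic imputation, so it understates uncertainty; it only illustrates the choice of a regression model for continuous fields.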
3	複雜抽樣下反應變數遺漏時之迴歸分析 / Regression Analysis with Missing Value of Responses under Complex Survey 許正宏, Hsu, Cheng-Hung Unknown Date (has links)
Gelman, King, and Liu (1998) proposed a multiple imputation procedure for a series of mutually independent cross-sectional surveys, linking the parameters of the different surveys through a hierarchical model. This thesis describes how, under complex sampling (stratified or cluster sampling) with Q continuous variables subject to missing values, the individual characteristics of respondents, the parameters of each stratum or cluster, and a hierarchical model linking those parameters can be combined to impute the missing values and estimate the model parameters. Missing values are handled with the monotone data augmentation algorithm, so only the missing values that break the monotone data pattern need to be imputed. Because different clusters or strata often show different characteristics, a hierarchical model links their parameters, and the derivation of Gelman, King, and Liu (1998) is extended to take individual characteristics into account. For each cluster, the covariance matrices Ψ and Σ influence the convergence of the other within-cluster parameters; the simulation results give no evidence of convergence problems.
Keywords: stratified sampling; cluster sampling; missing value; multiple imputation; monotone data pattern; hierarchical model
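The monotone pattern mentioned in the abstract above can be illustrated with a small sketch. Assuming the variables are already placed in a fixed column order, a row is monotone when every value after its first missing entry is also missing; the cells that break the pattern are the missing cells that still have an observed value to their right. The function name `breaking_cells` and the sample rows are hypothetical, and choosing the column order, which a full monotone data augmentation algorithm also has to handle, is outside this sketch.

```python
def breaking_cells(rows):
    """Return (row, column) positions of missing cells (None) that
    violate a monotone missing-data pattern for the given column order."""
    cells = []
    for i, row in enumerate(rows):
        observed = [j for j, value in enumerate(row) if value is not None]
        if not observed:
            continue  # a fully missing row is trivially monotone
        last_observed = observed[-1]
        # Any gap before the last observed column breaks monotonicity.
        for j in range(last_observed):
            if row[j] is None:
                cells.append((i, j))
    return cells


# Hypothetical data with three variables per unit.
data = [
    [1.0, 2.0, 3.0],     # complete
    [1.0, None, None],   # monotone dropout
    [1.0, None, 3.0],    # gap at column 1 breaks the pattern
    [None, None, None],  # fully missing
]
print(breaking_cells(data))
```

Only the cells returned here would need to be imputed to restore a monotone pattern; the remaining missing block can then be handled by the monotone structure itself.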
4	資料採礦中的資料純化過程之效果評估 [Evaluating the effect of the data purification process in data mining] 楊惠如 Unknown Date (has links)
In recent years financial holding companies have sprung up in Taiwan, each comprising property insurers, banks, securities firms, life insurers, and other finance-related subsidiaries. Databases that were originally kept separately by each subsidiary can now be integrated, so that senior managers can mine the combined data for useful information when making decisions. However effective data mining may be, it needs a good, complete database: if data quality is poor, or the data are unrelated to the research goal, data mining cannot deliver useful results. Sampling surveys and functional mapping can add value to a database: given a target database and an auxiliary database, functional mapping merges them into one large database, and the missing values or rare values in the merged database are then imputed to obtain the value-added database. This whole process is called Data Systematic Purifying Analysis (Data SPA), or "data purification". This research verifies the structure of the purified data, checking that the data remain useful and correct after the process, using three assessments: horizontal, vertical, and overall. The assessments show that the purified data perform poorly when examined variable by variable or observation by observation, but well as a whole. Although the horizontal and vertical assessments show that the purified data cannot fully match the original complete data, purification imputes missing values and adds fields, which increases the information content of the data. Data purification is therefore effective, because the added information is a considerable benefit to a database intended for data mining.
Keywords: Data Systematic Purifying Analysis; Data Mining; Missing Data; Rare Data; Imputation; Functional Mapping; Database Value-Added
5	遺漏值存在時羅吉斯迴歸模式分析之研究 / Logistic Regression Analysis with Missing Value 劉昌明, Liu, Chang Ming Unknown Date (has links)
(No abstract.)
Keywords: missing value; logistic regression; EM algorithm; GLM; log-linear model; MLE
