資料品質的好壞會影響決策品質以及各種行動的執行成果,所以資料品質在近年來越來越受到重視。本研究包含了兩個資料庫,一個是產業創新調查資料庫,一個是95年工商及服務業普查資料庫,資料品質的好壞對一個資料庫來說也是一個相當重要的議題,資料庫中往往都含有錯誤的資料,錯誤的資料會導致分析結果出現偏差的狀況,所以在進行資料分析之前,資料清理與整理是必要的事前處理工作。
我們從母體資料分佈與樣本資料分佈得知,在清理與整理資料之前,平均創新員工人數為92.08,平均工商員工人數為135.54;在清理與整理資料之後,我們比較兩個資料庫員工人數的相關性、相似性、距離等性質,結果顯示兩個資料庫的資料一致性極高,平均創新員工人數與平均工商員工人數分別為39.01與42.12,跟母體平均員工人數7.05較為接近,也顯示出資料清理的重要性。
本研究使用的方法為事後分層抽樣,主要研究目的是要利用產業創新調查樣本來推估95年工商及服務業普查母體資料的準確性。產業創新調查樣本在推估母體從業員工人數與母體營業收入方面皆出現高估的狀況,推測出現高估的原因是產業創新調查母體為前中華徵信所出版的五千大企業名冊為母體底冊,而工商及服務業普查企業資料為一般企業母體底冊。因此,我們利用和產業創新調查樣本所相對應的工商普查樣本做驗證,發現95年工商及服務業普查樣本與產業創新調查樣本的資料一致性極高。 / Data quality is good or bad will affect the decision quality and achievements in the implementation of various actions, so the data quality more and more attention in recent years. This study consists of two databases, one is the industrial innovation survey database, another is the industry and commerce census database in ninety five years. Data quality is good or bad of a database is also a very important issue, the database often contain erroneous information, incorrect information will result in bias of the analysis results. So before carrying out data analysis, data cleaning and consolidation is necessary.
We can know from the parent and the sample data distribution. Before data cleaning and consolidation, the average number of innovation employees is 92.08, and the average number of industrial-commerce employees is 135.54. After data cleaning and consolidation, we compare the correlation, similarity, and distance of the number of employees in two databases. The results show the data consistency of the two databases is very high, the average number of innovation employees is 39.01, and the average number of industrial-commerce employees is 42.12, it is closer to the average number of parent employees 7.05. This also shows the importance of data cleaning.
Method used in the study is post-stratified sampling, the main research objective is to use industrial innovation survey sample to estimate the data accuracy of the industry and commerce census in ninety five years. Use industrial innovation survey sample to estimate the number of employees and operating revenue in the industry and commerce census in ninety five years are both overestimated, we guess the reason is that the parent of the industrial innovation survey is five thousand large enterprises published by China Credit Information, and the parent of the industry and commerce census is general enterprises. Therefore, we use the corresponding industry and commerce census sample for validation. The results show that the data consistency of the industrial innovation survey sample and the industry and commerce census sample in ninety five years is very high.
Identifer | oai:union.ndltd.org:CHENGCHI/G0098354020 |
Creators | 邱詠翔 |
Publisher | 國立政治大學 |
Source Sets | National Chengchi University Libraries |
Language | 中文 |
Detected Language | English |
Type | text |
Rights | Copyright © nccu library on behalf of the copyright holders |
Page generated in 0.0016 seconds