1 |
台灣省各地區普查資料之統計分析 / Statistical Analysis of Census Data for the Regions of Taiwan Province 莊靖芬 Unknown Date
This study examines which factors may affect the schooling rate of 15- to 17-year-olds in Taiwan Province in 1990. After identifying candidate factors and collecting the corresponding data, we split the data by stratified random sampling into two sets: one is used to build the model and the other to test it. The model is built as follows. First, simple linear regression models are fitted to gauge how strongly each candidate factor is related to the schooling rate. Based on residual analysis and related diagnostics, appropriate variable transformations are then applied to these factors. Finally, three common regression techniques (stepwise regression, forward selection, and backward elimination) are used to develop a suitable multiple regression model. Judged against actual schooling conditions in Taiwan, the resulting model shows nothing unreasonable. We verify the model assumptions with graphical methods and statistical tests, and we draw inferences about the regression parameters. We then use the analysis of variance results and the predictions of new observations to evaluate the model's predictive ability.
Finally, we use the most appropriate multiple regression model to offer suggestions for improving (or maintaining) the schooling rate of 15- to 17-year-olds.
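The model-building procedure described in this abstract (split the data, select variables, then check predictions on the held-out set) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's actual analysis: the data are synthetic, the split is a simple random split rather than the stratified sampling the abstract mentions, only forward selection is shown, and the F-to-enter threshold of 4.0 and the helper names `fit_ols` and `forward_select` are hypothetical choices made for this example.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, residual sum of squares)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, float(resid @ resid)

def forward_select(X, y, f_to_enter=4.0):
    """Greedy forward selection: add the column with the largest partial F
    statistic until no candidate exceeds the (assumed) F-to-enter threshold."""
    n, p = X.shape
    selected = []
    rss_cur = float(np.sum((y - y.mean()) ** 2))  # intercept-only model
    while len(selected) < p:
        best = None
        for j in range(p):
            if j in selected:
                continue
            _, rss_new = fit_ols(X[:, selected + [j]], y)
            df_resid = n - len(selected) - 2  # intercept + selected + candidate
            f_stat = (rss_cur - rss_new) / (rss_new / df_resid)
            if best is None or f_stat > best[1]:
                best = (j, f_stat, rss_new)
        if best[1] < f_to_enter:
            break
        selected.append(best[0])
        rss_cur = best[2]
    return selected

# Synthetic stand-in data (hypothetical): only columns 0 and 2 truly matter.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = 2.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Simple random split into a model-building set and a test set.
idx = rng.permutation(n)
train_idx, test_idx = idx[:140], idx[140:]

selected = forward_select(X[train_idx], y[train_idx])
beta, _ = fit_ols(X[train_idx][:, selected], y[train_idx])

# Evaluate predictions of new observations on the held-out set.
pred = beta[0] + X[test_idx][:, selected] @ beta[1:]
ss_res = np.sum((y[test_idx] - pred) ** 2)
ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
r2_test = 1.0 - ss_res / ss_tot
print("selected columns:", selected, "held-out R^2:", round(float(r2_test), 3))
```

Backward elimination and stepwise regression follow the same pattern, with a removal step based on the smallest partial F statistic; in practice the candidate columns would be the transformed factors rather than synthetic noise.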
|
2 |
電路設計中電流值之罕見事件的統計估計探討 / A study of statistical method on estimating rare event in IC Current 彭亞凌, Peng, Ya Ling Unknown Date
Current values in the rare-probability region 4 to 6 standard deviations from the mean are now a key issue for the quality of integrated circuit (IC) design. As accuracy requirements rise, generating such rare events by Monte Carlo circuit simulation becomes too time-consuming to be practical, so extrapolation based on parametric models or regression analysis is often used instead. These methods, however, produce biased tail estimates, both because the needed variables are hard to collect and because the tail behaves differently as the operating voltage decreases. This study introduces statistical methods to improve the estimation of rare-probability current values. The observations are first transformed toward approximate normality with the Box-Cox (power) transformation, which improves estimation of the tail of the distribution, and weighted regression is then used to extrapolate the tail values. The explanatory variable is the empirical cumulative probability (empirical CDF) after a logarithm or z-score transformation, and a down-weighting scheme is used so that the extreme observations carry more of the information. We also consider the case in which the full set of variables can be collected, using the circuit data as explanatory variables in the weighted regression. In addition to regression analysis, extreme value theory (EVT) is adopted as an estimation method.
The methods are evaluated by computer simulation and on data from an IC manufacturer in Hsinchu, with mean squared error as the criterion. In the simulation, the population is assumed to follow a normal, Student's t, or Gamma distribution; the results confirm the feasibility of the weighted regression approach and guide the choice of sample-screening scheme used in the empirical study. The empirical data contain 10^8 observations, and the goal is to estimate the tail values with probabilities 10^(-4), 10^(-5), 10^(-6), and 10^(-7) when only 10^5 observations are available. Weighted regression combined with the Box-Cox transformation estimates these extreme current values accurately in both the left and right tails and outperforms the traditional methods and EVT. Estimates are more accurate when the explanatory variable is log-transformed for the right tail and z-score-transformed for the left tail. If circuit information can be collected and used as explanatory variables, weighted regression gives the most accurate estimates for left-tail rare events. Screening the sample down to the extreme 1% of the tail and using the full data set each have advantages and drawbacks, depending on the method, and both are worth considering. EVT with a 1% threshold gives stable, moderately accurate estimates across operating voltages and tends to be most accurate for short-range extrapolation, that is, when the tail probabilities to be estimated and the number of available observations are on a similar scale (e.g., probabilities 10^(-5) to 10^(-7) with 10^5 observations).
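The core idea of the proposed method (Box-Cox transformation followed by weighted regression on the transformed empirical CDF, then extrapolation to a rarer probability) can be sketched roughly as below. This is a minimal illustration under assumptions, not the thesis's implementation: the data are synthetic lognormal draws rather than circuit currents, the Box-Cox parameter is chosen by a grid search over the profile log-likelihood, the regression uses only the top 1% of the sample with the z-score transformation applied to a right tail for simplicity, and the linear weights `w` are a hypothetical stand-in for the down-weighting scheme, whose exact form the abstract does not give.

```python
import numpy as np
from statistics import NormalDist

def boxcox(x, lam):
    """Box-Cox transform; lam near 0 is treated as the log transform."""
    return np.log(x) if abs(lam) < 1e-8 else (x ** lam - 1.0) / lam

def inv_boxcox(y, lam):
    """Inverse of the Box-Cox transform above."""
    return np.exp(y) if abs(lam) < 1e-8 else (lam * y + 1.0) ** (1.0 / lam)

def boxcox_lambda(x, grid=np.linspace(-2.0, 2.0, 81)):
    """Pick the lambda on a grid that maximizes the Box-Cox profile log-likelihood."""
    log_sum = np.log(x).sum()
    loglik = [-0.5 * len(x) * np.log(boxcox(x, lam).var()) + (lam - 1.0) * log_sum
              for lam in grid]
    return float(grid[int(np.argmax(loglik))])

# Synthetic positive "current" values (hypothetical stand-in for circuit data).
rng = np.random.default_rng(1)
n, sigma = 100_000, 0.5
x = rng.lognormal(mean=0.0, sigma=sigma, size=n)

# Step 1: transform the observations toward normality.
lam_hat = boxcox_lambda(x)
y_sorted = np.sort(boxcox(x, lam_hat))

# Step 2: keep the top 1% and pair each order statistic with the
# z-score of its empirical cumulative probability.
k = n // 100
ranks = np.arange(n - k + 1, n + 1)
p_emp = (ranks - 0.5) / n
nd = NormalDist()
z = np.array([nd.inv_cdf(p) for p in p_emp])
y_tail = y_sorted[-k:]

# Step 3: weighted least squares, with weights growing toward the extreme
# (an assumed scheme chosen only for illustration).
w = 1.0 + (z - z[0])
A = np.column_stack([np.ones(k), z])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(A * sw[:, None], y_tail * sw, rcond=None)

# Step 4: extrapolate to an upper-tail probability of 1e-6 and back-transform.
tail_prob = 1e-6
z0 = nd.inv_cdf(1.0 - tail_prob)
estimate = float(inv_boxcox(coef[0] + coef[1] * z0, lam_hat))
true_q = float(np.exp(sigma * z0))  # known here only because the data are simulated
print("lambda:", lam_hat, "estimate:", estimate, "true quantile:", true_q)
```

Because the synthetic data are exactly lognormal, the transformed tail is close to linear in the z-score and the extrapolation is well behaved; real current data, especially at low operating voltage, may not be, which is the situation the weighting and the choice between log and z-score transformations are meant to address.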
|