71

The application of geostatistical techniques in the analysis of joint data

Grady, Lenard Alden 22 January 2015 (has links)
No description available.
72

New statistics to compare two groups with heterogeneous skewness.

January 2012 (has links)
In this thesis a bivariate statistic called the weighted distance test is introduced for comparing the central tendencies of two random variables. Its strength is that it maintains stable Type I error control under skewed data while still providing considerable statistical power. The weighted distance test uses a power function to correct the asymmetry in skewed data; unlike ordinary power transformations, it restricts the power index to between 0 and 1. The thesis also provides an effective way of choosing the power index for use in practical computation. / Four mainstream two-group statistical methods are reviewed and compared with the weighted distance test by Monte Carlo simulation under three conditions: normal distributions, skewed distributions with equal skewness, and skewed distributions with unequal skewness. The results show that although the weighted distance test does not win in any single condition, it has two advantages: first, it keeps the Type I error under a reasonable level in every condition; second, its performance is never very poor in any condition. In contrast, each of the other four tests fails in some condition. The weighted distance test therefore offers a more stable and simpler way of comparing central tendencies than the other tests. / A new bivariate statistic, namely the weighted distance test, for comparing two groups was introduced. The test aims to provide reliable Type I error control and reasonable statistical power across different types of skewed data. It corrects the skewness of the data by applying a power transformation with a power index ranging between 0 and 1. I also propose in this thesis a possible way of deciding the power index by considering the skewness difference between the two groups under comparison. / I reviewed four commonly used inferential statistics for two-group comparison and compared their performance with the weighted distance test under 1) a normal distribution, 2) skewed distributions with equal skewness across groups, and 3) skewed distributions with unequal skewness across groups. Monte Carlo simulations were run to evaluate the five tests. Results showed that the weighted distance test was not the best test in any particular situation, but it was the most stable in the sense that 1) it provided accurate Type I error control and 2) it did not produce catastrophically small power in any scenario. All four other tests failed in some of the simulated scenarios through either an inflated Type I error or unsatisfactory power. I therefore suggest that the weighted distance test is an easy-to-use test that works fairly well across a wide range of situations. / Lee, Yung Ho. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2012. / Includes bibliographical references (leaves 31-33). / Abstracts also in Chinese. / Chapter One: Introduction --- p.1 / Common methods in comparing central tendency --- p.2 / T-test --- p.2 / Median and rank --- p.3 / Trimming --- p.3 / Power transformation --- p.4 / Chapter Two: Weighted distance statistic --- p.5 / Definition --- p.5 / Statistical properties --- p.5 / Specification of Lambda λ --- p.7 / Estimation and inference --- p.7 / Chapter Three: Simulation --- p.10 / Study 1 --- p.12 / Study 2 --- p.15 / Study 3 --- p.18 / Chapter Four: Discussion --- p.21 / Summary --- p.21 / Limitation --- p.22 / Further development --- p.23 / Appendix I: Proofs of theorems of weighted distance statistic --- p.24 / Appendix II: Table of numerical results of simulations --- p.26 / Bibliography --- p.30
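Since the record above describes the weighted distance test only at a high level, a minimal sketch of the general idea — comparing two groups after a power transformation whose index λ is restricted to (0, 1) — is given below. The statistic (a permutation test on transformed means), the fixed choice of λ, and the simulated log-normal data are illustrative assumptions, not the thesis's actual weighted distance statistic or its rule for choosing λ.

```python
# Illustrative sketch only: a two-group comparison after a power transformation
# with index lambda in (0, 1). The statistic and the fixed lambda are
# assumptions for demonstration, not the thesis's weighted distance test.
import numpy as np
from scipy import stats

def power_transform(x, lam):
    """Apply a power transformation with index lam in (0, 1) to positive data."""
    return np.power(x, lam)

def two_group_power_test(x, y, lam=0.5, n_perm=5000, seed=0):
    """Permutation test on the mean difference of power-transformed samples."""
    rng = np.random.default_rng(seed)
    xt, yt = power_transform(x, lam), power_transform(y, lam)
    observed = xt.mean() - yt.mean()
    pooled = np.concatenate([xt, yt])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = pooled[:len(xt)].mean() - pooled[len(xt):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return observed, count / n_perm

# Example with skewed (log-normal) samples of unequal skewness
x = stats.lognorm.rvs(s=1.0, size=50, random_state=1)
y = stats.lognorm.rvs(s=0.5, size=50, random_state=2)
print(two_group_power_test(x, y, lam=0.5))
```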
73

The analysis of protein sequences: statistical modeling for short structural motifs.

January 1997 (has links)
by Sze-wan Man. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. / Includes bibliographical references (leaves 41-42). / Chapter 1: Introduction --- p.1 / Chapter 2: The probability model --- p.8 / 2.1 Introduction --- p.8 / 2.2 The coding system --- p.11 / 2.3 The likelihood estimates of hexamer codes --- p.13 / 2.4 A cross-validation study --- p.15 / Chapter 3: An application of the likelihood ratio --- p.21 / 3.1 Introduction --- p.21 / 3.2 The Needleman-Wunsch algorithm --- p.21 / 3.2.1 Background --- p.21 / 3.2.2 The principle of the algorithm --- p.21 / 3.2.3 The algorithm --- p.23 / 3.3 Application of the structural information --- p.25 / 3.3.1 Basic idea --- p.25 / 3.3.2 Comparison between pairs of hexamer sequences --- p.25 / 3.3.3 The score of similarity of a pair of hexamer sequences --- p.26 / Chapter 4: Application of the modified Needleman-Wunsch algorithm --- p.27 / 4.1 The structurally homologous pair --- p.27 / 4.1.1 The horse hemoglobin beta chain --- p.28 / 4.1.2 The Antarctic fish hemoglobin beta chain --- p.29 / 4.2 Other proteins --- p.31 / 4.2.1 The acetylcholinesterase --- p.31 / 4.2.2 The lipase --- p.32 / 4.2.3 The two A and D chains of the deoxyribonuclease --- p.33 / 4.3 Evaluation of the significance of the maximum match --- p.36 / Chapter 5: Conclusion and discussion --- p.38 / References --- p.41 / Tables --- p.43
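The record above lists only the table of contents; for orientation, a generic sketch of the Needleman-Wunsch global alignment algorithm named in Chapter 3 follows. The scoring scheme (match, mismatch, and gap values) is an assumed placeholder, and the thesis's modification based on hexamer likelihood ratios is not reproduced here.

```python
# Generic Needleman-Wunsch global alignment (dynamic programming).
# The scoring scheme below (match=1, mismatch=-1, gap=-2) is an assumed
# placeholder; the thesis modifies the score using hexamer likelihood ratios.
import numpy as np

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    F = np.zeros((n + 1, m + 1))
    F[:, 0] = gap * np.arange(n + 1)   # aligning a prefix of a against gaps
    F[0, :] = gap * np.arange(m + 1)   # aligning a prefix of b against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i, j] = max(F[i - 1, j - 1] + s,   # align a[i-1] with b[j-1]
                          F[i - 1, j] + gap,     # gap in b
                          F[i, j - 1] + gap)     # gap in a
    return F[n, m]  # maximum global alignment score

print(needleman_wunsch("HEAGAWGHEE", "PAWHEAE"))
```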
74

Reliable techniques for survey with sensitive question

Wu, Qin 01 January 2013 (has links)
No description available.
75

Multi-period value-at-risk scaling rules: calculations and approximations.

January 2011 (has links)
Zhou, Pengpeng. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 76-89). / Abstract also in Chinese.
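This record carries no abstract; as general context for the title, a common baseline that multi-period VaR studies compare against is the square-root-of-time scaling rule, sketched below under the assumption of i.i.d. zero-mean normal daily returns. The parameter values are illustrative, and nothing here reproduces the thesis's own calculations or approximations.

```python
# Square-root-of-time scaling of value-at-risk, a common baseline rule.
# Assumes i.i.d. zero-mean normal daily returns; context only, not the
# thesis's method.
from scipy.stats import norm

def one_day_var(sigma_daily, alpha=0.99):
    """Parametric 1-day VaR for a zero-mean normal return with std sigma_daily."""
    return norm.ppf(alpha) * sigma_daily

def scaled_var(var_1d, horizon_days):
    """Scale a 1-day VaR to a multi-day horizon by the square-root-of-time rule."""
    return var_1d * horizon_days ** 0.5

var_1d = one_day_var(sigma_daily=0.01)
print(var_1d, scaled_var(var_1d, horizon_days=10))
```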
76

Fault probability and confidence interval estimation of random defects seen in integrated circuit processing

Hu, David T. 11 September 2003 (has links)
Various methods of estimating fault probabilities from defect data for random defects seen in integrated circuit manufacturing are examined. Estimates of fault probabilities based on defect data are less costly than those based on critical area analysis and are potentially more reliable because they are based on actual manufacturing data. Because of the limited sample size, means of estimating the confidence intervals associated with these estimates are also examined. Since the mathematical expressions associated with defect data-based estimates of the fault probabilities are not amenable to analytical derivation of confidence intervals, bootstrapping was employed. The results show that one previously proposed method of estimating fault probabilities from defect data is not applicable when using typical in-line data. Furthermore, the results indicate that under typical fab conditions, the assumption of a Poisson random defect distribution gives accurate fault probabilities. The yields predicted by the fault probabilities estimated from the limited yield concept and kill ratio, and by those estimated from critical area simulation, are shown to be comparable to the actual yields observed in the fab. It is also shown that with in-line data, the fault probability (FP) estimated for a given inspection step is a weighted average of the fault probabilities of the defect mechanisms operating at that inspection step. Four bootstrap-based methods of confidence interval estimation for the fault probabilities of random defects are examined. The study is based on computer simulation of randomly distributed defects with pre-assigned fault probabilities on dice and the resulting count of different categories of die. The results show that all four methods perform well when the number of fatal defects is reasonably high but deteriorate in performance as the number of fatal defects decreases. The results also show that the BCA (bias-corrected and accelerated) method is more likely to succeed with a smaller number of fatal defects. This success is attributed to its ability to account for the change of the standard deviation of the sampling distribution of the FP estimates with the FP of the population, and to account for median bias in the sampling distribution. / Graduation date: 2004
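As a rough illustration of the bootstrap approach to confidence intervals for a fault probability described above, the sketch below resamples simulated defect outcomes and computes a BCa interval with SciPy. The data layout and the simple estimator (the fraction of defects that are fatal) are assumptions for demonstration, not the limited-yield or kill-ratio estimators examined in the thesis.

```python
# Sketch: bootstrap confidence interval for a fault probability estimated from
# defect data. The estimator (fraction of defects that are fatal) and the
# simulated data are illustrative assumptions only.
import numpy as np
from scipy.stats import bootstrap

rng = np.random.default_rng(42)
true_fp = 0.3
# 1 = defect killed the die, 0 = defect was benign (simulated in-line data)
defect_fatal = rng.binomial(1, true_fp, size=200)

def fault_probability(fatal_flags, axis=-1):
    # Fraction of observed defects that are fatal
    return np.mean(fatal_flags, axis=axis)

res = bootstrap((defect_fatal,), fault_probability, method="BCa",
                confidence_level=0.95, n_resamples=9999, random_state=rng)
print(fault_probability(defect_fatal), res.confidence_interval)
```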
77

An Evaluation of Projection Techniques for Document Clustering: Latent Semantic Analysis and Independent Component Analysis

Jonathan L. Elsas 6 July 2005 (has links)
Dimensionality reduction in the bag-of-words vector space document representation model has been widely studied for the purposes of improving accuracy and reducing the computational load of document retrieval tasks. These techniques, however, have not been studied to the same degree with regard to document clustering tasks. This study evaluates the effectiveness of two popular dimensionality reduction techniques for clustering and their effect on discovering accurate and understandable topical groupings of documents. The two techniques studied are Latent Semantic Analysis and Independent Component Analysis, each of which has been shown to be effective in the past for retrieval purposes.
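A minimal sketch of the two projections evaluated in this study — Latent Semantic Analysis via truncated SVD and Independent Component Analysis — feeding a k-means clusterer is shown below. The toy corpus, the number of components, and the clustering step are illustrative assumptions, not the study's experimental setup.

```python
# Sketch: LSA (truncated SVD) and ICA projections of a TF-IDF document-term
# matrix, followed by k-means clustering. Corpus, component counts, and k are
# illustrative assumptions only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD, FastICA
from sklearn.cluster import KMeans

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors worry about market volatility",
]

X = TfidfVectorizer().fit_transform(docs)

# LSA: truncated SVD works directly on the sparse TF-IDF matrix
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# ICA: requires a dense array
ica = FastICA(n_components=2, random_state=0).fit_transform(X.toarray())

for name, Z in [("LSA", lsa), ("ICA", ica)]:
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
    print(name, labels)
```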
78

A Granger causality approach to gene regulatory network reconstruction based on data from multiple experiments

Tam, Hak-fui., 譚克奎. January 2012 (has links)
The discovery of gene regulatory networks (GRN) using gene expression data is one of the promising directions for deciphering biological mechanisms, which underlie many basic aspects of scientific and medical advances. In this thesis, we focus on the reconstruction of GRN from time-series data using a Granger causality (GC) approach. As there is little existing research on combining data from multiple time-series experiments, we identify the need for a methodology, with underlying theory, for combining multiple experiments to make statistically significant discoveries. We derive a statistical theory for the intersection of two discovered networks. Such a statistical framework is novel and intended for our GRN discovery problem. However, this theory is not limited to GRN or GC, and may be applied to other problems as long as one can take the intersection of discoveries obtained from multiple experiments (or datasets). We propose a number of novel methods for combining data from multiple experiments. Our single underlying model (SUM) method regresses the data of multiple experiments in one go, enabling GC to fully utilize the information in the original data. Based on our statistical theory and SUM, we develop new meta-analysis methods, including union of pairwise common edges (UPCE) and leave-one-out hybrid of SUM and UPCE (LOOHSU). Applications on synthetic data and real data show that our new methods give discoveries of substantially higher precision than traditional meta-analysis. We also propose methods for estimating the precision of GC-discovered networks and thus fill an important gap not considered in the literature. This allows us to assess how good a discovered network is when the ground truth is unknown, as is typical in most biological applications. Our precision estimation by half-half splitting with combinations (HHSC) gives an estimate much closer to the true value than that computed from the Benjamini-Hochberg false discovery rate controlling procedure. Furthermore, using a network covering notion, we design a method that can identify a small number of links with high precision of around 0.8-0.9, which may relieve the burden of testing many hypothetical interactions of low precision in biological experiments. For the situation where the number of genes is much larger than the data length, in which case full-model GC cannot be applied, GC is often applied to the genes pairwise. We analyze how spurious causalities (false discoveries) may arise in this setting. Consequently, we demonstrate that model validation can effectively remove spurious discoveries. With our proposed implementation, in which model orders are fixed by the Akaike information criterion and every model is subject to validation, we report a new observation that network hubs tend to act as sources rather than receivers of interactions. / published_or_final_version / Electrical and Electronic Engineering / Doctoral / Doctor of Philosophy
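As a rough illustration of the pairwise Granger-causality step and the intersection of networks discovered in separate experiments, a sketch using statsmodels follows. The synthetic data, the lag order, and the 0.05 significance threshold are assumptions, and the thesis's SUM, UPCE, and LOOHSU procedures are not reproduced here.

```python
# Sketch: pairwise Granger causality on each experiment's time series, then
# intersection of the discovered edge sets across experiments. Data, lag
# order, and the 0.05 threshold are illustrative assumptions.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

def discover_edges(data, maxlag=2, alpha=0.05):
    """Return the set of (cause, effect) edges found by pairwise Granger tests."""
    n_genes = data.shape[1]
    edges = set()
    for cause in range(n_genes):
        for effect in range(n_genes):
            if cause == effect:
                continue
            # grangercausalitytests tests whether column 2 Granger-causes column 1
            res = grangercausalitytests(data[:, [effect, cause]],
                                        maxlag=maxlag, verbose=False)
            p_value = res[maxlag][0]["ssr_ftest"][1]
            if p_value < alpha:
                edges.add((cause, effect))
    return edges

rng = np.random.default_rng(0)
experiments = [rng.standard_normal((100, 4)) for _ in range(2)]
networks = [discover_edges(d) for d in experiments]
print(set.intersection(*networks))  # edges supported by every experiment
```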
79

Design and analysis of household studies of influenza

Klick, Brendan. January 2013 (has links)
Background: Influenza viruses cause substantial mortality and morbidity both worldwide and in Hong Kong. Furthermore, the possible emergence of future influenza pandemics remains a major threat to public health. Some studies have estimated that one third of all influenza transmission occurs in households. Household studies have been an important means of studying influenza transmission and evaluating the efficacy of influenza control measures, including vaccination, antiviral therapy and prophylaxis, and non-pharmaceutical interventions. Household studies of influenza can be categorized as following one of two designs: household cohort and case-ascertained. In household cohort studies households are recruited before the start of an influenza season and then monitored during the influenza season for influenza infection. In case-ascertained studies a household is enrolled once influenza infection is identified in a household member. Objectives: This thesis comprises two parts. The objective of the first part is to evaluate the resource efficiency of different designs for conducting household studies. The objective of the second part is to estimate community and household transmission parameters during the 2009 A(H1N1) pandemic in Hong Kong. Methods: Monte Carlo simulation parameterized with data from influenza studies in Hong Kong was used to compare the resource efficiency of competing study designs evaluating the efficacy of an influenza control intervention. Approaches to ascertaining infections in different types of studies, and their implications for resource efficiency, were compared. For the second part, extended Longini-Koopman models within a Bayesian framework were applied to data from a Hong Kong household cohort study conducted from December 2008 to October 2009. Household and community transmission parameters were estimated by age group for two seasonal influenza strains circulating in the winter of 2008-09 and for two seasonal strains and one pandemic strain circulating in the summer of 2009. Results: Simulations showed that RT-PCR outperformed both serology and self-report of symptoms as a resource-efficient means of identifying influenza in household studies. Identification of influenza using self-reported symptomatology performed particularly poorly in terms of resource efficiency because of its low sensitivity and specificity compared with laboratory methods. Case-ascertained studies appeared more resource efficient than cohort studies, but the results were sensitive to the choice of parameter values, particularly the serial interval of influenza. In statistical analyses of household data from the winter of 2008-09, the transmissibility of seasonal influenza strains was found to be similar to that previously reported in the literature. The analysis also showed that for the summer of 2009 the estimates of household transmissibility were similar for seasonal A(H3N2) and pandemic A(H1N1), especially after taking into account that some individuals were likely immune to infection. Conclusions: Careful consideration of study design can ensure that studies are resource efficient and have sufficient statistical power. Data from a household study suggested that during 2009 seasonal and pandemic influenza had similar transmission patterns. / published_or_final_version / Community Medicine / Doctoral / Doctor of Philosophy
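A toy Monte Carlo sketch of household influenza transmission in the spirit of a Longini-Koopman-type model — each susceptible member may be infected from the community or by infected housemates — is shown below. The parameter values, the household size, and the simple generation-by-generation structure are illustrative assumptions, not the models fitted or simulated in the thesis.

```python
# Toy Monte Carlo sketch of household influenza transmission: each susceptible
# escapes community infection with probability (1 - cpi) and escapes each newly
# infected housemate with probability (1 - sar). Parameter values and the
# simple structure are assumptions only.
import numpy as np

def simulate_household(size, cpi, sar, max_generations=10, rng=None):
    rng = rng or np.random.default_rng()
    infected = rng.random(size) < cpi          # community-acquired index cases
    newly_infected = infected.copy()
    for _ in range(max_generations):
        if not newly_infected.any():
            break
        n_new = newly_infected.sum()
        escape = (1 - sar) ** n_new            # escape all newly infected members
        susceptible = ~infected
        newly_infected = susceptible & (rng.random(size) > escape)
        infected |= newly_infected
    return infected.sum()

rng = np.random.default_rng(7)
sims = [simulate_household(4, cpi=0.15, sar=0.20, rng=rng) for _ in range(10000)]
print("mean infections per 4-person household:", np.mean(sims))
```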
80

Investigating statistical techniques to infer interwell connectivity from production and injection rate fluctuations

Al-Yousef, Ali Abdallah 28 August 2008 (has links)
Not available / text
