51 |
Mixture models for ROC curve and spatio-temporal clustering
Cheam, Amay SM January 2016
Finite mixture models have had a profound impact on the history of statistics, contributing to modelling heterogeneous populations, generalizing distributional assumptions, and lately, presenting a convenient framework for classification and clustering.
A novel approach based on Gaussian mixture distributions is introduced for modelling receiver operating characteristic (ROC) curves. Because the resulting curve has no closed-form expression, it is evaluated by the Monte Carlo method. This approach performs excellently compared with existing methods when applied to real data.
In practice, data are often non-normal, atypical, or skewed, so non-Gaussian distributions should be introduced to fit them better. Two non-Gaussian mixtures, based on the t distribution and the skew t distribution, are proposed and applied to real data.
A novel mixture model is presented for clustering spatial and temporal data. The proposed model defines each mixture component as a mixture of autoregressive polynomials with logistic links. The new model performs significantly better than the best-known model-based clustering techniques when applied to real data. / Thesis / Doctor of Philosophy (PhD)
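The Monte Carlo idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes that test scores in the healthy and diseased groups each follow a univariate Gaussian mixture; the weights, means, and standard deviations are made-up illustrative values, and the function names are hypothetical rather than taken from the thesis.

```python
import random

def sample_mixture(rng, weights, means, sds, n):
    """Draw n values from a univariate Gaussian mixture."""
    draws = []
    for _ in range(n):
        # Pick a component according to the mixing weights, then sample from it.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        draws.append(rng.gauss(means[k], sds[k]))
    return draws

def monte_carlo_roc(healthy, diseased, thresholds):
    """Empirical (FPR, TPR) pairs; a score above the threshold is called positive."""
    points = []
    for c in thresholds:
        fpr = sum(x > c for x in healthy) / len(healthy)
        tpr = sum(x > c for x in diseased) / len(diseased)
        points.append((fpr, tpr))
    return points

def trapezoid_auc(points):
    """Area under the ROC points by the trapezoidal rule, with (0,0) and (1,1) added."""
    pts = sorted(points + [(0.0, 0.0), (1.0, 1.0)])
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Illustrative (assumed) mixture parameters, not values from the thesis.
rng = random.Random(0)
healthy = sample_mixture(rng, [0.7, 0.3], [0.0, 1.0], [1.0, 0.5], 5000)
diseased = sample_mixture(rng, [0.5, 0.5], [2.0, 3.5], [1.0, 0.8], 5000)
roc = monte_carlo_roc(healthy, diseased, [t / 10 for t in range(-50, 81)])
print(round(trapezoid_auc(roc), 3))
```

Increasing the number of simulated scores reduces the Monte Carlo error of the estimated curve, at the cost of more computation.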
|
52 |
Methods for Designing and Forming Predictive Genetic Tests
Lu, Qing 24 June 2008
No description available.
|
53 |
Selection of Optimal Threshold and Near-Optimal Interval Using Profit Function and ROC Curve: A Risk Management Application
CHEN, JINGRU January 2011
The ongoing financial crisis has had a major adverse impact on the credit market. As the crisis progresses, the skyrocketing unemployment rate leaves more and more customers unable to pay back their credit debts. The deteriorating economic environment and growing pressure to generate revenue have led creditors to re-assess their existing portfolios. The purpose of credit re-assessment is to estimate customers' behavior accurately and to distill information for credit decisions that differentiate bad customers from good customers. Lending institutions often need a specific rule for defining an optimal cut-off value that maximizes revenue and minimizes risk. In this dissertation research, I consider a problem in the broad area of credit risk management: the selection of critical thresholds, which comprise the "optimal cut-off point" and an interval containing cut-off points near the optimal cut-off point (a "near-optimal interval"). These critical thresholds can be used in practice to adjust credit lines, to close accounts involuntarily, to re-price, etc. Better credit re-assessment practices are essential for banks to prevent future loan losses and to restore the flow of credit to entrepreneurs and individuals. The Profit Function is introduced to estimate the optimal cut-off and the near-optimal interval, which are used to manage credit risk in the financial industry. The credit scores of the good population and the bad population are assumed to come from two distributions, with the same or different dispersion parameters. In a homoscedastic Normal-Normal model, a closed-form solution for the optimal cut-off and some of its properties are provided for three possible shapes of the Profit Function. The same methodology can be generalized to other distributions in the exponential family, including the heteroscedastic Normal-Normal Profit Function and the Gamma-Gamma Profit Function.
It is shown that the Profit Function is a comprehensive tool for selecting critical thresholds, and that its solution can be found using easily implemented computing algorithms. The estimation of the near-optimal interval is developed for three possible shapes of the bi-distributional Profit Function. The optimal cut-off has a closed-form formula, and the estimates of the near-optimal interval reduce to this closed-form formula when the tolerance level is zero. Two nonparametric methods are introduced to estimate critical thresholds when the latent risk score does not come from a known distribution. One method uses kernel density estimation to derive a table, which is used to estimate the values of the critical thresholds. An ROC graphical method is also developed to estimate critical thresholds. In the theoretical portion of the dissertation, we use Taylor series and the delta method to develop the asymptotic distribution of the non-constrained optimal cut-off. We also use the kernel density estimator to derive the asymptotic variance of the Profit Function. / Statistics
|
54 |
Alternative Summary Indices: PLC and ASC for the Summary Receiver Operating Characteristic (SROC) Curve
Zhang, Xuan 12 1900
Thesis / Master of Science (MS)
|
55 |
Comparison of Discrimination between Logistic Model with Distance Indicator and Regularized Function for Cardiology Ultrasound in Left Ventricle
Kao, Li-wen 08 July 2011
Most cardiac structural abnormalities are examined by echocardiography. As heart diseases become better understood, it is commonly recognized that heart failure is closely related to left ventricular systolic and diastolic function. This work discusses the association between gray-scale differences and the risk of heart disease, based on the changes between left ventricular systole and diastole in ultrasound images. Owing to the large dimension of the data matrix, following Chen (2011), we also reduce the influencing factors by factor analysis and calculate factor scores to represent the characteristics of the subjects.
Two kinds of classification criteria are used in this work: a logistic model with distance indicator and a discriminant function. Following Guo et al. (2001), we calculate the Mahalanobis distance from each subject to the centers of the normal and abnormal groups, then fit a logistic model to these distances for classification; this is called the logistic model with distance indicator. For the discriminant analysis, the regularized method of Friedman (1989) for estimating the covariance matrix is used, which is more flexible and can improve the covariance matrix estimates when the sample size is small. For the cut-point of the ROC curve, following the approach of Hanley et al. (1982), we find the most appropriate cut-point, one that performs well in both sensitivity and specificity under the same classification criterion. The regularized method and the ROC cut-point are then combined into a new classification criterion, and the results of classifying the normal and abnormal groups under the new criterion are presented.
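As an illustration of choosing an ROC cut-point that balances sensitivity and specificity, the sketch below maximizes Youden's J (sensitivity + specificity - 1). This criterion is an assumption made for the example; the abstract does not state which rule the thesis uses. The scores, labels, and function names are likewise hypothetical.

```python
def sensitivity_specificity(scores, labels, cut):
    """Classify score >= cut as abnormal (label 1); return (sensitivity, specificity)."""
    tp = sum(s >= cut and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < cut and y == 0 for s, y in zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg

def best_cut_point(scores, labels):
    """Return the observed score maximizing Youden's J, together with that J."""
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, cut)
        j = sens + spec - 1
        if j > best_j:  # ties keep the smallest qualifying cut-point
            best_cut, best_j = cut, j
    return best_cut, best_j

# Made-up classifier scores; label 1 marks the abnormal group.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(best_cut_point(scores, labels))
```

Other rules, such as minimizing the distance to the top-left corner of the ROC plot, can select a different cut-point on the same data.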
|
56 |
Developing a School Social Work Model for Predicting Academic Risk: School Factors and Academic Achievement
Lucio, Robert 21 October 2008
The impact of school factors on academic achievement has become an important focus for school social work and has revealed the need for a comprehensive school social work model that identifies the critical areas in which to apply social work services. This study was designed to develop and test a more comprehensive school social work model. Specifically, the relationships between cumulative grade point average (GPA) and both a cumulative risk index (CRI) and an additive risk index (ARI) were tested, and the two models were compared. Over 20,000 abstracts were reviewed to create a list of factors that previous research has shown to affect academic achievement. These factors were divided into the broad domains of personal factors, family factors, peer factors, school factors, and neighborhood or community factors. Factors placed under the school domain were tested, and those that met all three criteria were included in the overall model. Consistent with previous research, both the CRI and the ARI were shown to be related to cumulative GPA: as the number of risk factors increased, GPA decreased. After a discussion of the results, a case was made that an additive risk index approach fits better with the current state of social work. In addition, cutoff points for separating at-risk from non-risk students were selected using an ROC analysis. Finally, implications for school social work practice at the macro, meso, and micro levels were discussed.
|
57 |
An Analysis of Fourier Transform Infrared Spectroscopy Data to Predict Herpes Simplex Virus 1 Infection
Champion, Patrick D 20 November 2008
The purpose of this analysis is to evaluate the usefulness of Fourier Transform Infrared (FTIR) spectroscopy in detecting Herpes Simplex Virus 1 (HSV1) infection at an early stage. The raw absorption values were standardized to eliminate inter-sampling error. The Z score of the Wilcoxon-Mann-Whitney (WMW) statistic was calculated to select significant spectral regions. Partial least squares modeling was performed because of multicollinearity. The Kolmogorov-Smirnov statistic showed that the models for healthy tissues from different time groups did not come from the same distribution. The additional 24-hour dataset was evaluated as follows. Variables were selected by WMW Z score. A Difference of Composites statistic, DC, was created as a disease indicator and evaluated using the area under the ROC curve, specificities, and bootstrap confidence intervals. The specificity of DC was high; however, the confidence intervals were wide. Future studies with larger sample sizes are required to test the usefulness of this statistic.
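The evaluation step described here, area under the ROC curve with a bootstrap confidence interval, can be sketched as follows. This is a minimal illustration, not the thesis's procedure: it assumes a percentile bootstrap that resamples each group separately, and the data and function names are hypothetical.

```python
import random

def auc_wmw(neg, pos):
    """AUC as the normalized Mann-Whitney statistic: P(pos > neg) + 0.5 * P(tie)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(neg, pos, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the AUC, resampling each group with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        neg_b = rng.choices(neg, k=len(neg))
        pos_b = rng.choices(pos, k=len(pos))
        stats.append(auc_wmw(neg_b, pos_b))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi

# Made-up indicator values for uninfected (neg) and infected (pos) samples.
neg = [0.2, 0.5, 0.9, 1.1, 1.4]
pos = [1.0, 1.6, 1.9, 2.3, 2.8]
print(auc_wmw(neg, pos), bootstrap_ci(neg, pos))
```

With groups this small the interval is wide, which matches the abstract's remark that larger samples are needed.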
|
58 |
違約戶稀少時之估計條件違約機率 / Estimating Conditional PD when Defaults Number is Small
唐延新, Tang, yan hsin Unknown Date
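Under the internal ratings-based approach of the new Basel Capital Accord, banks may score borrowers themselves and estimate credit risk from those scores in order to set aside reserves, so estimating the probability of default (PD) associated with a borrower's score is an important step. Most previous research on estimating PD assumes that scores are discrete; for the case in which scores are continuous, this study proposes fitting the estimation model with a curve function. The model fits the ROC curve function with a truncated gamma distribution and then uses this ROC curve function to estimate the default probability P(D|S) at each score; the two parameters of the gamma distribution are solved for by a two-stage method. The proposed method makes no assumption about the distribution of the scores, so numerical experiments with different distributions, parameter settings, and default probabilities are used to verify its accuracy and stability, and it is compared with the estimation methods of Van der Burgt (2008) and Tasche (2009). / Under the internal ratings-based approach of Basel II, banks independently estimate borrowers' default risk in order to set aside reserves. Hence, estimating the probability of default (PD) of borrowers is important. Most previous studies estimating PD assume that evaluation scores are discrete. In this study, we use a curve function to fit the estimation model when the evaluation scores are continuous. We use a truncated gamma distribution to fit the ROC curve function, use that ROC curve function to estimate the PD at different scores, and use a two-step method to find the values of the two parameters of the gamma distribution. The estimation method in this study does not assume a distribution for the evaluation scores, so we use different distributions, parameters, and default probabilities to test the accuracy and stability of the method. Finally, we compare our method with those of Van der Burgt (2008) and Tasche (2009).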
|
59 |
Estudo comparativo de tres sistemas digitais sem cabo no diagnostico de caries proximais / Comparative study of three wireless digital systems for approximal caries diagnosis
Pontual, Andrea dos Anjos 28 February 2007
Advisor: Francisco Haiter Neto / Thesis (doctorate) - Universidade Estadual de Campinas, Faculdade de Odontologia de Piracicaba / Made available in DSpace on 2018-08-09T19:20:36Z (GMT).
Previous issue date: 2007 / Summary: The aim of this work was to compare, objectively and subjectively, two storage phosphor plate digital systems (Digora Optime® and DenOptix®) with the CDR Wireless® system, using radiographic film as the reference radiographic method. For the objective analysis, radiographic images of an aluminum density scale were obtained with the three digital systems. Pixel values were then obtained with the appropriate tool of the EMAGO®/Advanced software. The mean pixel values of the digital systems were compared using the Kruskal-Wallis test and Dunn's multiple comparisons test (p<0.01). For the subjective evaluation, radiographic images were obtained of 20 phantoms made up of posterior teeth, which six radiologists assessed for the presence of caries. The teeth were then sectioned and analyzed microscopically to obtain the gold standard. Analysis of variance and the t-test (p<0.05) were performed to check for statistically significant differences between the image modalities in sensitivity, specificity, accuracy (area under the ROC curve), and negative and positive predictive values. The results showed significant differences in mean pixel values among the three digital systems, with Digora Optime® (194.46) showing the highest value, followed by DenOptix® (168.34) and CDR Wireless® (109.44). The CDR Wireless® and Digora Optime® systems obtained higher sensitivity values than the other image modalities, and the difference between these systems and conventional radiographic film was statistically significant (p=0.032). Digora Optime® showed the lowest specificity and accuracy values, which were significantly lower than those of conventional film (p<0.013). The CDR Wireless® digital system performed similarly to Insight® radiographic film in detecting incipient approximal caries. Therefore, with regard to image quality, the new CDR Wireless® may be a viable alternative for clinical use as an auxiliary diagnostic method.
Abstract: The aim of this study was to compare, both objectively and subjectively, the radiographic image quality of two storage phosphor plate systems (Digora Optime® and DenOptix®) with the results of the new complementary metal oxide semiconductor (CMOS) system, the CDR Wireless®. For the objective analysis, radiographs of an aluminum step wedge were obtained using the three digital systems. This analysis was carried out by pixel density measurements using the appropriate tool from the EMAGO®/Advanced software. The pixel measurement data were analyzed statistically using the Kruskal-Wallis test and Dunn's multiple comparisons test (p<0.01). For the subjective analysis, under in vitro and standardized conditions, twenty phantoms with posterior human teeth were radiographed using one conventional film (Insight® Kodak) and the three digital systems. Six radiologists recorded small approximal caries lesions on a 5-point confidence scale. The presence of caries was validated histologically. Two-way analysis of variance and post hoc t-tests were used to test for differences in sensitivity, specificity, accuracy, and positive and negative predictive values. Differences were considered statistically significant when p<0.05. The results showed significant differences in the pixel density values for the three digital systems, with the Digora Optime® presenting the greatest values (194.46), followed by the DenOptix® (168.34) and CDR Wireless® (109.44). The two-way analysis of variance and post hoc t-tests demonstrated that CDR Wireless® and Digora Optime® had higher sensitivity than almost all other image modalities, significantly higher than conventional film. Digora Optime® had the lowest specificity and accuracy of all systems. There was a statistically significant difference in specificity and accuracy between this system and the conventional film (p<0.05); among the other systems there were no significant differences (p>0.05). The results suggest that the performance of the new CDR Wireless® was comparable to that of the other digital systems and of the Insight® film. Therefore, regarding image quality, the new CDR Wireless® system may be used in clinical activities as an alternative complementary diagnostic method. / Doctorate / Dental Radiology / Master in Dental Radiology
|
60 |
Parâmetros bioinformáticos do contexto genômico como preditores do efeito funcional de substituições pontuais na sequência 5' UTR em genes humanos / Bioinformatic parameters of genomic context as predictors of functional impact in point substitutions of human gene 5' UTR
Urioste, Eduardo Arcanjo, 1989- 22 August 2018
Advisor: Sérgio Roberto Peres Line / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Odontologia de Piracicaba / Made available in DSpace on 2018-08-22T18:59:49Z (GMT).
Previous issue date: 2013 / Summary: It is estimated that each individual carries about 120 to 430 rare variants in UTR regions (Abecasis et al, 2012). Despite the tolerance to variation in the 5' UTR region, the pathophysiology of several diseases is linked to mutations in it (Cazzola & Skoda, 2000; Reynolds, 2002; Chatterjee & Pal, 2009; Wethmar et al 2010), so the regulatory mechanisms need to be understood and determined. The aim of this work is to discover genetic signatures in the genomic context of 5' UTR point mutations that make it possible to predict the functional impact of other point variations in the same region. The disease-causing mutations were selected from the Human Gene Mutation Database (HGMD) (Stenson et al, 2008), and the polymorphisms, of unknown functional impact, were obtained from the NHLBI Grand Opportunity Exome Sequencing Project (ESP) database, originating from the work of Tennessen et al (2012). In total, 235 mutations and 21,542 polymorphisms were used. For the variations, the following parameters were calculated: change in the stability of the secondary structure of the variation's context (ΔΔGfolding), presence of transcription factor binding sites (JASPAR), type of variation (transition/transversion, tipoV), distance from the start of the coding sequence (DiSC), distance from the transcription start (DiTr), and phylogenetic conservation by Levenshtein distance of the context (Lev). Statistics were computed with the Wilcoxon and binomial tests. From these, logistic regression models were generated and analyzed by ROC curve. The parameters maximum ΔΔGfolding, tipoV, DiSC, and Lev allowed a significant distinction (α = 0.05) between polymorphisms and mutations, yielding explanatory but incomplete models (area under the ROC curve 0.772). Maximum ΔΔGfolding indicated a relationship between the mutations and the more stable secondary structures they generate. The Lev and tipoV parameters suggest that the mutations originate from hotspots. The DiSC parameter indicated regions with probable functionality. Although it was not possible to establish a causal relationship between the parameters and the functional impact of the variations, important correlations were found.
Abstract: It is estimated that each individual carries about 120 to 430 rare variants in the UTR regions (Abecasis et al, 2012). Despite the increased tolerance towards variation in the 5' UTR region, the pathophysiology of several diseases is linked to mutations in it (Cazzola & Skoda, 2000; Reynolds, 2002; Chatterjee & Pal, 2009; Wethmar et al 2010). It is therefore necessary to understand and determine the regulatory elements. The objective of this study is to discover genetic signatures in the genomic context of disease-causing point mutations in the 5' UTR, thus allowing the prediction of the functional impact of other point variations in the same region. The disease-causing mutations were selected from the Human Gene Mutation Database (HGMD) (Stenson et al, 2008). The polymorphisms of unknown functional impact were obtained from the NHLBI Grand Opportunity Exome Sequencing Project (ESP), originating from the work of Tennessen et al (2012). A total of 235 mutations and 21,542 polymorphisms were used. For each variation, the following parameters were calculated: the change in folding stability of the variation's context (ΔΔGfolding), presence of transcription factor binding sites (JASPAR), type of variation (transition/transversion, tipoV), distance from the coding sequence start (DiSC), distance from the transcription start site (DiTr), and phylogenetic conservation by Levenshtein distance from the wild-type to the variant context (Lev). Statistical testing used the Wilcoxon and binomial tests. Logistic regression models were generated from the parameters, and their performance was evaluated by ROC curve. The parameters maximal ΔΔGfolding, tipoV, logarithm of DiSC, and Lev allowed a significant distinction (α = 0.05) between the groups, generating models that were reasonably explanatory but incomplete (area under the ROC curve 0.772). Maximal ΔΔGfolding showed a relationship between mutations and the stable secondary structures generated by them. Lev and tipoV suggested that the mutations originate from hotspots. The DiSC parameter identified regions with possible functionality. While it was not possible to establish any clear causal relationship between the parameters and the functional impact of the variations, important correlations were found. / Master's / Histology and Embryology / Master in Oral and Dental Biology
|