Global ETD Search

1	Statistical Methods for High Dimensional Data in Environmental Genomics Sofer, Tamar January 2012 In this dissertation, we propose methodology for analyzing high dimensional genomics data in which the observations have a large number of outcome variables in addition to exposure variables. In Chapter 1, we investigate methods for genetic pathway analysis, where we have a small number of exposure variables. We propose two Canonical Correlation Analysis based methods that select outcomes either sequentially or by screening, and show that the performance of the proposed methods depends on the correlation between the genes in the pathway. We also propose and investigate a criterion for fixing the number of outcomes, and a powerful test for the exposure effect on the pathway. The methodology is applied to show that air pollution exposure affects the methylation of a few genes from the asthma pathway. In Chapter 2, we study penalized multivariate regression as an efficient and flexible method for studying the relationship between a large number of covariates and multiple outcomes. We use penalized likelihood to shrink model parameters to zero and to select only the important effects. We use the Bayesian Information Criterion (BIC) to select tuning parameters for the employed penalty and show that it chooses the right tuning parameter with high probability. These are combined in the “two-stage procedure”, and asymptotic results show that it yields a consistent, sparse and asymptotically normal estimator of the regression parameters. The method is illustrated on gene expression data from normal and diabetic patients. In Chapter 3, we propose a method for estimating covariate-dependent principal components analysis (PCA) and covariance matrices. Covariates, such as smoking habits, can affect the variation in a set of gene methylation values. We develop a penalized regression method that incorporates covariates in the estimation of principal components. We show that the parameter estimates are consistent and sparse, and that using the BIC to select the tuning parameter for the penalty functions yields good models. We also propose the scree plot residual variance criterion for selecting the number of principal components. The proposed procedure is applied to show that the first three principal components of gene methylation in the asthma pathway differ between people who smoked and people who did not. biostatistics Bayesian information criterion genetic pathway variable selection
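For orientation, the generic form of BIC-based tuning parameter selection mentioned in this abstract can be written as below. This is the textbook criterion, not necessarily the exact variant used in the dissertation; the symbols (penalized estimate, log-likelihood, effective number of nonzero parameters, sample size) are assumptions introduced here for illustration.

```latex
% Generic BIC for a penalized fit with tuning parameter \lambda (illustrative notation):
%   \hat\theta_\lambda  : penalized estimate at tuning parameter \lambda
%   \ell(\cdot)         : log-likelihood
%   \mathrm{df}_\lambda : number of nonzero parameters in \hat\theta_\lambda
%   n                   : sample size
\mathrm{BIC}(\lambda) = -2\,\ell(\hat\theta_\lambda) + \mathrm{df}_\lambda \log n,
\qquad
\hat\lambda = \arg\min_{\lambda} \mathrm{BIC}(\lambda)
```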
2	Combinação de Características Para Segmentação em Transcrição de Locutores (Feature Combination for Segmentation in Speaker Diarization) Neri, Leonardo Valeriano 21 February 2014 This work presents a feature-combination approach for the speaker segmentation stage of a speaker diarization system. The approach uses different acoustic features extracted from the audio source, aiming to combine their ability to discriminate between different types of sounds and thereby increase segmentation accuracy. The Bayesian Information Criterion (BIC) is used as a distance measure to assess whether two audio segments should be merged. An Artificial Neural Network (ANN) combines the responses obtained for each feature after applying an algorithm that detects whether there is a change within a stretch of audio. The resulting time indices are used as input to the neural network, which estimates the speaker change point in that stretch of audio. A speaker diarization system that includes the proposed approach is developed in order to evaluate its results and compare them with those of a diarization system that uses the classic Window-Growing speaker segmentation approach of Chen and Gopalakrishnan, applied to the different acoustic features adopted in this work. 
In the experiments with the speaker diarization system, an artificial database containing samples with several speakers is used. Evaluation of the segmentation stage shows an improvement in both the miss detection rate (MDR) and the false alarm rate (FAR) compared with the Window-Growing approach. Evaluation of the speaker clustering stage shows a significant improvement in the purity of the speaker clusters formed, computed as the percentage of samples from the same speaker in a cluster, demonstrating that the clusters are more homogeneous. Bayesian information criterion Speaker segmentation Feature combination Artificial neural networks
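The use of BIC as a distance between two audio segments can be illustrated with a minimal sketch. It assumes a single one-dimensional feature per frame and one Gaussian per segment, which is a simplification of the multivariate features described in the abstract; the function names, the `margin` and `lam` parameters, and the toy signals are all hypothetical and are not taken from the thesis.

```python
import math

def variance(xs):
    """Maximum-likelihood (biased) variance, floored to avoid log(0)."""
    m = sum(xs) / len(xs)
    return max(sum((x - m) ** 2 for x in xs) / len(xs), 1e-12)

def delta_bic(xs, i, lam=1.0):
    """Delta-BIC for splitting the 1-D feature sequence xs at index i.

    Compares one Gaussian for the whole window against two Gaussians,
    one on each side of the split. Positive values favour a change at i.
    """
    n, n1, n2 = len(xs), i, len(xs) - i
    gain = 0.5 * (n * math.log(variance(xs))
                  - n1 * math.log(variance(xs[:i]))
                  - n2 * math.log(variance(xs[i:])))
    # A 1-D Gaussian has 2 free parameters (mean, variance),
    # so the two-segment model adds 2 parameters.
    penalty = 0.5 * lam * 2 * math.log(n)
    return gain - penalty

def best_change_point(xs, margin=5, lam=1.0):
    """Return (index, score) of the split with the largest Delta-BIC."""
    scores = [(delta_bic(xs, i, lam), i)
              for i in range(margin, len(xs) - margin + 1)]
    best_score, best_i = max(scores)
    return best_i, best_score

def wobble(k):
    """Deterministic small fluctuation standing in for frame-level noise."""
    return 0.1 if k % 2 == 0 else -0.1

# Toy signals: one with a level shift at frame 50, one without any change.
two_speakers = [wobble(k) for k in range(50)] + [5.0 + wobble(k) for k in range(50)]
one_speaker = [wobble(k) for k in range(100)]

idx, score = best_change_point(two_speakers)
print("candidate change at frame", idx, "accepted:", score > 0)
print("change accepted in homogeneous signal:", best_change_point(one_speaker)[1] > 0)
```

A split is accepted only when the best Delta-BIC is positive, so the penalty weight `lam` trades missed detections against false alarms, the two error rates reported in the abstract.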
3	Segmentace mluvčích s využitím statistických metod klasifikace / Speaker Segmentation using statistical methods of classification Adamský, Aleš January 2011 The thesis discusses in detail some concepts of speech and prosody that can contribute to building a speech corpus for speaker segmentation. It also describes the Elan multimedia annotator used for labeling. The theoretical part highlights some frequently used speech features, such as MFCC, PLP and LPC, and covers the currently most popular speech segmentation methods. Some classification algorithms are also mentioned. The practical part describes the implementation of a Bayesian information criterion algorithm in a system for automatic speaker segmentation. Different speech features were used to classify speaker change points in speech. The test results were evaluated graphically using the receiver operating characteristic (ROC) and its quantitative indices. MFCC and HFCC proved to be the best speech features for this system.
4	Model selection for discrete Markov random fields on graphs / Seleção de modelos para campos aleatórios Markovianos discretos sobre grafos Frondana, Iara Moreira 28 June 2016 In this thesis we propose to use a penalized maximum conditional likelihood criterion to estimate the graph of a general discrete Markov random field. We prove the almost sure convergence of the estimator of the graph in the case of a finite or countably infinite set of variables. Our method requires minimal assumptions on the probability distribution and, contrary to other approaches in the literature, the usual positivity condition is not needed. We present several examples with a finite set of vertices and study the performance of the estimator on simulated data from these examples. We also introduce an empirical procedure based on k-fold cross validation to select the best value of the constant in the estimator's definition and show the application of this method to two real datasets. Discrete Markov random fields Bayesian information criterion Model selection Simple undirected graphs
5	A BIC-Based Method for B-Spline Knot Estimation (一種基於BIC的B-Spline節點估計方式) Ho, Hsin Yeh (何昕燁) Unknown Date In regression analysis, when the relation between the response variable and the explanatory variable is nonlinear, one can use nonparametric methods to estimate the regression function. B-Spline regression is one of the popular nonparametric regression methods. B-Splines are piecewise polynomials joined at knots, and the choice of knot locations is crucial. Zhou and Shen (2001) proposed spatially adaptive regression splines (SARS), where the knots are estimated using a selection scheme. Dimatteo, Genovese, and Kass (2001) proposed Bayesian adaptive regression splines (BARS), where certain priors for knot locations are considered. In this thesis, a knot estimation method based on the Bayesian information criterion (BIC) is proposed, and simulation studies are carried out to compare BARS, SARS and the proposed BIC-based method. The comparison covers fitting performance and simulation time across different types of regression functions, so that a suitable knot selection method can be chosen when estimating with B-Splines. B-Spline knot reversible-jump Markov chain Monte Carlo Bayesian information criterion
6	NORMAL MIXTURE AND CONTAMINATED MODEL WITH NUISANCE PARAMETER AND APPLICATIONS Fan, Qian 01 January 2014 This paper aims to find the proper hypothesis and test statistic for testing the existence of bilateral contamination when a nuisance parameter is present. The test statistic is based on method of moments estimators. A Union-Intersection test is used to test whether the population distribution can be represented by a bilaterally contaminated normal model with unknown variance. This paper also develops a hierarchical normal mixture model (HNM) and applies it to birth weight data. The EM algorithm is employed for parameter estimation, and a singular Bayesian information criterion (sBIC) is applied to choose the number of components. We also propose a singular flexible information criterion, which additionally involves a data-driven penalty. bilaterally contaminated normal model Union-Intersection test hierarchical normal mixture model singular Bayesian information criterion singular flexible information criterion Microarrays Multivariate Analysis Statistical Methodology Statistical Models
7	Model selection for discrete Markov random fields on graphs / Seleção de modelos para campos aleatórios Markovianos discretos sobre grafos Iara Moreira Frondana 28 June 2016 In this thesis we propose to use a penalized maximum conditional likelihood criterion to estimate the graph of a general discrete Markov random field. We prove the almost sure convergence of the estimator of the graph in the case of a finite or countably infinite set of variables. Our method requires minimal assumptions on the probability distribution and, contrary to other approaches in the literature, the usual positivity condition is not needed. We present several examples with a finite set of vertices and study the performance of the estimator on simulated data from these examples. We also introduce an empirical procedure based on k-fold cross validation to select the best value of the constant in the estimator's definition and show the application of this method to two real datasets. Discrete Markov random fields Bayesian information criterion Model selection Simple undirected graphs
8	Risk factor modeling of Hedge Funds' strategies Radosavčević, Aleksa January 2017 This thesis aims to identify the main market risk factors driving the different strategies implemented by hedge funds by looking at correlation coefficients, applying Principal Component Analysis and analyzing the "loadings" of the first three principal components, which explain the largest portion of the variation in hedge funds' returns. In the next step, an iterative stepwise regression includes and excludes market risk factors for each strategy, searching for the combination of risk factors that offers the model with the best "fit" according to the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Lastly, to avoid spurious results and overcome model uncertainty issues, a Bayesian Model Averaging (BMA) approach was taken. Key words: Hedge Funds, hedge funds' strategies, market risk, principal component analysis, stepwise regression, Akaike Information Criterion, Bayesian Information Criterion, Bayesian Model Averaging Author's e-mail: aleksaradosavcevic@gmail.com Supervisor's e-mail: mp.princ@seznam.cz
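How AIC and BIC can rank the candidate models visited by a stepwise search may be sketched as follows. The sketch assumes Gaussian linear regressions scored from their residual sum of squares (criteria stated up to an additive constant); the factor names, the residual sums of squares and the sample size are invented for illustration and do not come from the thesis.

```python
import math

def aic(rss, n, k):
    """AIC of a Gaussian linear model, up to an additive constant."""
    return n * math.log(rss / n) + 2 * k

def bic(rss, n, k):
    """BIC of a Gaussian linear model, up to an additive constant."""
    return n * math.log(rss / n) + k * math.log(n)

# Hypothetical candidates: factor set -> residual sum of squares,
# each fitted on the same n = 120 observations (illustrative numbers only).
n = 120
candidates = {
    ("equity",): 4.10,
    ("equity", "credit"): 3.20,
    ("equity", "credit", "fx"): 3.10,
}

def best_model(criterion):
    """Pick the factor set minimising the criterion (k = factors + intercept)."""
    return min(candidates,
               key=lambda factors: criterion(candidates[factors], n, len(factors) + 1))

print("AIC selects:", best_model(aic))
print("BIC selects:", best_model(bic))
```

With these made-up numbers the two criteria disagree: the small drop in residual error from adding the third factor outweighs the AIC penalty of 2 per parameter but not the BIC penalty of log(n) per parameter, which is one reason to report both.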
9	Probabilistic Diagnostic Model for Handling Classifier Degradation in Machine Learning Gustavo A. Valencia-Zapata (8082655) 04 December 2019 Several studies point out different causes of performance degradation in supervised machine learning. Problems such as class imbalance, overlapping, small-disjuncts, noisy labels, and sparseness limit accuracy in classification algorithms. Even though a number of approaches, either in the form of a methodology or an algorithm, try to minimize performance degradation, they have been isolated efforts with limited scope. This research consists of three main parts: In the first part, a novel probabilistic diagnostic model based on identifying signs and symptoms of each problem is presented. Secondly, the behavior and performance of several supervised algorithms are studied when training sets have such problems. Therefore, prediction of success for treatments can be estimated across classifiers. Finally, a probabilistic sampling technique based on training set diagnosis for avoiding classifier degradation is proposed. Statistics Pattern Recognition and Data Mining Class imbalance Overlapping Small-disjuncts Noisy labels Sparseness Gaussian Mixture Models Separation index Classifier degradation Bayesian Information Criterion (BIC)
10	VePMAD: A Vehicular Platoon Management Anomaly Detection System : A Case Study of Car-following Mode, Middle Join and Exit Maneuvers Bayaa, Weaam January 2021 Vehicle communication using sensors and wireless channels plays an important role in enabling information exchange. Adding more components to exchange more information with infrastructure has enhanced the capabilities of vehicles and enabled the rise of Cooperative Intelligent Transport Systems (C-ITS). Leveraging such capabilities, further applications such as Cooperative Adaptive Cruise Control (CACC) and platooning were introduced. CACC is an enhancement of Adaptive Cruise Control (ACC). It enables longitudinal automated vehicle control and follows the Constant Time Gap (CTG) strategy, where the distance between vehicles is proportional to the speed. Platooning differs in that it addresses both longitudinal and lateral control. In addition, it adopts the Constant Distance Gap (CDG) control strategy, in which the separation between vehicles does not change with speed. Platooning requires close coupling and accordingly achieves the goals of increased lane throughput and reduced energy consumption. When only a longitudinal controller is used, platooning operates in car-following mode and no Platoon Management Protocol (PMP) is used. On the other hand, when both longitudinal and lateral controllers are used, platooning operates in maneuver mode and coordination between vehicles is needed to perform maneuvers. Exchanging information allows the platoon to make real-time maneuvering decisions. However, none of the aforementioned benefits of platooning can be achieved if the system is vulnerable to misbehavior (i.e., the platoon behaving incorrectly). Most work in the literature attributes this misbehavior to malicious actors, where an attacker injects malicious messages. Standardization efforts have developed security services to authenticate and authorize the sender. 
However, authenticated users equipped with cryptographic primitives can still mount attacks (i.e., falsification attacks), which cannot be detected by standard services such as cryptographic signatures. Misbehavior can disturb platoon behavior or even cause a collision. Many Misbehavior Detection Schemes (MDSs) have been proposed in the literature in the context of vehicular ad hoc networks (VANETs) and CACC. These MDSs apply algorithms or rules to detect sudden or gradual changes in the kinematic information disseminated by other vehicles. Reusing these MDSs directly during maneuvers can lead to false positives, because they treat changes in kinematic information during the maneuver as an attack. This thesis addresses this gap by designing a new modular framework that can distinguish a maneuvering process from misbehavior by leveraging platoon behavior recognition, that is, recognition of the platoon mode of operation (e.g., car-following mode or maneuver mode). In addition, it can recognize the ongoing maneuver (e.g., middle join or exit). Based on the platoon behavior recognition module, the anomaly detection module detects deviations from expected behavior. Unsupervised machine learning, notably a Hidden Markov Model with Gaussian Mixture Model emissions (GMMHMM), is used to learn the nominal behavior of the platoon during different modes and maneuvers. This is used later by the platoon behavior recognition and anomaly detection modules. The GMMHMM is trained on the nominal behavior of the platoon using multivariate time series representing the kinematic characteristics of the vehicles. Different models are used to detect attacks in different scenarios (e.g., different speeds). Two approaches to anomaly detection are investigated: Viterbi algorithm based anomaly detection and Forward algorithm based anomaly detection. The proposed framework managed to detect misbehavior whether the compromised vehicle was the platoon leader or a follower. 
Empirical results show very high performance, with the platoon behavior recognition module reaching 100% accuracy. In addition, it can predict ongoing platoon behavior at early stages and accordingly use the correct model representing the nominal behavior. Forward algorithm based anomaly detection, which relies on computing the likelihood, showed better performance, reaching 98% with slight variations across accuracy, precision, recall and F1 score. Different platooning controllers can be resilient to some attacks, so an attack may result in only a slight deviation from nominal behavior. However, the anomaly detection module was able to detect this deviation. 
MDS GMMHMM PMP Machine Learning Bayesian information Criterion (BIC) Platoon Behavior Recognition Electrical Engineering and Electronics
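The likelihood-based detection idea in this abstract can be sketched with a small forward-algorithm implementation. The sketch below is a simplification under stated assumptions: each hidden state emits from a single one-dimensional Gaussian rather than a Gaussian mixture, the model parameters are set by hand rather than trained, and the threshold and the two observation sequences are invented. None of the names or numbers come from the thesis.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def forward_loglik(obs, log_pi, log_A, means, variances):
    """Log-likelihood of obs under a Gaussian-emission HMM (forward algorithm)."""
    states = range(len(log_pi))
    alpha = [log_pi[s] + log_gauss(obs[0], means[s], variances[s]) for s in states]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[r] + log_A[r][s] for r in states])
            + log_gauss(x, means[s], variances[s])
            for s in states
        ]
    return logsumexp(alpha)

def is_anomalous(obs, model, threshold):
    """Flag a sequence whose average per-sample log-likelihood falls below threshold."""
    return forward_loglik(obs, *model) / len(obs) < threshold

# Hypothetical 2-state model of a normalised kinematic signal (hand-set parameters).
model = (
    [math.log(0.5), math.log(0.5)],                    # initial state log-probabilities
    [[math.log(0.9), math.log(0.1)],
     [math.log(0.1), math.log(0.9)]],                  # transition log-probabilities
    [0.0, 1.0],                                        # state means
    [0.05, 0.05],                                      # state variances
)
THRESHOLD = -2.0  # assumed value; in practice it would be tuned on nominal data

nominal = [0.0, 0.1, -0.1, 0.0, 1.0, 0.9, 1.1, 1.0]
falsified = [0.0, 0.1, 3.0, -2.0, 1.0, 4.0, 1.1, 1.0]

print("nominal flagged:", is_anomalous(nominal, model, THRESHOLD))
print("falsified flagged:", is_anomalous(falsified, model, THRESHOLD))
```

Normalising by sequence length keeps the threshold comparable across windows of different duration; a Viterbi-based variant would score only the single best state path instead of summing over all paths.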
