1

Searching Genome-wide Disease Association Through SNP Data

Guo, Xuan 11 August 2015 (has links)
Taking advantage of high-throughput Single Nucleotide Polymorphism (SNP) genotyping technology, Genome-Wide Association Studies (GWASs) are regarded as holding promise for unravelling complex relationships between genotype and phenotype. GWASs aim to identify genetic variants associated with disease by assaying and analyzing hundreds of thousands of SNPs. Traditional single-locus and two-locus methods have been standardized and have led to many interesting findings. Recently, a substantial number of GWASs have indicated that, for most disorders, joint genetic effects (epistatic interactions) across the whole genome are broadly present in complex traits. At present, identifying high-order epistatic interactions from GWASs is computationally and methodologically challenging. My dissertation research focuses on the problem of searching for genome-wide associations under three frequently encountered scenarios: one case and one control, multiple cases and multiple controls, and Linkage Disequilibrium (LD) block structure. For the first scenario, we present a simple and fast method, named DCHE, based on dynamic clustering. For the second, we design two methods, a Bayesian-inference-based method and a heuristic method, to detect genome-wide multi-locus epistatic interactions across multiple diseases. For the last scenario, we propose a block-based Bayesian approach that models LD and conditional disease association simultaneously. Experimental results on both synthetic and real GWAS datasets show that the proposed methods improve the detection accuracy of disease-specific associations and reduce computational cost compared with current popular methods.
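As a hedged illustration of the single-locus baseline this abstract contrasts with (not the thesis's DCHE or Bayesian methods), the sketch below runs a chi-square test of association between one SNP's genotype counts and case/control status; the data and names are invented.

```python
# Minimal single-locus association test: 3x2 contingency table of
# genotype (0/1/2 minor-allele count) against case/control status.
import numpy as np
from scipy.stats import chi2_contingency

def single_locus_test(genotypes, phenotypes):
    """genotypes: 0/1/2 per subject; phenotypes: 0 = control, 1 = case."""
    table = np.zeros((3, 2))
    for g, p in zip(genotypes, phenotypes):
        table[g, p] += 1          # count subjects per (genotype, status) cell
    chi2, p_value, dof, _ = chi2_contingency(table)
    return chi2, p_value

# Toy cohort of 8 subjects (illustrative only)
genos = np.array([0, 1, 2, 2, 0, 1, 2, 2])
phenos = np.array([0, 0, 0, 1, 0, 1, 1, 1])
print(single_locus_test(genos, phenos))
```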
2

Classification of breast lesions containing associated microcalcifications / Detection and characterization of mammographic microcalcifications

Ferrari, Ricardo José 29 September 1998 (has links)
In the present work, a computerized system is proposed to aid in the diagnosis of breast cancer with associated microcalcifications. The system is composed of three main stages: segmentation, feature extraction, and classification. In the segmentation stage, the suspect region of the digitized mammogram (ROI, region of interest) is processed to isolate the microcalcifications from the normal structures of the image; the final result is a binary image containing only the microcalcifications. Three combined techniques are used at this stage: histogram stretching, image difference, and adaptive thresholding. In the feature extraction stage, 34 measurements are computed: 13 texture measures, calculated from the ROI of the unsegmented image, plus 19 shape measures of the microcalcifications, 1 distribution measure, and 1 count measure, calculated from the ROI of the segmented image. Using the Bayes error and Jeffries-Matusita distance methods, the 6 best features are selected to compose the feature vector used for classification. In the classification stage, three classifiers are evaluated: the Bayes rule (parametric method), k-NN (non-parametric method), and an MLP (multi-layer perceptron) artificial neural network. The classifiers are trained and tested on different sample groups using the leave-k-out technique. Finally, the results obtained at each stage are presented and discussed using tables and receiver operating characteristic (ROC) curves.
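A rough sketch of the three combined segmentation steps described above (contrast stretching, image difference, adaptive thresholding); the window size and threshold factor are illustrative assumptions, not the thesis's settings.

```python
import numpy as np
from scipy import ndimage

def segment_microcalcifications(roi, window=15, k=3.0):
    # 1) Contrast stretching: map the ROI onto the full [0, 1] range.
    lo, hi = roi.min(), roi.max()
    stretched = (roi - lo) / (hi - lo + 1e-12)
    # 2) Image difference: subtract a smoothed background estimate so that
    #    small bright structures (candidate microcalcifications) stand out.
    background = ndimage.uniform_filter(stretched, size=window)
    diff = stretched - background
    # 3) Adaptive thresholding: keep pixels more than k local standard
    #    deviations above the local mean of the difference image.
    local_mean = ndimage.uniform_filter(diff, size=window)
    local_sq = ndimage.uniform_filter(diff ** 2, size=window)
    local_std = np.sqrt(np.maximum(local_sq - local_mean ** 2, 0))
    return diff > local_mean + k * local_std  # binary mask

mask = segment_microcalcifications(np.random.rand(64, 64))
print(mask.sum(), "candidate pixels")
```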
3

Modelling operational risk using skew t-copulas and Bayesian inference

Garzon Rozo, Betty Johanna January 2016 (has links)
Operational risk losses are heavy tailed and are likely to be asymmetric and extremely dependent across business lines and event types. The analysis of dependence via copula models has focused mainly on the bivariate case, and in the vast majority of instances symmetric elliptical copulas are employed to model dependence between severities. This thesis proposes a new methodology to assess, in a multivariate way, the asymmetry and extreme dependence between severities, and to calculate the capital charge for operational risk. The methodology simultaneously uses (i) several parametric distributions and an alternative mixture distribution (the Lognormal for the body of losses and the generalised Pareto distribution for the tail), via a technique from extreme value theory, (ii) the multivariate skew t-copula, applied for the first time across severities, and (iii) Bayesian theory. The first component models the severities: I simultaneously test several parametric distributions and the mixture distribution for each business line, which yields multiple combinations of severity distributions and identifies the one that fits most closely. The second effectively models asymmetry and extreme dependence in high dimensions. The third estimates the copula model: given the high multivariate dimension (eight business lines and seven event types) and the incorporation of mixture distributions, maximum likelihood is very difficult to implement, so I use a Bayesian inference framework with Markov chain Monte Carlo simulation to evaluate the posterior distribution and to estimate and make inferences about the parameters of the skew t-copula model. The research analyses an updated operational loss data set, SAS® Operational Risk Global Data (SAS OpRisk Global Data), to model operational risk at international financial institutions. I then evaluate the impact of this multivariate, asymmetric and extreme dependence on the estimated total regulatory capital, against other established multivariate copulas. My empirical findings are consistent with other studies reporting thin- and medium-tailed loss distributions. My approach substantially outperforms symmetric elliptical copulas, demonstrating that modelling dependence via the skew t-copula yields a more efficient allocation of capital charges, up to 56% smaller than that indicated by the standard Basel model.
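A minimal sketch of the body-tail severity model named above: a Lognormal fitted to losses below a threshold and a generalised Pareto distribution fitted to exceedances above it. The 90% threshold choice and the simulated data are illustrative assumptions, not the thesis's calibration.

```python
import numpy as np
from scipy import stats

def fit_spliced_severity(losses, tail_quantile=0.9):
    u = np.quantile(losses, tail_quantile)          # EVT threshold
    body, tail = losses[losses <= u], losses[losses > u]
    # Lognormal body: moments of log-losses below the threshold
    mu, sigma = np.mean(np.log(body)), np.std(np.log(body))
    # GPD tail: fit to exceedances over u, location fixed at 0
    xi, _, beta = stats.genpareto.fit(tail - u, floc=0)
    return {"u": u, "mu": mu, "sigma": sigma, "xi": xi, "beta": beta}

# Simulated loss data standing in for one business line
losses = stats.lognorm.rvs(s=1.2, scale=np.exp(10), size=5000, random_state=0)
print(fit_spliced_severity(losses))
```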
4

Learning a Multiview Weighted Majority Vote Classifier: Using PAC-Bayesian Theory and Boosting

Goyal, Anil 23 October 2018 (has links)
With the tremendous generation of data, we increasingly collect data from different information sources with heterogeneous properties, so it is important to take these representations, or views, of the data into account. This machine learning problem is referred to as multiview learning. It has many applications; in medical imaging, for example, we can represent the human brain with different sets of features such as MRI, t-fMRI, EEG, etc. In this thesis, we focus on supervised multiview learning, which we see as the combination of different view-specific classifiers or views. From this standpoint, it is natural to tackle multiview learning through the PAC-Bayesian framework, a tool from statistical learning theory for studying models expressed as majority votes. One of its advantages is that it directly captures the trade-off between the accuracy of and the diversity among the voters, which is at the heart of multiview learning. The first contribution of this thesis extends classical PAC-Bayesian theory (with a single view) to multiview learning (with at least two views). To do this, we consider a two-level hierarchy of distributions, over the view-specific voters and over the views. Based on this strategy, we derive PAC-Bayesian generalization bounds (both probabilistic and expected-risk bounds) for multiview learning. From a practical point of view, we design two multiview learning algorithms based on our two-level PAC-Bayesian strategy. The first is a one-step boosting-based multiview learning algorithm called PB-MVBoost: it iteratively learns the weights over the views by optimizing the multiview C-Bound, which controls the trade-off between the accuracy of and the diversity among the views. The second is a late-fusion approach in which the predictions of the view-specific classifiers are combined using the PAC-Bayesian algorithm CqBoost proposed by Roy et al. Finally, we show that minimizing the classification error of the multiview weighted majority vote is equivalent to minimizing Bregman divergences, which allows us to derive a parallel-update optimization algorithm (referred to as MωMvC2) to learn our multiview weighted majority vote.
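The following is a schematic sketch of the two-level weighted majority vote described above: weights rho over the views and weights q over each view's voters. All voters and weight values here are placeholders, not the quantities PB-MVBoost learns.

```python
import numpy as np

def multiview_majority_vote(x_views, voters, q, rho):
    """x_views[v]: features for view v; voters[v]: list of h(x) -> {-1,+1};
    q[v]: weights over view v's voters; rho: weights over the views."""
    score = 0.0
    for v, view_voters in enumerate(voters):
        # Inner level: weighted vote of the view-specific voters
        view_vote = sum(qi * h(x_views[v]) for qi, h in zip(q[v], view_voters))
        # Outer level: views themselves are weighted by rho
        score += rho[v] * view_vote
    return 1 if score >= 0 else -1

# Two views, two stump-like voters per view (illustrative)
voters = [[lambda x: 1 if x[0] > 0 else -1, lambda x: 1 if x[1] > 0 else -1],
          [lambda x: 1 if x[0] + x[1] > 0 else -1, lambda x: -1]]
q = [[0.6, 0.4], [0.9, 0.1]]
rho = [0.7, 0.3]
x = [np.array([0.5, -0.2]), np.array([0.1, 0.3])]
print(multiview_majority_vote(x, voters, q, rho))
```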
5

A Sector-Specific Multi-Factor Alpha Model - With Application in Taiwan Stock Market

Chen, Ting-Hsuan 27 June 2011 (has links)
This study constructs a quantitative stock-selection model across multiple sectors using a Bayesian method. It employs factors from the Taiwan stock market that can explain stock returns. Under this structure, each sector, which has its own set of significant factors, is modelled by a separate sub-model. The factors are converted into alpha scores and used for stock selection; the integration of both intra-sector and inter-sector alpha scores into sector-specific combined alpha scores is a central concept of this study. Furthermore, an enhanced index fund is built on the model and compared against the benchmark to illustrate the model's power. Once the constituents of a portfolio are decided, the model provides a stock-selection criterion based on the predictive power for stock returns. Finally, the results demonstrate that the model is practical and flexible for local stock portfolio analysis.
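As a hedged illustration of turning factor exposures into alpha scores within one sector, the snippet below cross-sectionally z-scores the factors and combines them with per-sector weights; the factor names and weights are invented, not the study's estimates.

```python
import numpy as np

def alpha_scores(exposures, weights):
    """exposures: (n_stocks, n_factors) raw factor values for one sector;
    weights: that sector's per-factor weights (e.g. posterior means)."""
    # z-score each factor cross-sectionally within the sector
    z = (exposures - exposures.mean(axis=0)) / (exposures.std(axis=0) + 1e-12)
    return z @ weights  # higher score = stronger buy candidate

sector_exposures = np.array([[0.12, 3.1],   # hypothetical ROE, P/B per stock
                             [0.08, 1.9],
                             [0.20, 4.5]])
sector_weights = np.array([0.7, -0.3])       # high P/B penalised in this sector
print(alpha_scores(sector_exposures, sector_weights))
```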
6

Prediction models using fuzzy logic: an approach inspired by Bayesian inference

Bacani, Felipo, 1985- 20 August 2018 (has links)
Advisor: Laécio Carvalho de Barros / Master's dissertation (2012) - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica / Abstract: This work applies fuzzy set theory to forecasting (inference) models. The model is strongly based on fuzzy relations over continuous spaces (the non-matrix case) and on the modus ponens inference rule, using t-norms (which in this context play a role similar to copula operations in statistics). It is from the modus ponens that the "conditional" character of some of the terms involved arises, and it is from there that the analogy with Bayesian inference is drawn. These are only conceptual analogies, however: this work does not involve probability distributions; instead, fuzzy sets are treated as possibility distributions. The proposed methodology aims to make an expert's forecast more precise by taking into account a historical record of the problem, that is, to improve the expert's prediction based on how previous predictions performed. To test the methodology, meteorological temperature and humidity data from coffee crops were used; the data were kindly provided by CEPAGRI/Unicamp. The tests were evaluated with two statistical indicators, Willmott's 'D' and MAPE (mean absolute percentage error), showing that the methodology was able to improve the expert's prediction in most of the situations studied. / Master's in Applied Mathematics
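A small sketch of the inference step the abstract describes: generalised modus ponens over discretised fuzzy sets via sup-T composition, using the minimum t-norm. The membership values below are made up for illustration.

```python
import numpy as np

def generalized_modus_ponens(A_prime, rule, tnorm=np.minimum):
    """A_prime: membership vector of the observed input fuzzy set;
    rule: (len(A), len(B)) fuzzy relation R(x, y); returns B' over y."""
    # B'(y) = sup_x T(A'(x), R(x, y))  -- sup-T composition
    return np.max(tnorm(A_prime[:, None], rule), axis=0)

A = np.array([0.0, 0.5, 1.0, 0.5])        # antecedent, e.g. "temperature is high"
B = np.array([0.2, 0.8, 1.0])             # consequent, e.g. "humidity is low"
R = np.minimum(A[:, None], B[None, :])    # Mamdani-style relation R = T(A, B)
A_obs = np.array([0.0, 0.3, 1.0, 0.7])    # observed (slightly shifted) input
print(generalized_modus_ponens(A_obs, R))
```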
7

A systematic model for evaluation activity and software testing

Silva, Eduardo de Vasconcelos 24 February 2006 (has links)
Advisors: Ana Cervigni Guerra, Rogerio Drummond / Professional master's dissertation (2006) - Universidade Estadual de Campinas, Instituto de Computação / Abstract: Writing software test cases is in essence non-deterministic and therefore involves risk. Moreover, how densely the system requirements are tested is influenced by the way those requirements are interpreted. One proposal for systematizing the creation of test cases is to use a Bayesian network that models the test architecture to be implemented, together with a statistical weighting of scenario risks; such a network is well suited to non-deterministic problems involving risk. Alongside the Bayesian network, this systematization includes an adequacy criterion whose goal is to improve requirements coverage: each system requirement is interpreted according to predefined criteria. The final products of this systemic proposal are a graphical toolset that allows test cases to be described in a logical sequence and scenarios to be simulated, and a matrix that gathers all the test cases obtained from the network together with those derived from requirements analysis under the adequacy criterion. The experimental study indicates a four-and-a-half-fold increase in the density of requirements tests compared with the traditional technique, and in the analysis phase a trend toward an effort reduction of around one quarter. An interesting result of this systematized technique is the identification of scenarios not anticipated by the requirements, which helps keep the design documentation up to date. / Master's in Computer Science (Software Engineering)
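A toy sketch of the idea above: a tiny Bayesian network whose marginal failure probability could be used to weight test density per requirement. The structure, node names and probabilities are invented for illustration, not the dissertation's model.

```python
import itertools

# Hypothetical parent nodes for one requirement: code complexity and recent
# change; P(fail | complex, changed) is the conditional probability table.
p_complex = 0.3
p_changed = 0.6
p_fail = {(True, True): 0.5, (True, False): 0.2,
          (False, True): 0.15, (False, False): 0.02}

def failure_probability():
    # Marginalise the failure node over both parent nodes
    total = 0.0
    for c, ch in itertools.product([True, False], repeat=2):
        pc = p_complex if c else 1 - p_complex
        pch = p_changed if ch else 1 - p_changed
        total += pc * pch * p_fail[(c, ch)]
    return total

# Requirements with higher marginal failure probability get denser test cases
print(f"P(fail) = {failure_probability():.3f}")
```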
8

Learning majority votes for supervised classification and domain adaptation: PAC-Bayesian approaches and similarity combination

Morvant, Emilie 18 September 2013 (has links)
Many applications make use of machine learning methods able to take into account different information sources (e.g. images, sound, text) by combining several models or descriptions. This thesis proposes a series of theoretically founded contributions addressing two main issues for such methods: (i) How can available a priori knowledge about the information sources be embedded? (ii) How can a model be adapted to new data that do not follow the distribution of the learning data? The latter issue is known as domain adaptation (DA). A first series of contributions studies the problem of learning a majority vote over a set of voters for supervised classification in the PAC-Bayesian context, which allows an a priori on the voters to be taken into account. Our first contribution extends an algorithm that minimizes the error of the majority vote in binary classification by allowing the use of an a priori expressed as an aligned distribution over the voters. The second theoretically analyses the interest of minimizing the operator norm of the confusion matrix of the votes in the multiclass setting. Our second series of contributions deals with DA for binary classification. The third contribution combines (epsilon,gamma,tau)-good similarity functions to infer a new projection space that brings the learning and test distributions closer together by minimizing a DA bound. Finally, we propose a PAC-Bayesian analysis of DA based on a divergence between distributions; this analysis allows us to derive theoretical guarantees for learning majority votes in a DA context and to design an algorithm, specialized to linear classifiers, that minimizes our bound.
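A quick sketch of the quantity the second contribution minimizes: the operator (spectral) norm of a multiclass confusion matrix of the vote. The matrix below is a made-up three-class example, not from the thesis.

```python
import numpy as np

conf = np.array([[0.00, 0.05, 0.02],   # conf[i, j] ~ P(predict j | class i)
                 [0.08, 0.00, 0.04],   # off-diagonal entries are error rates
                 [0.03, 0.06, 0.00]])  # diagonal zeroed: only errors counted

operator_norm = np.linalg.norm(conf, ord=2)  # largest singular value
print(f"||C||_op = {operator_norm:.4f}")
```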
9

Path reconstruction in diffusion tensor magnetic resonance imaging

Song, Xin 13 July 2011 (has links)
The complicated underwater environment and poor underwater visibility make a super-mini underwater cable robot difficult to control. Traditionally, this kind of robot is controlled manually by operators, but in practice the robots can hardly work normally under such circumstances. To overcome these shortcomings and improve the abilities of these underwater cable robots, this thesis proposes several improvements covering system design, motion controller design, three-dimensional obstacle recognition, and three-dimensional path reconstruction. The details are as follows: (1) Super-mini underwater robot system design: several improvement schemes and important design ideas are investigated for the super-mini underwater robot. (2) Super-mini robot motion controller design: the design of a motion controller for an underwater robot in complicated circumstances is investigated; a new adaptive neural network sliding mode controller with a balanced parameter controller (ANNSMB) is proposed, and, based on the theory of the adaptive fuzzy sliding mode controller (AFSMC), an improved algorithm is also proposed and applied to the underwater robot. (3) Three-dimensional underwater environment reconstruction: the algorithms and experiments for underwater environment reconstruction are investigated; a DT-MRI image-processing algorithm and the theory of three-dimensional obstacle reconstruction are adopted and improved for application to the underwater robot. (4) Path-planning algorithms for the super-mini underwater robot are investigated.
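A bare-bones sketch of a conventional sliding-mode control step of the kind the abstract's ANNSMB controller builds on (without its neural or fuzzy adaptations); the gains, boundary layer and toy dynamics are illustrative placeholders.

```python
import numpy as np

def smc_step(e, e_dot, lam=2.0, k=5.0, phi=0.1):
    """e, e_dot: tracking error and its derivative; returns control input."""
    s = e_dot + lam * e                      # sliding surface s = de + lambda*e
    # Saturated switching term (boundary layer phi) to reduce chattering
    return -k * np.clip(s / phi, -1.0, 1.0)

# Drive a toy double integrator (e_ddot = u) toward zero error
e, e_dot, dt = 1.0, 0.0, 0.01
for _ in range(500):
    u = smc_step(e, e_dot)
    e_dot += u * dt
    e += e_dot * dt
print(f"final error: {e:.4f}")
```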
