261

Insurances against job loss and disability : Private and public interventions and their effects on job search and labor supply

Andersson, Josefine January 2017 (has links)
Essay I: Employment Security Agreements, which are elements of Swedish collective agreements, offer a unique opportunity to study very early job search counselling of displaced workers. These agreements provide individual job search assistance to workers who are dismissed due to redundancy, often as early as during the period of notice. Compared to traditional labor market policies, the assistance provided is earlier and more responsive to the needs of the individual worker. In this study, I investigate the effects of the individual counseling and job search assistance provided through the Employment Security Agreement for Swedish blue-collar workers on job finding and subsequent job quality. The empirical strategy is based on the rules of eligibility in a regression discontinuity framework. I estimate the effect for workers with short tenure who are dismissed through mass layoffs. My results do not suggest that the program has an effect on the probability of becoming unemployed, the duration of unemployment, or income. However, the results indicate that the program has a positive effect on the duration of the next job.

Essay II: The well-known positive relationship between the unemployment benefit level and unemployment duration can be separated into two potential sources: a moral hazard effect, and a liquidity effect pertaining to the increased ability to smooth consumption. The latter is a socially optimal response due to credit and insurance market failures. These two effects are difficult to separate empirically, but the social optimality of an unemployment insurance policy can be evaluated by studying the effect of a non-distortionary lump-sum severance grant on unemployment durations. In this study, I evaluate the effects on unemployment duration and subsequent job quality of a lump-sum severance grant provided to displaced workers by means of a Swedish collective agreement. I use a regression discontinuity design based on the strict age requirement for eligibility for the grant. I find that the lump-sum grant has a positive effect on the probability of becoming unemployed and on the length of the completed unemployment duration, but no effect on subsequent job quality. My analysis also indicates that spousal income is important for the consumption-smoothing abilities of displaced workers, and that the grant may have a greater effect in times of more favorable labor market conditions.

Essay III: Evidence from around the world suggests that individuals who are awarded disability benefits in some cases still have residual working capacity, while disability insurance systems typically involve strong disincentives for benefit recipients to work. Some countries have introduced policies to incentivize disability insurance recipients to use their residual working capacity on the labor market. One such policy is the continuous deduction program in Sweden, introduced in 2009. In this study, I investigate whether the financial incentives provided by this program induce disability insurance recipients to increase their labor supply or education level. Retroactively determined eligibility for the program with respect to the time of benefit award provides a setting resembling a natural experiment, which can be used to estimate the effects of the program using a regression discontinuity design. However, a simultaneous regime change of disability insurance eligibility causes covariate differences between treated and controls, which I adjust for using a matching strategy. My results suggest that the financial incentives provided by the program have not had any effect on labor supply or educational attainment.
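All three essays lean on regression discontinuity designs built around an eligibility rule. As a hedged illustration of that strategy, and not the author's actual data or specification, the sketch below estimates the jump at a cutoff with separate local linear fits on each side; the simulated ages, the cutoff, and the bandwidth are purely hypothetical.

```python
# Minimal local-linear regression discontinuity (RD) sketch; data are simulated.
import numpy as np

def rd_estimate(running, outcome, cutoff, bandwidth):
    """Difference in fitted intercepts at the cutoff, from separate
    linear fits on each side of the cutoff within the bandwidth."""
    x = running - cutoff
    keep = np.abs(x) <= bandwidth
    x, y = x[keep], outcome[keep]

    def side_fit(mask):
        # ordinary least squares of y on [1, x] for one side of the cutoff
        X = np.column_stack([np.ones(mask.sum()), x[mask]])
        beta, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
        return beta[0]  # fitted value at x = 0 (the cutoff)

    return side_fit(x >= 0) - side_fit(x < 0)

# Toy example: eligibility at age 40, true jump of 0.3 in the outcome.
rng = np.random.default_rng(0)
age = rng.uniform(25, 55, 5000)
y = 0.02 * age + 0.3 * (age >= 40) + rng.normal(0, 0.5, age.size)
print(rd_estimate(age, y, cutoff=40, bandwidth=5))  # should be close to 0.3
```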
262

Classification of uncertain data in the framework of belief functions : nearest-neighbor-based and rule-based approaches

Jiao, Lianmeng 26 October 2015 (has links)
In many classification problems, data are inherently uncertain. The available training data might be imprecise, incomplete, or even unreliable. Besides, partial expert knowledge characterizing the classification problem may also be available. These different types of uncertainty bring great challenges to classifier design. The theory of belief functions provides a well-founded and elegant framework to represent and combine a large variety of uncertain information. In this thesis, we use this theory to address uncertain data classification problems based on two popular approaches, i.e., the k-nearest neighbor (kNN) rule and rule-based classification systems. For the kNN rule, one concern is that imprecise training data in class-overlapping regions may greatly affect its performance. An evidential editing version of the kNN rule was developed based on the theory of belief functions in order to properly model the imprecise information carried by samples in overlapping regions. Another consideration is that, sometimes, only an incomplete training data set is available, in which case the performance of the kNN rule degrades dramatically. Motivated by this problem, we designed an evidential fusion scheme for combining a group of pairwise kNN classifiers developed based on locally learned pairwise distance metrics. For rule-based classification systems, in order to improve their performance in complex applications, we extended the traditional fuzzy rule-based classification system in the framework of belief functions and developed a belief rule-based classification system to address uncertain information in complex classification problems. Further, considering that in some applications, apart from training data collected by sensors, partial expert knowledge can also be available, a hybrid belief rule-based classification system was developed to make use of these two types of information jointly for classification.
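To make the belief-function machinery concrete, here is a hedged sketch of a simple evidential k-NN classifier in the spirit of the methods discussed above; it is not the thesis's evidential editing or pairwise fusion scheme. Each neighbor contributes a simple mass function with mass on its own class and the remainder on the full frame of discernment, and the masses are combined with Dempster's rule. The exponential mass assignment, the 0.95 ceiling, and the gamma parameter are illustrative assumptions.

```python
import numpy as np

def combine(m_classes, m_omega, cls, alpha):
    """Dempster's rule: combine the current mass function with a neighbor's
    simple mass function (alpha on its class `cls`, 1 - alpha on Omega)."""
    conflict = (m_classes.sum() - m_classes[cls]) * alpha
    new_classes = m_classes * (1.0 - alpha)            # paired with neighbor's Omega mass
    new_classes[cls] += m_classes[cls] * alpha + m_omega * alpha
    new_omega = m_omega * (1.0 - alpha)
    norm = 1.0 - conflict                              # renormalize away the conflict
    return new_classes / norm, new_omega / norm

def evidential_knn(X_train, y_train, x, k=3, gamma=1.0, n_classes=2):
    d = np.linalg.norm(X_train - x, axis=1)
    m_classes, m_omega = np.zeros(n_classes), 1.0      # start from total ignorance
    for i in np.argsort(d)[:k]:
        alpha = 0.95 * np.exp(-gamma * d[i])           # support lent by this neighbor
        m_classes, m_omega = combine(m_classes, m_omega, y_train[i], alpha)
    return m_classes, m_omega                          # class masses + residual ignorance

X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0]])
y = np.array([0, 0, 1, 1])
print(evidential_knn(X, y, np.array([0.95, 1.0]), k=3))  # most mass on class 1
```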
263

Local regularity estimation

Servien, Rémi 12 March 2010 (has links)
The goal of this thesis is to study the local behavior of a probability measure, using a local regularity index. In the first part, we establish the asymptotic normality of the kn-nearest-neighbor density estimator. In the second, we define a mode estimator under weakened hypotheses. We show that the regularity index plays a role in both problems. Finally, in a third part, we construct various estimators of the regularity index from estimators of the distribution function, for which we provide a literature review.
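For context, here is a hedged sketch of the classical one-dimensional kn-nearest-neighbor density estimate studied in the first part, f_n(x) = k_n / (2 n R_k(x)), where R_k(x) is the distance from x to its k_n-th nearest sample point. The k_n ~ sqrt(n) choice and the Gaussian test data are illustrative assumptions, not the thesis's setting.

```python
# One-dimensional k_n-nearest-neighbor density estimate on simulated data.
import numpy as np

def knn_density(x, sample, k):
    dist = np.sort(np.abs(sample - x))
    return k / (2.0 * sample.size * dist[k - 1])  # k-th nearest-neighbor radius

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, 2000)
k = int(np.sqrt(data.size))                       # a common k_n ~ sqrt(n) choice
for x in (0.0, 1.0, 2.0):
    print(x, knn_density(x, data, k))             # compare with the N(0,1) density
```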
264

MiRNA and co : methodologically exploring the world of small RNAs

Higashi, Susan 26 November 2014 (has links)
The main contribution of this thesis is the development of a reliable, robust, and much faster method for the prediction of pre-miRNAs. With this method, we aimed mainly at two goals: efficiency and flexibility. Efficiency was made possible by means of a quadratic algorithm. Flexibility relies on two aspects, the input type and the organism clade. Mirinho can take as input either a genome sequence or small RNA sequencing (sRNA-seq) data, for both animal and plant species. To change from one clade to another, it suffices to change the lengths of the stem arms and of the terminal loop. Concerning the prediction of plant miRNAs, because their pre-miRNAs are longer, the methods for extracting the hairpin secondary structure are not as accurate as for shorter sequences. With Mirinho, we also addressed this problem, which enabled it to provide pre-miRNA secondary structures more similar to those in miRBase than the other available methods do. Mirinho served as the basis for two other issues we addressed. The first issue led to the treatment and analysis of sRNA-seq data of Acyrthosiphon pisum, the pea aphid. The goal was to identify the miRNAs that are expressed during the four developmental stages of this species, allowing further biological conclusions concerning the regulatory system of such an organism. For this analysis, we developed a whole pipeline, called MirinhoPipe, at the end of which Mirinho was integrated. We then moved on to the second issue, which involved problems related to the prediction and analysis of non-coding RNAs (ncRNAs) in the bacterium Mycoplasma hyopneumoniae. A method, called Alvinho, was developed for the prediction of targets in this bacterium, together with a pipeline for the segmentation of a numerical sequence and the detection of conservation among ncRNA sequences using a k-partite graph. We finally addressed a problem related to motifs, that is, patterns possibly composed of one or more parts, that appear conserved in a set of sequences and may correspond to functional elements.
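As a purely illustrative, hedged sketch of the stem-arm and terminal-loop constraints mentioned above (this is not Mirinho's quadratic algorithm, and real pre-miRNA hairpins tolerate mismatches and bulges that are ignored here), the naive scan below reports regions whose two arms are perfectly complementary under configurable stem and loop lengths.

```python
# Naive hairpin (stem-loop) scan with adjustable stem and terminal-loop lengths.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def is_complementary(arm5, arm3):
    # perfect Watson-Crick pairing between the 5' arm and the reversed 3' arm
    return all(COMPLEMENT.get(a) == b for a, b in zip(arm5, reversed(arm3)))

def hairpin_candidates(seq, stem_len=18, loop_min=4, loop_max=15):
    seq = seq.upper().replace("T", "U")
    hits = []
    for i in range(len(seq) - 2 * stem_len - loop_min + 1):
        arm5 = seq[i:i + stem_len]
        for loop in range(loop_min, loop_max + 1):
            j = i + stem_len + loop
            arm3 = seq[j:j + stem_len]
            if len(arm3) == stem_len and is_complementary(arm5, arm3):
                hits.append((i, j + stem_len, loop))
    return hits  # (start, end, loop length) of each candidate

demo = "GGAUCGAUCGAUCGAUGAAAAUCGAUCGAUCGAUCC"  # toy 16 nt stem + 4 nt loop
print(hairpin_candidates(demo, stem_len=16, loop_min=4, loop_max=8))
```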
265

Discriminant and cluster analysis as a tool for classification of objects

Rynešová, Pavlína January 2015 (has links)
Cluster and discriminant analysis belong to the basic classification methods. With cluster analysis, a disordered group of objects can be organized into several internally homogeneous classes or clusters. Discriminant analysis builds a classification rule from the known class membership of existing objects, which can then be used to classify units with unknown group membership. The aim of this thesis is to compare discriminant analysis with different methods of cluster analysis. To measure the distances between objects within each cluster, squared Euclidean and Mahalanobis distances are used. In total, 28 datasets are analyzed in this thesis. When correlated variables are left in the set and the squared Euclidean distance is applied, Ward's method classified objects into clusters most successfully (42.0%). After switching to the Mahalanobis distance, the furthest-neighbor (complete linkage) method became the most successful (37.5%). After removing highly correlated variables and applying methods with the Euclidean metric, Ward's method was again the most successful at classifying objects (42.0%). The results imply that cluster analysis is more precise when correlated variables are excluded than when they are left in the dataset. The average result of discriminant analysis, both with and without correlated variables, is 88.7%.
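The sketch below is a hedged illustration of the two clustering setups compared above: Ward's method on raw coordinates (i.e., the squared-Euclidean criterion) and furthest-neighbor (complete) linkage on Mahalanobis distances. The iris data and the choice of three clusters are placeholders for the 28 datasets analyzed in the thesis.

```python
# Ward linkage vs. complete linkage on Mahalanobis distances with scipy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Ward's method (scipy computes it from the Euclidean coordinates directly)
ward_labels = fcluster(linkage(X, method="ward"), t=3, criterion="maxclust")

# Furthest-neighbor (complete) linkage on Mahalanobis distances
VI = np.linalg.inv(np.cov(X, rowvar=False))           # inverse covariance matrix
d = pdist(X, metric="mahalanobis", VI=VI)
complete_labels = fcluster(linkage(d, method="complete"), t=3, criterion="maxclust")

print(np.bincount(ward_labels), np.bincount(complete_labels))  # cluster sizes
```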
266

On approximate solutions to scalable data mining algorithms for complex data problems using GPGPU

Alexander Victor Ocsa Mamani 22 September 2011 (has links)
The increasing availability of data in diverse domains has created a need to develop techniques and methods to discover knowledge from huge volumes of complex data, motivating many research works in the databases, data mining, and information retrieval communities. Recent studies have suggested that searching in complex data is an interesting research field because many data mining tasks, such as classification, clustering, and motif discovery, depend on nearest neighbor search algorithms. Thus, many deterministic approaches have been proposed to solve the nearest neighbor search problem in complex domains, aiming to reduce the effects of the well-known curse of dimensionality. On the other hand, probabilistic algorithms have been only slightly explored. Recently, new techniques aim to reduce the computational cost by relaxing the quality of the query results. Moreover, in large-scale problems, an approximate solution with a solid theoretical analysis seems to be more appropriate than an exact solution with a weak theoretical model. At the same time, even though several exact and approximate solutions have been proposed, single-CPU architectures impose performance limits on these kinds of solution. An approach to improve the runtime of data mining and information retrieval techniques by an order of magnitude is to employ emerging many-core architectures such as CUDA-enabled GPUs. In this work we present a massively parallel kNN query algorithm based on hashing and implemented in CUDA. Our method, based on the LSH scheme, is an approximate method that queries high-dimensional datasets in sub-linear time. Using the massively parallel implementation, we improve data mining tasks; specifically, we create solutions for (soft) real-time time series motif discovery. Experimental studies on large real and synthetic datasets were carried out thanks to the highly parallel CUDA implementation. Our performance evaluation on a GeForce GTX 470 GPU resulted in average runtime speedups of up to 7x over state-of-the-art similarity search and motif discovery solutions.
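As a hedged, CPU-only sketch of the LSH idea behind this work (not the thesis's CUDA kernels or its exact hashing scheme), the code below builds sign-random-projection hash tables and answers approximate kNN queries by refining only the colliding candidates with exact distances; the table and bit counts are arbitrary illustrative choices.

```python
# Sign-random-projection LSH for approximate kNN (candidate generation + refinement).
import numpy as np
from collections import defaultdict

class SRPLSH:
    def __init__(self, dim, n_tables=8, n_bits=12, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_tables, n_bits, dim))   # random hyperplanes
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _keys(self, x):
        bits = (self.planes @ x) > 0                 # (n_tables, n_bits) sign bits
        return [tuple(row) for row in bits]

    def index(self, X):
        self.X = X
        for i, x in enumerate(X):
            for table, key in zip(self.tables, self._keys(x)):
                table[key].append(i)

    def query(self, q, k=5):
        cand = set()
        for table, key in zip(self.tables, self._keys(q)):
            cand.update(table[key])                  # union of colliding buckets
        cand = np.fromiter(cand, dtype=int)
        if cand.size == 0:
            return cand
        d = np.linalg.norm(self.X[cand] - q, axis=1)  # exact refinement of candidates
        return cand[np.argsort(d)[:k]]

rng = np.random.default_rng(1)
data = rng.normal(size=(10000, 64))
lsh = SRPLSH(dim=64)
lsh.index(data)
print(lsh.query(data[0], k=5))                       # should contain index 0
```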
267

Bank Customer Churn Prediction : A comparison between classification and evaluation methods

Tandan, Isabelle, Goteman, Erika January 2020 (has links)
This study aims to assess which supervised statistical learning method (random forest, logistic regression, or K-nearest neighbor) is best at predicting bank customer churn. Additionally, the study evaluates which cross-validation approach, k-fold cross-validation or leave-one-out cross-validation, yields the most reliable results. Predicting customer churn has increased in popularity since new technology, regulation, and changed demand have led to increased competition for banks. Thus, with greater reason, banks acknowledge the importance of maintaining their customer base.   The findings of this study are that the unrestricted random forest model estimated using k-fold cross-validation is preferable in terms of performance measurements, computational efficiency, and theoretical considerations. Although k-fold cross-validation and leave-one-out cross-validation yield similar results, k-fold cross-validation is preferable due to its computational advantages.   For future research, methods that generate models with both good interpretability and high predictability would be beneficial, in order to combine knowledge of which customers end their engagement with an understanding of why. Moreover, interesting future research would be to analyze at which dataset size leave-one-out cross-validation and k-fold cross-validation yield the same results.
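Below is a hedged sketch of the model comparison described above, using scikit-learn's k-fold cross-validation on synthetic, imbalanced data as a stand-in for the bank churn dataset; the hyperparameters, the 10-fold setting, and the AUC metric are illustrative assumptions rather than the authors' exact choices.

```python
# Comparing random forest, logistic regression and kNN with k-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Class imbalance (80/20) loosely mimics a churn setting.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.8, 0.2],
                           random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "k-nearest neighbor": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="roc_auc")  # 10-fold CV
    print(f"{name}: AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```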
268

Machine learning techniques for content-based information retrieval

Chafik, Sanaa 22 December 2017 (has links)
The amount of media data is growing at high speed with the fast growth of the Internet and media resources. Performing an efficient similarity (nearest neighbor) search in such a large collection of data is a very challenging problem that the scientific community has been attempting to tackle. One of the most promising solutions to this fundamental problem is Content-Based Media Retrieval (CBMR) systems. The latter are search systems that perform the retrieval task in large media databases based on the content of the data. CBMR systems consist essentially of three major units: a Data Representation unit for feature representation learning, a Multidimensional Indexing unit for structuring the resulting feature space, and a Nearest Neighbor Search unit to perform efficient search. Media data (i.e., image, text, audio, video, etc.) can be represented by meaningful numeric information (i.e., a multidimensional vector), called a feature description, describing the overall content of the input data. The task of the second unit is to structure the resulting feature descriptor space into an index structure in which the third unit, effective nearest neighbor search, is performed. In this work, we address the problem of nearest neighbor search by proposing three Content-Based Media Retrieval approaches. Our three approaches are unsupervised, and thus can adapt to both labeled and unlabeled real-world datasets. They are based on a hashing indexing scheme to perform effective high-dimensional nearest neighbor search. Unlike most recent existing hashing approaches, which favor indexing in Hamming space, our proposed methods provide index structures adapted to a real-space mapping. Although Hamming-based hashing methods achieve a good accuracy-speed tradeoff, their accuracy drops owing to information loss during the binarization process. By contrast, real-space hashing approaches provide a more accurate approximation in the mapped real space, as they avoid hard binary approximations. Our proposed approaches can be classified into shallow and deep approaches. In the former category, we propose two shallow hashing-based approaches, namely "Symmetries of the Cube Locality Sensitive Hashing" (SC-LSH) and "Cluster-based Data Oriented Hashing" (CDOH), based respectively on randomized hashing and shallow learning-to-hash schemes. The SC-LSH method provides a solution to the memory storage problem faced by most randomization-based hashing approaches. It consists of a semi-random scheme that partially reduces the randomness effect of randomized hashing approaches, and thus their memory requirements, while maintaining their efficiency in structuring heterogeneous spaces. The CDOH approach proposes to eliminate the randomness effect by combining machine learning techniques with the hashing concept. CDOH outperforms the randomized hashing approaches in terms of computation time, memory space, and search accuracy. The third approach is a deep learning-based hashing scheme, named "Unsupervised Deep Neuron-per-Neuron Hashing" (UDN2H). The UDN2H approach indexes individually the output of each neuron of the top layer of a deep unsupervised model, namely a deep autoencoder, with the aim of capturing the high-level individual structure of each neuron output. Our three approaches, SC-LSH, CDOH and UDN2H, were proposed sequentially as the thesis progressed, with an increasing level of complexity in terms of the developed models and in terms of the effectiveness and performance obtained on large real-world datasets.
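Below is a hedged, simplified sketch of the real-space (non-binary) indexing idea contrasted with Hamming hashing above: each projected component is bucketed separately and candidates are refined with exact distances. PCA stands in for the learned projections, so this only illustrates the general principle; it is not a reproduction of SC-LSH, CDOH, or UDN2H.

```python
# Real-valued (non-binary) hashing: per-component quantile buckets + exact refinement.
import numpy as np
from collections import defaultdict

class RealSpaceHash:
    def __init__(self, X, n_components=8, n_buckets=16):
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # PCA via SVD
        self.mean, self.W = X.mean(axis=0), Vt[:n_components].T
        Z = (X - self.mean) @ self.W
        # per-component quantile bucket edges: each component gets its own index
        self.edges = np.quantile(Z, np.linspace(0, 1, n_buckets + 1)[1:-1], axis=0)
        self.tables = [defaultdict(list) for _ in range(n_components)]
        for i, code in enumerate(self._codes(X)):
            for table, c in zip(self.tables, code):
                table[c].append(i)
        self.X = X

    def _codes(self, X):
        Z = (np.atleast_2d(X) - self.mean) @ self.W
        return np.stack([np.digitize(Z[:, j], self.edges[:, j])
                         for j in range(Z.shape[1])], axis=1)

    def query(self, q, k=5):
        votes = defaultdict(int)            # candidates matching more buckets rank first
        for table, c in zip(self.tables, self._codes(q)[0]):
            for i in table[c]:
                votes[i] += 1
        if not votes:
            return np.array([], dtype=int)
        cand = np.array(sorted(votes, key=votes.get, reverse=True)[:200])
        d = np.linalg.norm(self.X[cand] - q, axis=1)        # exact refinement
        return cand[np.argsort(d)[:k]]

rng = np.random.default_rng(2)
data = rng.normal(size=(5000, 32))
index = RealSpaceHash(data)
print(index.query(data[10], k=5))           # should contain index 10
```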
269

Detection of atrial fibrillation in short-term ECG

Ambrožová, Monika January 2019 (has links)
Atrial fibrillation is diagnosed in 1-2% of the population, and a significant increase in the number of patients with this arrhythmia is expected in the coming decades, in connection with the aging of the population and the higher incidence of some diseases considered risk factors for atrial fibrillation. The aim of this work is to describe the problem of atrial fibrillation and the methods that allow its detection in the ECG record. The first part of the work covers the theory of cardiac physiology and atrial fibrillation, together with a basic description of atrial fibrillation detection. The practical part describes software for the detection of atrial fibrillation provided by the BTL company. Furthermore, an atrial fibrillation detector is designed. Several parameters were selected to capture the variation of RR intervals: the standard deviation, the coefficients of skewness and kurtosis, the coefficient of variation, the root mean square of the successive differences, the normalized absolute deviation, the normalized absolute difference, the median absolute deviation, and entropy. Three different classification models were used: support vector machine (SVM), k-nearest neighbor (KNN), and discriminant analysis classification. The SVM classification model achieves the best results (sensitivity: 67.1%; specificity: 97.0%; F-measure: 66.8%; accuracy: 92.9%).
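A hedged, minimal sketch of the feature-plus-classifier pipeline described above: it computes a few of the listed RR-interval irregularity parameters (standard deviation, coefficient of variation, RMSSD, median absolute deviation) and trains an SVM. The simulated RR series and the train/test split are illustrative assumptions, not the BTL data or the thesis's detector.

```python
# RR-interval features + SVM on simulated regular vs. irregular rhythms.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def rr_features(rr):
    diff = np.diff(rr)
    return np.array([
        np.std(rr),                                   # standard deviation
        np.std(rr) / np.mean(rr),                     # coefficient of variation
        np.sqrt(np.mean(diff ** 2)),                  # RMSSD
        np.median(np.abs(rr - np.median(rr))),        # median absolute deviation
    ])

rng = np.random.default_rng(0)
def simulate(af, n=60):
    base = rng.normal(0.8, 0.02, n)                   # regular sinus rhythm (seconds)
    return base + (rng.normal(0, 0.15, n) if af else 0.0)   # AF: irregular RR

X = np.array([rr_features(simulate(af)) for af in [0] * 100 + [1] * 100])
y = np.array([0] * 100 + [1] * 100)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X[::2], y[::2])
print("held-out accuracy:", clf.score(X[1::2], y[1::2]))
```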
270

Statistical evaluation of phylogeny of biological sequences

Vadják, Šimon January 2014 (has links)
The master's thesis provides a comprehensive overview of resampling methods for testing the correctness of the topology of phylogenetic trees, which estimate the process of phylogeny on the basis of biological sequence similarity. We focused on how errors can arise in this estimate and how they can be detected and removed. Bootstrapping, jackknifing, OTU jackknifing, and the PTP (permutation tail probability) test were implemented in Matlab. The work aims to test their applicability to various biological sequences and to assess the impact of the choice of input analysis parameters on the results of these statistical tests.
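As a hedged toy illustration of the column-resampling idea behind bootstrapping (not the thesis's Matlab implementation, and deliberately avoiding full tree re-estimation), the sketch below resamples alignment columns with replacement and reports how often the originally closest pair of taxa remains closest, a crude stand-in for topology support.

```python
# Column-wise bootstrap of a toy alignment with a simplified support measure.
import numpy as np

alignment = np.array([list(s) for s in [
    "ACGTACGTACGTACGTACGT",   # taxon A
    "ACGTACGTACGAACGTACGT",   # taxon B (close to A)
    "ACGAACTTACGTACGGACTT",   # taxon C
    "ACGAACTTACGTACGGACAT",   # taxon D (close to C)
]])

def closest_pair(cols):
    n = cols.shape[0]
    # pairwise p-distances (fraction of differing columns)
    d = np.array([[np.mean(cols[i] != cols[j]) for j in range(n)] for i in range(n)])
    np.fill_diagonal(d, np.inf)
    return tuple(sorted(np.unravel_index(np.argmin(d), d.shape)))

rng = np.random.default_rng(0)
observed = closest_pair(alignment)
hits = 0
for _ in range(1000):
    # resample alignment columns with replacement (bootstrap pseudo-alignment)
    cols = alignment[:, rng.integers(0, alignment.shape[1], alignment.shape[1])]
    hits += closest_pair(cols) == observed
print(f"bootstrap support for grouping {observed}: {hits / 1000:.2f}")
```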
