  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Evolutionäre Sprachtransformation / Evolutionary Language Transformation

Jung, Hagen 10 June 2011
Related languages contain words of common origin. Over the course of history these words change their form, and today they can be found in similar form in different languages as cognates. In this work, a formal model describing these changes is developed using coded lexical word lists. The model can represent the evolution of idealized languages automatically and objectively by means of a language phylogeny tree. It examines the individual letter changes between related words and the reconstructed proto-forms. Of particular interest are letter changes that occurred synchronously across several words of a language. A further component of the evolutionary model is the identification of cognates, which makes it possible to examine the possible letter substitutions between related words. A parsimonious cost model, which scores the different language and transformation histories, is used to reconstruct linguistically plausible letter changes and cognate assignments along a language phylogeny. Finding the most plausible solution is NP-complete, so an approximation procedure is proposed for the enormously large search space. Starting from a suitable reconstruction, a specially developed approximation procedure successively makes minimal changes to individual transformations or cognate assignments, approximating not merely a locally optimal solution but a globally best one. With the comprehensive approach of this reconstruction model, a language history can be computed for small word lists in reasonable time. A major advantage for linguistic discourse is that every individual transformation can be traced. In particular, the identification of regular letter substitutions that may be interpreted as sound changes in earlier languages is important here.
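The thesis scores reconstructions with a parsimonious cost model along a language phylogeny. Its actual model also handles cognate assignment and synchronous changes and is not reproduced here. As a rough illustration of the parsimony principle alone, the classic Fitch algorithm counts the minimum number of letter substitutions on a fixed tree. The tree, the language names L1 to L4, and the toy words are hypothetical, and the words are assumed to be pre-aligned and of equal length.

```python
# Illustrative sketch only (not the thesis's cost model): Fitch small parsimony
# counts the minimum letter changes for each aligned position on a fixed binary
# tree. Trees are nested tuples of (hypothetical) language names.

def fitch(tree, states):
    """Return (candidate letters at this node, minimum changes below it)."""
    if isinstance(tree, str):              # leaf: one observed letter
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:                             # children can agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def word_cost(tree, words):
    """Sum Fitch costs over positions of pre-aligned, equal-length cognates."""
    length = len(next(iter(words.values())))
    total = 0
    for i in range(length):
        _, cost = fitch(tree, {lang: w[i] for lang, w in words.items()})
        total += cost
    return total


tree = (("L1", "L2"), ("L3", "L4"))
words = {"L1": "tag", "L2": "dag", "L3": "dah", "L4": "tah"}
print(word_cost(tree, words))  # 3
```

A local-search approximation like the one described would repeatedly perturb the reconstruction and keep changes that lower a total cost of this kind.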
52

As áreas de endemismo dos opiliones (arachnida) da floresta atlântica ao norte do rio São Francisco, Brasil / Areas of endemism of Opiliones (Arachnida) in the Atlantic Forest north of the São Francisco River, Brazil

Souza, Adriano Medeiros de 28 February 2013
Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / The Atlantic Forest is one of the richest regions of the world in both species diversity and endemism. Because of this and the degree of its devastation, the biome was designated a few years ago as one of the 25 global biodiversity hotspots. However, the historical relationships between different sectors of the Atlantic Forest are poorly understood. A critical step toward that knowledge is the delimitation of areas of endemism, which are the basic units of biogeographic analyses. Studies on this subject have advanced considerably in the southern and southeastern Atlantic Forest using the distribution patterns of the harvestmen that occur there, but a gap remains for the northeastern Atlantic Forest. This study therefore aims to delimit the areas of endemism in the Atlantic Forest north of the São Francisco River using Opiliones species distributions, to compare the results obtained with different methods, and to evaluate the influence of cell size on the results. In total, we used 1581 occurrences of 224 species, 18 of which are not yet described. Occurrences were obtained from field collecting, the literature, and museum collections. We used three numerical methods that search for areas of endemism based on the occurrence of species in a set of grid cells: Endemicity Analysis (NDM), Parsimony Analysis of Endemicity (PAE), and Biotic Element Analysis (BEA). Each method was applied to three grids: a coarse grid (2° × 2°), an intermediate grid (1° × 1°), and a fine grid (0.5° × 0.5° cells).
The areas of endemism were then delimited by applying, to the results of the numerical analyses, a protocol based on a set of combined criteria derived from concepts of areas of endemism described in the literature. Altogether, thirteen areas of endemism were delimited for the Atlantic Forest, three of which correspond to the northeastern Atlantic Forest: Area of Endemism Bahia (BA), Area of Endemism Brejos Cearenses (BCE), and Area of Endemism Pernambuco (PE). The results from NDM and PAE were similar, whereas the BEA results were entirely different and arbitrary. Cell size influenced the analyses, affecting both the number of areas found and the number of cells included in each area. The larger amount of data made the delimited areas more robust than in previous work, especially those of the northeastern Atlantic Forest. These areas, like those of the southern and southeastern Atlantic Forest, probably correspond to Pleistocene forest refugia, formed when the entire Atlantic Forest went through cycles of expansion and contraction. These oscillations are associated with the glaciation and warming cycles of the Quaternary; in the northeastern Atlantic Forest, the expansion of its boundaries probably brought it into contact with the Amazon Rainforest. Other likely causes were marine transgressions or tectonism, the latter applying to the region of Baía de Todos os Santos.
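All three methods start from species occurrences recorded in grid cells. The sketch below covers only that first step, assuming occurrences are (species, longitude, latitude) tuples in decimal degrees and that cells are square and aligned to whole multiples of the cell size. It bins records into cells and builds the cell-by-species presence/absence matrix that NDM or PAE would take as input (PAE is typically rooted with a hypothetical all-zero outgroup). It does not implement NDM, PAE, or BEA, and the records are made up.

```python
import math
from collections import defaultdict

# Sketch under assumptions: (species, lon, lat) tuples in decimal degrees,
# square cells aligned to multiples of `size`. Records below are hypothetical.


def cell_of(lon, lat, size):
    """Index of the size x size degree cell containing (lon, lat)."""
    return (math.floor(lon / size), math.floor(lat / size))


def presence_matrix(records, size):
    """Return (sorted species list, {cell: [0/1 per species]})."""
    species_by_cell = defaultdict(set)
    for species, lon, lat in records:
        species_by_cell[cell_of(lon, lat, size)].add(species)
    species = sorted({s for s, _, _ in records})
    matrix = {
        cell: [1 if s in present else 0 for s in species]
        for cell, present in sorted(species_by_cell.items())
    }
    return species, matrix


records = [("sp1", -35.2, -8.1), ("sp2", -35.4, -8.3), ("sp1", -39.1, -14.6)]
for size in (2.0, 1.0, 0.5):   # the three grid resolutions named in the abstract
    species, matrix = presence_matrix(records, size)
    print(size, species, matrix)
```

Rerunning the same records at each resolution is a simple way to see how cell size changes which species share a cell.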
53

Distribuição de musgos (Bryophyta) no Brasil: riqueza, endemismo e conservação / Distribution of mosses (Bryophyta) in Brazil: richness, endemism and conservation

Amorim, Eduardo Toledo de 16 August 2017
Funding: CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) / Bryophyta (mosses) are the second most diverse division of land plants, with about 13,000 species. Several studies have compiled knowledge of the Brazilian bryophyte flora, identifying the states with the largest numbers of species and highlighting sampling gaps across the country. However, the gaps in knowledge of Brazilian mosses are not yet precisely known. One of the main themes in biogeography is understanding why some species are widely distributed and others are endemic. Endemism, the focus of this study, refers to the restricted occurrence of a taxon, either because it originated in that place and did not disperse, or because it became confined to that area relative to its previous distribution. Two allopatric hypotheses are currently most relevant for diversification models: the Pleistocene Refuge hypothesis and the Montane Isolation hypothesis. In Brazil, areas are protected through Conservation Units (UCs), which generally minimize human impact, making them excellent habitats for colonization by bryophytes. The main aims of this work were therefore to study the distribution of mosses in Brazil, identifying richness and endemism in the country, and to present a conservation approach for moss species. First, we compiled moss occurrence data from several online databases and the literature. We then refined these data with respect to species-level identification, taxonomic validity, and geographic coordinates.
Next, we plotted the records on a map and built a grid of 1° × 1° cells. We then analysed richness, estimated richness, and number of records, and performed a Parsimony Analysis of Endemicity (PAE). We used predictive species distribution modelling to identify the most suitable areas for species endemic to Brazil and to map the areas where their endemism is concentrated. In total, we compiled 26,691 records representing 868 moss species. These records fell in 394 grid cells, with the number of species per cell ranging from 1 to 235. The Atlantic Forest showed the greatest richness, owing both to the conditions it offers for moss establishment and to the greater sampling intensity in that domain. The PAE yielded only one area of endemism, located in central Bahia State, in the Caatinga phytogeographic domain, in the region of Parque Nacional da Chapada Diamantina. Grid cells proposed as potential areas of endemism were found in six areas scattered across the Cerrado and the Atlantic Forest. The areas of endemism of mosses are mostly mountain areas, which corroborates the main hypotheses of speciation. Among the UCs, 218 contained areas environmentally suitable for the species, of which 68 belong to the Sustainable Use categories and 150 to the Integral Protection category, demonstrating the importance of the UCs for the bryoflora of the Atlantic Forest.
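The abstract lists richness, estimated richness, and number of records per 1° × 1° cell, but it does not name the richness estimator. The sketch below computes these per-cell summaries under two assumptions: records are (species, longitude, latitude) tuples, and the bias-corrected Chao1 estimator stands in for "estimated richness". Chao1 is used here purely as an illustration. The coordinates are toy values.

```python
import math
from collections import Counter, defaultdict

# Sketch under assumptions: (species, lon, lat) tuples in decimal degrees, and
# bias-corrected Chao1 (S_obs + F1*(F1-1) / (2*(F2+1))) chosen as a stand-in
# estimator, treating records per species in a cell as its abundance.


def richness_by_cell(records, size=1.0):
    """Per-cell record count, observed richness and bias-corrected Chao1."""
    counts_by_cell = defaultdict(Counter)
    for species, lon, lat in records:
        cell = (math.floor(lon / size), math.floor(lat / size))
        counts_by_cell[cell][species] += 1

    summary = {}
    for cell, counts in counts_by_cell.items():
        observed = len(counts)
        f1 = sum(1 for n in counts.values() if n == 1)  # singletons
        f2 = sum(1 for n in counts.values() if n == 2)  # doubletons
        chao1 = observed + f1 * (f1 - 1) / (2 * (f2 + 1))
        summary[cell] = {
            "records": sum(counts.values()),
            "observed": observed,
            "chao1": chao1,
        }
    return summary


records = [("A", -41.3, -12.5), ("B", -41.2, -12.6), ("C", -41.4, -12.1),
           ("C", -41.9, -12.9), ("D", -41.5, -12.2), ("D", -41.6, -12.3),
           ("D", -41.7, -12.4)]
print(richness_by_cell(records))
```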
54

Relative Timing of Intron Gain and a New Marker for Phylogenetic Analyses

Lehmann, Jörg 12 February 2014
Despite decades of effort by molecular systematists, the trees of life of eukaryotic organisms remain partly unresolved or in conflict with each other. An ever-increasing number of fully sequenced genomes of various eukaryotes makes it possible to consider gene and species phylogenies at genome scale. However, such phylogenomic approaches have also revealed that more taxa and ever more gene sequences are not the ultimate solution to these conflicts, and that there is a need for sequence-independent phylogenetic meta-characters derived from genome sequences. Spliceosomal introns are characteristic features of eukaryotic nuclear genomes. The relatively rare changes of spliceosomal intron positions have already been used as genome-level markers, both for estimating intron evolution and for inferring phylogenies, albeit with variable success. In this thesis, a specific subset of these changes is introduced and established as a novel phylogenetic marker, termed the near intron pair (NIP). These characters are inferred from homologous genes that contain mutually exclusive intron presences at pairs of closely spaced coding sequence (CDS) positions. The idea that NIPs are powerful characters rests on the assumption that both very small exons and multiple intron gains at the same position are rare. To obtain sufficient numbers of NIP characters from genomic and alignment datasets in a consistent and flexible way, implementing a computational pipeline was a main goal of this work. Starting from orthologous (or, more generally, homologous) gene datasets comprising genomic sequences and the corresponding CDS transcript annotations, multiple alignment generation is an integral part of this pipeline. The alignment can be calculated at the amino acid level using external tools (e.g. transAlign) and yields a codon alignment via back-translation.
Guided by the multiple alignment, positionally homologous introns become apparent when intron positions are mapped individually for each transcript. At this stage, the pipeline outputs the portions of the intron-annotated alignment that contain at least one NIP candidate. A subsequent pipeline script converts these collected NIP region files into binary characters representing valid NIPs, subject to quality-filter constraints concerning, for example, amino acid alignment conservation around intron loci and splice sites. The pipeline tools enable researchers to build NIP character matrices that can be used for tree inference, e.g. with maximum parsimony. In a first NIP-based application, the phylogenetic position of major orders of holometabolous insects (more specifically, the Coleoptera-Hymenoptera-Mecopterida trifurcation) was evaluated in a cladistic framework. As already suggested by a study of the eIF2gamma gene based on two NIP cases (Krauss et al. 2005), the genome-scale evaluation supported Hymenoptera as the sister group to a clade of Coleoptera and Mecopterida, in agreement with other studies but contradicting the previously established view. As part of the genome paper describing a new species of twisted-wing parasites (Strepsiptera), the NIP method was employed to help resolve their phylogenetic position within holometabolous insects. Together with analyses of sequence patterns and a further meta-character, it revealed twisted-wing parasites to be the closest relatives of the mega-diverse beetles. NIP-based reconstructions of the metazoan tree covering a broad selection of representative animal species also identified some weaknesses of the NIP approach. It may suffer, for example, from alignment and ortholog prediction artifacts (depending on the taxonomic depth of the sampled taxa) and from systematic biases (long-branch attraction artifacts, due to unequal evolutionary rates of intron gain and loss and the use of maximum parsimony). In a further study, NIPs identified within the recently diverged genus Drosophila were used to characterize recent intron gain events, which apparently involved several cases of intron sliding and tandem exon duplication, although the mechanisms of gain for most cases could not be elucidated. Finally, NIPs were established as a novel phylogenetic marker, dedicated in particular to complementarily exploring the wealth of genome data for phylogenetic purposes and to addressing open questions of intron evolution.
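The sketch below shows the core NIP test described above in minimal form. It assumes intron positions have already been mapped onto shared codon-alignment coordinates for each taxon. The proximity threshold `max_dist` and the 0/1/? state coding are illustrative assumptions, not parameters taken from the thesis pipeline. The pipeline's quality filters (alignment conservation around intron loci, splice sites) are not modelled.

```python
from itertools import combinations

# Hypothetical representation: taxon -> set of intron positions in shared
# alignment coordinates. max_dist is an assumed proximity threshold.


def nip_characters(introns, max_dist=30):
    """Return [((p, q), {taxon: state})] for near intron pair candidates.

    A pair (p, q) qualifies if q - p <= max_dist, no taxon has introns at
    both positions (mutual exclusivity), and both states occur. States:
    "0" = intron at p, "1" = intron at q, "?" = neither.
    """
    positions = sorted(set().union(*introns.values()))
    characters = []
    for p, q in combinations(positions, 2):
        if q - p > max_dist:
            continue
        states = {}
        for taxon, present in introns.items():
            has_p, has_q = p in present, q in present
            if has_p and has_q:
                break  # both introns in one taxon: not mutually exclusive
            states[taxon] = "0" if has_p else ("1" if has_q else "?")
        else:
            if {"0", "1"} <= set(states.values()):
                characters.append(((p, q), states))
    return characters


introns = {"T1": {100, 400}, "T2": {112}, "T3": {100}, "T4": set()}
print(nip_characters(introns))
```

Each returned character becomes one column of a binary matrix that a parsimony program could then analyse.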
55

Investigating infant preference for prosocial others: Replication and extension using repeated measures

Nighbor, Tyler D. 01 January 2014
Recent research suggests that infants as young as 5 months old demonstrate an innate or unlearned ability to judge others' prosocial and antisocial behavior. The data used to support this assertion show that, when given a single opportunity to choose a puppet after watching a puppet show, most infants (72-100%) choose a helper puppet (the puppet that "helps" another puppet attain its goal) over a hinderer puppet (the puppet that "hinders" another puppet from attaining its goal). However, to date, no independent research team has published a replication of these methods and results. The purpose of the current investigation was to replicate Hamlin and Wynn (2011) and to extend their work by including within-subject repeated measures of choice. Twenty infants were shown a puppet show nearly identical to that used by Hamlin and Wynn (2011) and were then asked to choose between the two puppets (i.e., the helper or the hinderer); infants were then asked to choose a puppet four more times, resulting in five choices per infant. The results failed to replicate those of Hamlin and Wynn (2011), as infants in the current investigation did not consistently choose the helper more often during subsequent choice trials. Implications and limitations of this study, as well as suggestions for future research, are discussed.
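A single-choice result like the one summarized above is a proportion of infants choosing the helper, and such proportions are commonly compared against chance (50%) with an exact binomial test. This sketch is illustrative only: the abstract does not say which analysis the thesis used, and the `choices` structure and example picks are hypothetical. Because repeated choices by the same infant are not independent, the sketch tests only first choices and summarizes the later choices per infant instead of pooling them.

```python
from math import comb

# Illustrative only; the data layout and example values are hypothetical.


def binom_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of outcomes
    that are no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def summarize(choices):
    """choices: infant id -> list of 'helper'/'hinderer' picks, in order.

    Returns (helper first-choices, infants, p-value, per-infant helper rate).
    """
    firsts = [c[0] for c in choices.values()]
    k = firsts.count("helper")
    per_infant = {i: c.count("helper") / len(c) for i, c in choices.items()}
    return k, len(firsts), binom_two_sided(k, len(firsts)), per_infant


choices = {"i1": ["helper", "hinderer", "helper", "helper", "hinderer"],
           "i2": ["hinderer"] * 5}
print(summarize(choices))
```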
56

Phylogenetic Analyses of subtribe Goodyerinae and Revision of Goodyera section Goodyera (Orchidaceae) from Indonesia, and Fungal Association of Goodyera section Goodyera

Juswara, Lina S. 03 September 2010
No description available.
57

Phylogeny and Molecular Evolution of the Voltage-gated Sodium Channel Gene scn4aa in the Electric Fish Genus Gymnotus

Xiao, Dawn Dong-yi 19 March 2014
Analyses of the evolution and function of voltage-gated sodium channel proteins (Navs) have largely been limited to mutations identified in individual patients diagnosed with neuromuscular disease. This project investigates the carboxyl terminus of the Nav paralog (locus scn4aa 3’) that is preferentially expressed in the electric organs of Neotropical weakly electric fishes (order Gymnotiformes). As a model system, I used the genus Gymnotus, a diverse clade of fishes that produce species-specific electric organ discharges (EODs). I clarified the evolutionary relationships among Gymnotus species using mitochondrial (cytochrome b and 16S rRNA) and nuclear (rag2 and scn4aa) gene sequences (3739 nucleotide positions from 28 Gymnotus species). I analyzed the molecular evolution of scn4aa 3’ and detected evidence of positive selection at eight amino acid sites in seven Gymnotus lineages. These eight sites lie in motifs that may be important for modulating EOD frequencies.
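Site-level tests for positive selection, such as the one reported here, rely on codon substitution models that compare nonsynonymous and synonymous change. The sketch below does not implement such a test. It only illustrates the underlying distinction: it classifies codons that differ at a single position between two aligned coding sequences as synonymous or nonsynonymous under the standard genetic code. The sequences are hypothetical. Codons with gaps, ambiguous bases, or multiple differences are set aside rather than resolved.

```python
# Not a selection test: a toy classifier of single-position codon differences.
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_differences(seq1, seq2):
    """Count synonymous vs nonsynonymous single-position codon differences."""
    result = {"synonymous": 0, "nonsynonymous": 0, "skipped": 0}
    usable = min(len(seq1), len(seq2)) // 3 * 3
    for i in range(0, usable, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2:
            continue
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            result["skipped"] += 1          # gaps or ambiguous bases
            continue
        if sum(x != y for x, y in zip(c1, c2)) > 1:
            result["skipped"] += 1          # multiple hits need a model
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            result["synonymous"] += 1
        else:
            result["nonsynonymous"] += 1
    return result


print(classify_differences("ATGCTTAAA", "ATGCTCAGA"))
# {'synonymous': 1, 'nonsynonymous': 1, 'skipped': 0}
```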
59

Algorithmes basés sur la programmation DC et DCA pour l’apprentissage avec la parcimonie et l’apprentissage stochastique en grande dimension / DCA-based algorithms for learning with sparsity in high-dimensional settings and stochastic learning

Phan, Duy Nhat 15 December 2016
With the increasing abundance of high-dimensional data, high-dimensional classification problems have emerged as a challenge in the machine learning community and have attracted a great deal of attention from researchers in the field. In recent years, sparse and stochastic learning techniques have proven effective for this kind of problem. In this thesis, we focus on developing optimization approaches for solving several classes of problems in these two areas. Our methods are based on DC (Difference of Convex functions) programming and DCA (DC Algorithms), which are well known as powerful tools for nonconvex optimization. The thesis is composed of three parts. The first part tackles variable selection. The second part studies group variable selection. The final part concerns stochastic learning.
In the first part, we start with variable selection in Fisher's discriminant problem (Chapter 2) and the optimal scoring problem (Chapter 3), two different approaches to supervised classification in the high-dimensional setting, where the number of features is much larger than the number of observations. Continuing this study, we examine the structure of the sparse covariance matrix estimation problem and propose four appropriate DCA-based algorithms (Chapter 4). Two applications, in finance and in classification, illustrate the efficiency of our methods. The second part studies L_{p,0} regularization for group variable selection (Chapter 5). Using a DC approximation of the L_{p,0} norm, we prove that the approximate problem, with suitable parameters, is equivalent to the original problem. Considering two equivalent reformulations of the approximate problem, we develop DCA-based algorithms to solve them. As applications, we use the proposed algorithms for group feature selection in the optimal scoring problem and in the estimation of multiple covariance matrices. In the third part, we introduce a stochastic DCA for large-scale parameter estimation problems (Chapter 6) in which the objective function is a large sum of nonconvex components. As an application, we propose a special stochastic DCA for the log-linear model incorporating latent variables.
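DCA minimizes f = g - h, where g and h are convex. At each step it takes a subgradient y of h at the current point and then solves the convex problem of minimizing g(x) - <y, x>. This minimal sketch rests on two simplifying assumptions. First, it uses the capped-ℓ1 penalty, one common DC surrogate of the ℓ0 "norm" and not necessarily the approximation used in the thesis. Second, it uses an identity design (a denoising problem), so each convex subproblem has a closed-form soft-thresholding solution. Feature selection with a real design matrix would need an iterative solver for the subproblem.

```python
import math

# Illustrative DCA sketch under assumptions: capped-l1 penalty and identity
# design. Not the thesis's algorithms or its L_{p,0} approximation.


def soft_threshold(v, t):
    """Proximal operator of t*||x||_1, applied elementwise."""
    return [math.copysign(max(abs(a) - t, 0.0), a) for a in v]


def dca_capped_l1(z, lam=1.0, theta=1.0, max_iter=100, tol=1e-10):
    """DCA for min_x 0.5*||x - z||^2 + lam * sum_i min(|x_i|, theta).

    DC split f = g - h with
      g(x) = 0.5*||x - z||^2 + lam*||x||_1        (convex)
      h(x) = lam * sum_i max(|x_i| - theta, 0)     (convex)
    Each iteration picks y in the subdifferential of h at x, then solves
    argmin_x g(x) - <y, x>, which is soft_threshold(z + y, lam).
    """
    x = list(z)  # warm start at the data (an arbitrary choice)
    for _ in range(max_iter):
        y = [lam * math.copysign(1.0, a) if abs(a) > theta else 0.0 for a in x]
        x_new = soft_threshold([zi + yi for zi, yi in zip(z, y)], lam)
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x


print(dca_capped_l1([3.0, 0.5, -2.0]))  # [3.0, 0.0, -2.0]
```

DCA only guarantees a critical point, so the result can depend on the starting point. With this warm start, large entries are left unshrunk and the small entry is set to zero, which is the behaviour a capped penalty is meant to produce.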
60

Models and algorithms to study the common evolutionary history of hosts and symbionts / Modèles et algorithmes pour étudier l'histoire évolutive commune des hôtes et des symbiotes

Urbini, Laura 23 October 2017
In this Ph.D. work, we proposed models and algorithms to study the common evolutionary history of hosts and symbionts. The first goal was to analyse the robustness of phylogenetic tree reconciliation methods, which are a common way of performing such studies. A reconciliation maps one tree, most often the symbiont's, onto the other using a so-called event-based model. The events usually considered are cospeciation, duplication, host switch, and loss. The host and symbiont phylogenies are generally taken as given and error-free. The objective was to understand the strengths and weaknesses of the parsimonious model used for such mappings, and how the final results may be affected when small errors are present in, or introduced into, the input datasets. Two cases were considered: a wrong choice of present-day symbiont-host (leaf) associations when several exist, and a wrong rooting of the symbiont tree. Our results show that the choice of leaf associations and of root placement can strongly affect the variability of the reconciliation output. We also observed that the host switch event plays an important role in robustness, in particular for the rooting problem.

The second goal of this Ph.D. was to introduce events that have received little or no formal treatment in the literature. One of them is the spread, which corresponds to the invasion of different hosts by the same symbiont. As when spreads are not considered, the optimal reconciliations obtained depend on the costs chosen for the events, so statistical methods for assigning the most appropriate costs are still needed. Two types of spread are introduced: vertical and horizontal. A vertical spread corresponds to what could also be called a freeze: the evolution of the symbiont “freezes” while the symbiont remains associated with a host and with the new species that descend from that host. A horizontal spread combines an invasion and a freeze. The symbiont remains with its initial host but at the same time becomes associated with (“invades”) another host that is incomparable with the first, that is, neither an ancestor nor a descendant of it in the host tree. The freeze is in fact a double freeze, since the evolution of the symbiont freezes relative to both the host it was initially associated with and the host it invaded. Our results show that introducing these events makes the model more realistic, and also that datasets in which a symbiont is associated with more than one host at the same time can now be used directly, which was not feasible before.
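To make the event-based parsimony model concrete, here is a minimal sketch that computes a minimum-cost reconciliation using the four classic events (cospeciation, duplication, host switch, loss). It is a simplified, undated dynamic program in the style of duplication-transfer-loss reconciliation, chosen only for illustration. It is not the model or software used in the thesis: it ignores time consistency and does not model spreads. The dict-based tree encoding, the helper names `tree_info` and `reconcile_cost`, and the default costs (cospeciation 0, duplication 1, host switch 1, loss 1) are all hypothetical assumptions.

```python
import math
from functools import lru_cache

INF = math.inf


def tree_info(children, root):
    """Depth and descendant set (including the node itself) of every node.

    Trees are hypothetical dicts {internal_node: (left, right)}; leaves are
    the labels that never appear as keys.
    """
    depth, desc = {}, {}

    def walk(node, d):
        depth[node] = d
        desc[node] = {node}
        for kid in children.get(node, ()):
            walk(kid, d + 1)
            desc[node] |= desc[kid]

    walk(root, 0)
    return depth, desc


def reconcile_cost(host, host_root, symb, symb_root, leaf_map,
                   cosp=0, dup=1, switch=1, loss=1):
    """Minimum parsimony cost of mapping the symbiont tree onto the host tree."""
    h_depth, h_desc = tree_info(host, host_root)
    hosts = list(h_depth)

    def incomparable(a, b):
        # Neither host is an ancestor of the other.
        return a not in h_desc[b] and b not in h_desc[a]

    @lru_cache(maxsize=None)
    def c(u, h):
        # Cost of the symbiont subtree rooted at u, with u mapped exactly to h.
        kids = symb.get(u, ())
        if not kids:
            return 0 if leaf_map[u] == h else INF
        u1, u2 = kids
        # Duplication: both children stay in the lineage of h.
        best = dup + c_in(u1, h) + c_in(u2, h)
        # Cospeciation: the children follow the two child lineages of h.
        h_kids = host.get(h, ())
        if h_kids:
            h1, h2 = h_kids
            best = min(best, cosp + min(c_in(u1, h1) + c_in(u2, h2),
                                        c_in(u1, h2) + c_in(u2, h1)))
        # Host switch: one child stays with h, the other jumps to an incomparable host.
        best = min(best, switch + min(c_in(u1, h) + c_out(u2, h),
                                      c_out(u1, h) + c_in(u2, h)))
        return best

    @lru_cache(maxsize=None)
    def c_in(u, h):
        # u mapped to h or a descendant; each host level skipped costs one loss.
        return min(c(u, d) + loss * (h_depth[d] - h_depth[h]) for d in h_desc[h])

    @lru_cache(maxsize=None)
    def c_out(u, h):
        # u placed in a host lineage incomparable with h (host-switch target).
        return min((c_in(u, d) for d in hosts if incomparable(d, h)), default=INF)

    return min(c(symb_root, h) for h in hosts)


# Toy host tree ((A,B),C) and a congruent symbiont tree ((a,b),c).
host = {"hR": ("hAB", "hC"), "hAB": ("hA", "hB")}
symb = {"sR": ("sAB", "sc"), "sAB": ("sa", "sb")}
print(reconcile_cost(host, "hR", symb, "sR", {"sa": "hA", "sb": "hB", "sc": "hC"}))  # 0
# Same trees, but symbiont leaf "sa" associated with host C instead of A.
print(reconcile_cost(host, "hR", symb, "sR", {"sa": "hC", "sb": "hB", "sc": "hC"}))  # 2
```

The two calls give a toy illustration of the leaf-association sensitivity described in the abstract. Reassigning a single symbiont leaf raises the optimal cost from 0 to 2, paid here as one host switch plus one loss. The rooting problem could be probed the same way, by rerunning the function on alternative rootings of the symbiont tree.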
