31

Assessment of supervised classification methods for the analysis of RNA-seq data / Développement, évaluation et application de méthodes statistiques pour l'analyse de données multidimensionnelles de comptage produites par les technologies de séquençage à haut débit ("Next Generation Sequencing")

Abuelqumsan, Mustafa 20 December 2018
Les technologies « Next Generation Sequencing » (NGS), qui permettent de caractériser les séquences génomiques à un rythme sans précédent, sont utilisées pour caractériser la diversité génétique humaine et le transcriptome (partie du génome transcrite en acides ribonucléiques). Les variations du niveau d’expression des gènes selon les organes et circonstances, sous-tendent la différentiation cellulaire et la réponse aux changements d’environnement. Comme les maladies affectent souvent l’expression génique, les profils transcriptomiques peuvent servir des fins médicales (diagnostic, pronostic). Différentes méthodes d’apprentissage artificiel ont été proposées pour classer des individus sur base de données multidimensionnelles (par exemple, niveau d’expression de tous les gènes dans des échantillons). Pendant ma thèse, j’ai évalué des méthodes de « machine learning » afin d’optimiser la précision de la classification d’échantillons sur base de profils transcriptomiques de type RNA-seq. / Over the past decade, “Next Generation Sequencing” (NGS) technologies have made it possible to characterize genomic sequences at an unprecedented pace. Many studies have focused on human genetic diversity and on the transcriptome (the part of the genome transcribed into ribonucleic acid). Indeed, different tissues of our body express different genes at different moments, enabling cell differentiation and functional response to environmental changes. Since many diseases affect gene expression, transcriptome profiles can be used for medical purposes (diagnosis and prognosis). A wide variety of advanced statistical and machine learning methods have been proposed to address the general problem of classifying individuals according to multiple variables (e.g. transcription level of thousands of genes in hundreds of samples). During my thesis, I led a comparative assessment of machine learning methods and their parameters, to optimize the accuracy of sample classification based on RNA-seq transcriptome profiles.
32

Classificação semissupervisionada de séries temporais extraídas de imagens de satélite / Semi-supervised classification of time series extracted from satellite images

Amaral, Bruno Ferraz do 29 April 2016
Nas últimas décadas, com o crescimento acelerado na geração e armazenamento de dados, houve um aumento na necessidade de criação e gerenciamento de grandes bases de dados. Logo, a utilização de técnicas de mineração de dados adequadas para descoberta de padrões e informações úteis em bases de dados é uma tarefa de interesse. Em especial, bases de séries temporais têm sido alvo de pesquisas em áreas como medicina, economia e agrometeorologia. Em mineração de dados, uma das tarefas mais exploradas é a classificação. Entretanto, é comum em bases de séries temporais, a quantidade e complexidade de dados extrapolarem a capacidade humana de análise manual dos dados, o que torna o processo de supervisão dos dados custoso. Como consequência disso, são produzidos poucos dados rotulados, em comparação a um grande volume de dados não rotulados disponíveis. Nesse cenário, uma abordagem adequada para análise desses dados é a classificação semissupervisionada, que considera dados rotulados e não rotulados para o treinamento do classificador. Nesse contexto, este trabalho de mestrado propõe 1) uma metodologia de análise de dados obtidos a partir de séries temporais de imagens de satélite (SITS) usando tarefas de mineração de dados e 2) uma técnica baseada em grafos para classificação semissupervisionada de séries temporais extraídas de imagens de satélite. A metodologia e a técnica de classificação desenvolvidas são aplicadas na análise de séries temporais de índices de vegetação obtidas a partir de SITS, visando a identificação de áreas de plantio de cana-de-açúcar. Os resultados obtidos em análise experimental, realizada com apoio de especialistas no domínio de aplicação, indicam que a metodologia proposta é adequada para auxiliar pesquisas em agricultura. 
Além disso, os resultados do estudo comparativo mostram que a técnica de classificação semissupervisionada desenvolvida supera métodos de classificação supervisionada consolidados na literatura e métodos correlatos de classificação semissupervisionada. / The amount of digital data generated and stored, as well as the need to create and manage large databases, has increased significantly in recent decades. The possibility of finding valid and potentially useful patterns and information in large databases has attracted the attention of many scientific areas. Time series databases have been explored using data mining methods in several application domains, such as economics, medicine and agrometeorology. Due to the large volume and complexity of some time series databases, the process of labeling data for supervised tasks, such as classification, can be very expensive. To overcome the scarcity of labeled data, semi-supervised classification, which benefits from both the labeled and the unlabeled data available, can be applied to classify data from large time series databases. In this Master's dissertation, we propose 1) a framework for the analysis of data extracted from satellite image time series (SITS) using data mining tasks and 2) a graph-based semi-supervised classification method, developed to classify temporal data obtained from satellite images. According to experts in agrometeorology, the proposed method and framework provide an automatic way of analyzing data extracted from SITS, which is very useful for supporting research in this application domain. We apply the framework and the proposed semi-supervised classification method to the analysis of vegetation index time series, aiming to identify sugarcane crop fields in Brazil. Experimental results indicate that our proposed framework is useful for supporting research in agriculture, according to experts in the application domain. 
We also show that our method is more accurate than traditional supervised methods and related semi-supervised methods.
33

Spectroscopie optique multi-modalités in vivo : instrumentation, extraction et classification diagnostique de tissus sains et hyperplasiques cutanés / Multi-modality optical spectroscopy in vivo : instrumentation, extraction and classification diagnosis of normal and hyperplastic cutaneous tissue

Diaz-Ayil, Gilberto 16 November 2009
L’incidence des cancers cutanés est en constante progression. Leur diagnostic précoce et leur caractérisation in vivo constituent donc un enjeu important. Une approche multimodale et non invasive en spectroscopie fibrée résolue spatialement a été implémentée. L’instrumentation développée permet des mesures co-localisées en multiple excitation d’AutoFluorescence (AF, 7 pics entre 360 et 430 nm) et en Réflectance Diffuse (RD, 390 à 720 nm) résolues spatialement à 5 distances inter-fibres (entre 271 et 1341 µm). Le protocole expérimental a porté sur les stades précoces de cancers cutanés UV-induits sur un modèle pré-clinique. L’analyse histopathologique a permis de définir 4 classes de référence de tissus cutanés : Sain (S), Hyperplasie Compensatoire (HC), Hyperplasie Atypique (HA) et Dysplasie (D), menant à 6 combinaisons de paires histologiques à discriminer. Suite au prétraitement des spectres bruts acquis, puis à l’extraction, la sélection et la réduction de jeux de caractéristiques spectroscopiques, les performances de trois algorithmes de classification supervisée ont été comparées : k-Plus Proches Voisins, Analyse Discriminante Linéaire et Machine à Vecteur de Support. Différentes modalités ont également été évaluées : mono-excitation d’AF seule, Matrices d’Excitation-Emission en AF seules (EEMs), RD seule, couplage EEMs – RD et couplage EEMs – RD résolue spatialement. L’efficacité finale de notre méthode diagnostique a été évaluée en termes de sensibilité (Se) et de spécificité (Sp). Les meilleurs résultats obtenus sont : Se et Sp ≈ 100% pour discriminer HC vs autres ; Sp ≈ 100% et Se > 95% pour discriminer S vs HA ou D ; Sp ≈ 74% et Se ≈ 63% pour HA vs D / The incidence of skin cancers is steadily increasing. Their early diagnosis and in vivo characterization are therefore an important issue. A noninvasive, multimodal approach based on spatially resolved fiber-optic spectroscopy has been implemented. 
The instrumentation developed allows co-localized measurements in multiple-excitation AutoFluorescence (AF, 7 peaks between 360 and 430 nm) and Diffuse Reflectance (DR, 390 to 720 nm), spatially resolved at 5 inter-fiber distances (between 271 and 1341 μm). The experimental protocol focused on the early stages of UV-induced skin cancer in a preclinical model. Four reference classes were defined based on the histopathological analysis of the skin samples: Healthy (H), Compensatory Hyperplasia (CH), Atypical Hyperplasia (AH) and Dysplasia (D), leading to 6 combinations of class pairs to be discriminated. After preprocessing of the raw spectra, sets of spectroscopic features were extracted, selected and reduced. Then, the performance of three supervised classification algorithms was compared: k-Nearest Neighbors, Linear Discriminant Analysis and Support Vector Machine. The contribution of the different modalities was also evaluated: single AF excitation alone, AF Excitation-Emission Matrices (EEMs) alone, DR alone, coupling of EEMs and DR, and coupling of EEMs and DR with spatial resolution. The final efficiency of our diagnostic method was evaluated in terms of sensitivity (Se) and specificity (Sp). The best results obtained are: Se and Sp ≈ 100% for discriminating CH vs others; Sp ≈ 100% and Se > 95% for discriminating AH or D vs H; Sp ≈ 74% and Se ≈ 63% for discriminating AH vs D.
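
As a reading aid for the Se and Sp figures quoted in this abstract, the sketch below shows how sensitivity and specificity are computed for one pair of classes. It is only an illustration: the function name and the example labels are made up here and are not taken from the thesis.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """Return (Se, Sp) for a two-class problem, treating `positive` as the target class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # target class correctly detected
            else:
                fn += 1  # target class missed
        else:
            if pred == positive:
                fp += 1  # other class wrongly flagged as target
            else:
                tn += 1  # other class correctly rejected
    se = tp / (tp + fn) if (tp + fn) else float("nan")
    sp = tn / (tn + fp) if (tn + fp) else float("nan")
    return se, sp


# Made-up labels for an "AH vs D" pair, purely for illustration.
truth = ["AH", "AH", "AH", "D", "D"]
predicted = ["AH", "AH", "D", "D", "AH"]
se, sp = sensitivity_specificity(truth, predicted, positive="AH")
```

Sensitivity is the fraction of true target-class samples that are detected, and specificity is the fraction of the other class that is correctly rejected, so both have to be reported for each class pair.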
34

Etude et extraction des règles associatives de classification en classification supervisée / Study and mining associative classification rules in Supervised classification

Bouzouita-Bayoudh, Inès 01 December 2012
Dans le cadre de cette thèse, notre intérêt se porte sur la précision de la classification et l'optimalité du parcours de l'espace de recherche. L'objectif recherché est d'améliorer la précision de classification en étudiant les différents types de règles et de réduire l'espace de recherche des règles. Nous avons proposé une approche de classification IGARC permettant de générer un classifieur formé d'une base de règles de classification génériques permettant de mieux classer les nouveaux objets grâce à la flexibilité de petites prémisses caractérisant ces règles. De plus cette approche manipule un nombre réduit de règles en comparaison avec les autres approches de classification associative en se basant sur le principe des bases génériques des règles associatives. Une étude expérimentale inter et intra approches a été faite sur 12 bases Benchmark. Nous avons également proposé une approche Afortiori. Notre travail a été motivé par la recherche d'un algorithme efficace permettant l'extraction des règles génériques aussi bien fréquentes que rares de classification en évitant la génération d'un grand nombre de règles. L'algorithme que nous proposons est particulièrement intéressant dans le cas de bases de données bien spécifiques composées d'exemples positifs et négatifs et dont le nombre d'exemples négatifs est très réduit par rapport aux exemples positifs. La recherche des règles se fait donc sur les exemples négatifs afin de déterminer des règles qui ont un faible support et ce même par rapport à la population des exemples positifs et dont l'extraction pourrait être coûteuse. / Within the framework of this thesis, our interest is focused on classification accuracy and the optimality of the search-space traversal. 
We introduced a new direct associative classification method called IGARC that directly extracts, from a training set, a classifier formed by generic associative classification rules, in order to reduce the number of associative classification rules without jeopardizing classification accuracy. The experiments carried out showed that IGARC is highly competitive in comparison with popular classification methods. We also introduced a new classification approach called AFORTIORI. We address the problem of generating relevant frequent and rare classification rules. Our work is motivated by the long-standing open question of devising an efficient algorithm for finding rules with low support. A particularly relevant field for rare item sets and rare associative classification rules is medical diagnosis. The proposed approach is based on the classical cover set algorithm. It obtains frequent and rare rules while exploring the search space in a depth-first manner. To this end, AFORTIORI adopts the covering set algorithm and uses the cover measure to guide the traversal of the search space and to generate the most interesting rules for the classification framework, even rare ones. We describe our method and provide comparisons with common associative classification methods on standard benchmark data sets.
35

Classificação semi-supervisionada ativa baseada em múltiplas hierarquias de agrupamento / Active semi-supervised classification based on multiple clustering hierarchies

Batista, Antônio José de Lima 08 August 2016
Algoritmos de aprendizado semi-supervisionado ativo podem se configurar como ferramentas úteis em cenários práticos em que os dados são numerosamente obtidos, mas atribuir seus respectivos rótulos de classe se configura como uma tarefa custosa/difícil. A literatura em aprendizado ativo destaca diversos algoritmos, este trabalho partiu do tradicional Hierarchical Sampling estabelecido para operar sobre hierarquias de grupos. As características de tal algoritmo o coloca à frente de outros métodos ativos, entretanto o mesmo ainda apresenta algumas dificuldades. A fim de aprimorá-lo e contornar suas principais dificuldades, incluindo sua sensibilidade na escolha particular de uma hierarquia de grupos como entrada, este trabalho propôs estratégias que possibilitaram melhorar o algoritmo na sua forma original e diante de variantes propostas na literatura. Os experimentos em diferentes bases de dados reais mostraram que o algoritmo proposto neste trabalho é capaz de superar e competir em qualidade dentro do cenário de classificação ativa com outros algoritmos ativos da literatura. / Active semi-supervised learning can play an important role in classification scenarios in which labeled data are laborious and/or expensive to obtain, while unlabeled data are numerous and can be easily acquired. There are many active algorithms in the literature, and this work focuses on an active semi-supervised algorithm that can be driven by a clustering hierarchy, the well-known Hierarchical Sampling (HS) algorithm. This work takes the original Hierarchical Sampling algorithm as a starting point and changes different aspects of it in order to tackle its main drawbacks, including its sensitivity to the choice of a single particular hierarchy. Experimental results on many real datasets show that the proposed algorithm performs better than, or competitively with, a number of state-of-the-art algorithms for active semi-supervised classification.
36

Amostragem de avifauna urbana por meio de pontos fixos: verificando a eficiência do método / Urban birds sampling by point counts: checking the method efficiency

Alexandrino, Eduardo Roberto 03 September 2010
A urbanização é uma das ações antrópicas que mais crescem no mundo atual. Por este motivo pesquisas ecológicas são realizadas nas cidades com o objetivo de reconhecer seus impactos, e as aves são utilizadas como uma das ferramentas para diagnóstico ambiental. Assim, o presente estudo avaliou o método de levantamento de aves por ponto fixo, método amplamente utilizado em estudos com aves em diversos ambientes. Foram analisados três pontos que podem influenciar a amostragem de aves através deste método: 1) o habitat onde o levantamento é realizado, observando a composição dos elementos urbanos existentes na cidade; 2) o intervalo de tempo adotado em cada ponto fixo para a coleta de dados; 3) os fatores potencialmente prejudiciais a observação de aves, tais como o ruído sonoro urbano e a presença de conversas causadas por pessoas curiosas. Com a área de estudo estratificada a partir da quantidade de cobertura arbórea existente nos bairros abrangidos, 90 unidades amostrais foram selecionadas. Nestes, foram quantificados os elementos urbanos presentes, a riqueza, o número de contato de aves, os ruídos sonoros e a presença de conversas. Os resultados demonstraram que a reunião de um número maior de espécies e contatos pode ser favorecida pelas áreas de cobertura arbórea, enquanto áreas construídas e pisos impermeáveis podem prejudicar o número de espécies, sendo o número de contato prejudicado apenas pelas áreas de pisos impermeáveis. O número de espécies observadas não foi significativamente diferente após nove minutos de coleta de dados, entretanto o número de contatos continuou crescendo, demonstrando haver recontagens de indivíduos após este intervalo. A riqueza de espécies foi significativamente diferente entre os dados coletados no período seco e no período chuvoso. Conforme houve a maior presença do ruído sonoro urbano menor foi o número de espécies e contatos obtidos nos pontos. 
A incidência de conversas ocasionadas por pessoas curiosas foi baixa não prejudicando as coletas de dados. Os resultados encontrados sugerem que: o levantamento de aves no meio urbano através do ponto fixo deve considerar a composição do ambiente, já que a riqueza e o número de contato podem variar de acordo com a presença dos diferentes elementos; sejam adotados intervalos de tempo por ponto não superiores a nove minutos; quando possível diferentes épocas do ano devem ser utilizadas para as coletas de dados, visto que podem ser encontradas diferenças entre as estações; sejam escolhidos locais e momentos para as coletas de dados com baixo ruído sonoro. Por fim, o método de ponto fixo foi considerado eficaz para amostragem de aves urbanas, desde que tais cuidados sejam considerados. / Urbanization is one of the fastest-growing anthropic activities in the world. For this reason, ecological research is conducted in cities to identify its impacts, using birds as one of the tools for environmental diagnosis. Therefore, the present study assessed the point count sampling method, which is broadly used for bird surveys in many environments. Three issues that might affect bird sampling with this method were analyzed: 1) the habitat where the sampling is performed, observing the composition of the urban elements present in the city; 2) the time spent at each point count; 3) factors potentially detrimental to bird detectability, such as urban noise and the presence of curious citizens who talk to the researcher during the point count. The study area was stratified by the amount of tree canopy cover in the neighborhoods covered, and 90 sample units were selected. In these units, the urban elements present, the species richness, the number of bird contacts, the noise and the presence of conversations were quantified. 
The results showed that the number of species and contacts can be favored by tree canopy areas, while built-up areas and impermeable surfaces may reduce the number of species, although the number of contacts is reduced only by impermeable surfaces. The number of observed species did not differ significantly after nine minutes of sampling; however, the number of contacts kept increasing, indicating that individuals were recounted after this interval. Species richness was significantly different between the samples collected in the dry and wet seasons. As urban noise increased, fewer species and bird contacts were recorded. The incidence of conversations with curious people was low and did not harm data collection. The results suggest that: bird surveys in cities by point counts should consider the composition of the environment, since the richness and the number of bird contacts can vary according to the presence of different elements; the time interval per point should not exceed nine minutes; when possible, different seasons of the year should be used for sampling, since differences may be found among them; places and moments with low noise should be chosen for sampling. Finally, the point count method was considered efficient for sampling urban birds, provided that such precautions are taken.
37

Feature selection based on information theory

Bonev, Boyan 29 June 2010
Along with the improvement of data acquisition techniques and the increasing computational capacity of computers, the dimensionality of the data grows higher. Pattern recognition methods have to deal with samples consisting of thousands of features, and the reduction of their dimensionality becomes crucial to make them tractable. Feature selection is a technique for removing the irrelevant and noisy features and selecting a subset of features which better describe the samples and produce a better classification performance. It is becoming an essential part of most pattern recognition applications. / In this thesis we propose a feature selection method for supervised classification. The main contribution is the efficient use of information theory, which provides a solid theoretical framework for measuring the relation between the classes and the features. Mutual information is considered to be the best measure for this purpose. Traditionally it has been measured for ranking single features without taking into account the entire set of selected features. This is due to the computational complexity involved in estimating the mutual information. However, in most data sets the features are not independent, and their combination provides much more information about the class than the sum of their individual prediction power. / Methods based on density estimation can only be used for data sets with a very high number of samples and a low number of features. Due to the curse of dimensionality, in a multi-dimensional feature space the number of samples required for a reliable density estimation is very high. For this reason we analyse the use of different estimation methods which bypass the density estimation and estimate entropy directly from the set of samples. These methods allow us to efficiently evaluate sets of thousands of features. / For high-dimensional feature sets another problem is the search order of the feature space. 
All algorithms with a non-prohibitive computational cost search for a sub-optimal feature set. Greedy algorithms are the fastest and are the ones which incur the least overfitting. We show that, from the information-theoretical perspective, a greedy backward selection algorithm conserves the amount of mutual information, even though the feature set is not the minimal one. / We also validate our method in several real-world applications. We apply feature selection to omnidirectional image classification through a novel approach. It is appearance-based, and we select features from a bank of filters applied to different parts of the image. The context of the task is place recognition for mobile robotics. Another set of experiments is performed on microarrays from gene expression databases. The classification problem aims to predict the disease of a new patient. We present a comparison of the classification performance, and the algorithms we present were shown to outperform the existing ones. Finally, we successfully apply feature selection to spectral graph classification. All the features we use are for unattributed graphs, which constitutes a contribution to the field. We also draw interesting conclusions about which spectral features matter most under different experimental conditions. In the context of graph classification we also show how important the precise estimation of mutual information is, and we analyse its impact on the final classification results.
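
The claim that a combination of features can carry more information about the class than the individual features do can be illustrated with a small sketch. It assumes discrete features and uses a simple plug-in (frequency-count) estimate of mutual information, which is not the entropy estimator studied in the thesis; the function names and the XOR toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete observations."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def joint(rows, subset):
    """Treat the selected feature columns of each row as one joint variable."""
    return [tuple(row[i] for i in subset) for row in rows]


def backward_elimination(rows, labels, keep):
    """Greedy backward selection: repeatedly drop the feature whose removal
    leaves the remaining set with the highest mutual information with the class."""
    selected = list(range(len(rows[0])))
    while len(selected) > keep:
        drop = max(
            selected,
            key=lambda f: mutual_information(
                joint(rows, [g for g in selected if g != f]), labels
            ),
        )
        selected.remove(drop)
    return selected


# Toy data: the class is the XOR of features 0 and 1; feature 2 is irrelevant.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [a ^ b for a, b, _ in rows]
kept = backward_elimination(rows, labels, keep=2)
```

In this toy data each feature alone has zero mutual information with the class, while features 0 and 1 together determine it completely, which is why ranking single features would miss them and a set-based criterion keeps them.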
38

Proposition d'une méthode spectrale combinée LDA et LLE pour la réduction non-linéaire de dimension : Application à la segmentation d'images couleurs / Proposition of a new spectral method combining LDA and LLE for non-linear dimension reduction : Application to color images segmentation

Hijazi, Hala 19 December 2013
Les méthodes d'analyse de données et d'apprentissage ont connu un développement très important ces dernières années. En effet, après les réseaux de neurones, les machines à noyaux (années 1990), les années 2000 ont vu l'apparition de méthodes spectrales qui ont fourni un cadre mathématique unifié pour développer des méthodes de classification originales. Parmi celles-ci on peut citer la méthode LLE pour la réduction de dimension non linéaire et la méthode LDA pour la discrimination de classes. Une nouvelle méthode de classification est proposée dans cette thèse, méthode issue d'une combinaison des méthodes LLE et LDA. Cette méthode a donné des résultats intéressants sur des ensembles de données synthétiques. Elle permet une réduction de dimension non-linéaire suivie d'une discrimination efficace. Ensuite nous avons montré que cette méthode pouvait être étendue à l'apprentissage semi-supervisé. Les propriétés de réduction de dimension et de discrimination de cette nouvelle méthode, ainsi que la propriété de parcimonie inhérente à la méthode LLE nous ont permis de l'appliquer à la segmentation d'images couleur avec succès. La propriété d'apprentissage semi-supervisé nous a enfin permis de segmenter des images bruitées avec de bonnes performances. Ces résultats doivent être confortés mais nous pouvons d'ores et déjà dégager des perspectives de poursuite de travaux intéressantes. / Data analysis and learning methods have developed considerably in recent years. Indeed, after neural networks and kernel methods in the 1990s, spectral methods appeared in the 2000s. Spectral methods provide a unified mathematical framework for developing original classification methods. Among these new techniques, two methods can be highlighted: LLE for non-linear dimension reduction and LDA for class discrimination. In this thesis, a new classification technique is proposed that combines the LLE and LDA methods. 
This new method provides efficient non-linear dimension reduction and discrimination. An extension of the method to semi-supervised learning is then proposed. The good dimension reduction and discrimination properties, together with the sparsity property of the LLE technique, make it possible to apply our method successfully to color image segmentation. The semi-supervised version of our method leads to efficient segmentation of noisy color images. These results have to be extended and compared with other state-of-the-art methods. Nevertheless, interesting perspectives for future developments of this work are proposed in the conclusion.
39

Concept Based Knowledge Discovery from Biomedical Literature.

Radovanovic, Aleksandar. January 2009
This thesis describes and introduces novel methods for knowledge discovery and presents a software system that is able to extract information from biomedical literature, review interesting connections between various biomedical concepts and, in so doing, generate new hypotheses. The experimental results obtained by using the methods described in this thesis are compared to currently published results obtained by other methods, and a number of case studies are described. This thesis shows how the technology presented can be integrated with the researchers’ own knowledge, experimentation and observations for optimal progression of scientific research.
40

Detecting Swiching Points and Mode of Transport from GPS Tracks

Araya, Yeheyies January 2012 (has links)
In recent years, various research efforts have been under way to enhance the quality of travel surveys, mainly with the aid of GPS technology. Initially, this research focused mainly on vehicle travel, due to the availability of GPS technology in vehicles. Nowadays, because GPS devices are accessible for personal use, researchers have shifted their focus to personal mobility in all travel modes. This master’s thesis aimed at developing a mechanism to extract one type of travel survey information, namely travel mode, from a collected GPS dataset. The available GPS dataset was collected for the travel modes walk, bike and car, and for public transport modes such as bus, train and subway. The developed procedure consists of two stages. The first divides the tracks into trips and then divides the trips into segments by means of a segmentation process. The segmentation process is based on an assumption that a traveler switches from one transportation mode to the other. Thus, the trips are divided into walking and non-walking segments. The second stage comprises a procedure to develop a classification model that labels the separated segments with the travel modes walk, bike, bus, car, train and subway. To develop the classification model, a supervised classification method based on a decision tree algorithm was used. The highest prediction accuracy obtained by the classification system is 75.86%, for the walk travel mode. The travel modes bike and bus showed the lowest prediction accuracy. Moreover, the developed system showed remarkable results that could be used as a baseline for further similar research.
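
The first stage described in this abstract, splitting a trip into walking and non-walking segments, might be sketched as follows. The abstract does not state the criterion used, so this sketch assumes a simple speed threshold; the threshold value, the function name and the sample speeds are all hypothetical.

```python
from itertools import groupby

# Hypothetical walking-speed ceiling in m/s; the thesis does not give its criterion.
WALK_SPEED_MAX = 2.5


def split_walk_segments(speeds, threshold=WALK_SPEED_MAX):
    """Group consecutive GPS points into (label, start, end) segments,
    where `end` is exclusive and label is "walk" or "non-walk"."""
    labels = ["walk" if s <= threshold else "non-walk" for s in speeds]
    segments = []
    start = 0
    for label, group in groupby(labels):
        length = len(list(group))
        segments.append((label, start, start + length))
        start += length
    return segments


# Made-up per-point speeds (m/s) along one trip.
speeds = [1.0, 1.2, 8.0, 9.5, 10.0, 1.1]
segments = split_walk_segments(speeds)
# Candidate switching points are the indices where one segment ends and the next begins.
switch_points = [end for _, _, end in segments[:-1]]
```

Each non-walking segment produced this way would then be passed to the second stage, where a supervised classifier such as the decision tree mentioned in the abstract assigns a travel mode.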
