Global ETD Search

81	Development of novel unsupervised and supervised informatics methods for drug discovery applications
Mohiddin, Syed B. 22 February 2006
No description available.
Engineering, Chemical; Unsupervised Classification; Supervised Classification; Principal Component Analysis; Partial Least Squares; Hierarchical K-means Clustering; Identifying Diverse Molecular Targets
82	Stratégie d'évaluation de l'état des transformateurs : esquisse de solutions pour la gestion intégrée des transformateurs vieillissants / Transformer condition assessment strategy: Outline solutions for aging transformers integrated management
Eke, Samuel 11 June 2018
This PhD thesis deals with methods for assessing the condition of oil-filled power transformers. It takes a new approach by applying classification methods and data mining to transformer maintenance. It proposes a strategy based on two new oil health indicators, built with an Adaptive Neuro-Fuzzy Inference System (ANFIS), and on a fault classifier (predictor) built with supervised classification methods. Two classifiers were trained on a labeled learning database, and the naive Bayes classifier was retained for detecting faults from the gases dissolved in the oil. A simple and efficient flowchart for evaluating transformer condition is proposed. It allows quick analysis of the parameters obtained from physicochemical analyses of the oil and from dissolved gas analysis. Unsupervised classification with the k-means and fuzzy C-means methods made it possible to reconstruct the operating periods of a transformer that were marked by particular faults. The thesis also shows how these methods can help organize the maintenance of a group of transformers using the available oil analysis data.
Power transformer; Dissolved gas analysis; Supervised classification; Unsupervised classification; Maintenance; Classifier; Evaluation; Data; Aging; Insulating oil; Solid insulation
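The abstract names a naive Bayes classifier for detecting faults from dissolved gases but does not give its exact form. Below is a minimal Gaussian naive Bayes sketch. It assumes continuous features (for example, log gas concentrations) that are conditionally independent and normally distributed within each fault class. The feature choice, class names and data are hypothetical and do not come from the thesis.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate log class priors and per-feature Gaussian means/variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    n = len(X)
    model = {}
    for label, rows in by_class.items():
        k = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / k for col in columns]
        # Population variance, floored so a constant feature cannot divide by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / k, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(k / n), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior (up to a shared constant)."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


# Hypothetical samples: [log10 H2 ppm, log10 C2H2 ppm] for two fault classes.
X = [[1.0, 0.0], [1.2, 0.1], [0.9, -0.1], [2.5, 1.8], [2.7, 2.0], [2.4, 1.9]]
y = ["normal"] * 3 + ["arcing"] * 3
model = fit_gaussian_nb(X, y)
print(predict(model, [2.6, 1.7]))
```

The independence assumption keeps training to one mean and one variance per feature and class. This is one reason naive Bayes suits small labeled databases such as those built from oil analyses.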
83	EMG Signal Decomposition Using Motor Unit Potential Train Validity
Parsaei, Hossein 09 1900
Electromyographic (EMG) signal decomposition is the process of resolving an EMG signal into its component motor unit potential trains (MUPTs). The extracted MUPTs can aid in the diagnosis of neuromuscular disorders and the study of the neural control of movement, but only if they are valid trains. The validity of the extracted MUPTs must therefore be confirmed before the decomposition results are used for clinical or research purposes, including the motor unit potential (MUP) shape and motor unit (MU) firing pattern information for each active MU. Existing MUPT validation methods are either time-consuming or dependent on operator experience and skill. More importantly, they cannot be run during automatic decomposition of EMG signals to help improve decomposition results. To overcome these issues, this thesis explores the possibility of developing automatic MUPT validation algorithms. Several methods were developed that combine feature extraction techniques, cluster validation methods, supervised classification algorithms and multiple classifier fusion techniques. In general, the developed methods estimate the overall validity of a MUPT from the consistency of its MU firing pattern, the consistency of its MUP shapes, or both. The performance of the developed systems was evaluated using a variety of MUPTs obtained from the decomposition of several simulated and real intramuscular EMG signals. Methods that use only shape information or only firing pattern information had a higher generalization error than systems that use both. For classifiers that use the MU firing pattern information of a MUPT to determine its validity, the accuracy for invalid trains decreases as the number of missed-classification errors in the trains increases. Likewise, for methods that use MUP-shape information, the classification accuracy for invalid trains decreases as the within-train similarity of the invalid trains increases. Some systems that use both shape and firing pattern information estimate MU firing pattern validity and MUP-shape validity separately, then fuse these two indices with trainable fusion methods. These systems performed better than the scheme that estimates MUPT validity with a single classifier, especially on the real data. Overall, the multi-classifier that uses trainable logistic regression to aggregate the base classifier outputs performed best, with overall accuracies of 99.4% and 98.8% for simulated and real data, respectively. The thesis also investigated an algorithm for automatically editing MUPTs contaminated with a high number of false-classification errors (FCEs) during decomposition, and a robust method was developed for this purpose. Using a supervised classifier and the MU firing pattern information provided by each MUPT, the algorithm first determines whether a given train is contaminated by a high number of FCEs and needs to be edited. For contaminated MUPTs, the method then uses both MU firing pattern and MUP shape information to detect MUPs that were erroneously assigned to the train. Evaluation based on simulated and real MU firing patterns shows that contaminated MUPTs could be detected with 84% and 81% accuracy for simulated and real data, respectively. For a given contaminated MUPT, the algorithm correctly classified on average around 92.1% of its MUPs. The effectiveness of the developed MUPT validation systems and MUPT editing methods was assessed by integrating these algorithms into a certainty-based EMG signal decomposition algorithm. Overall, decomposition accuracy improved by 7.5 percentage points (from 86.7% to 94.2%) for 32 simulated EMG signals and by 3.4 percentage points (from 95.7% to 99.1%) for 30 real EMG signals. A significant improvement was also achieved in correctly estimating the number of MUPTs represented in a set of detected MUPs. The simulated and real EMG signals comprised 3–11 and 3–15 MUPTs, respectively.
Classifier fusion; cluster validation; EMG signal decomposition; motor unit firing patterns; motor unit potential train; motor unit potential train validation; motor unit potential train validity; supervised classification; System Design Engineering
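The abstract reports that fusing a firing-pattern validity index and a MUP-shape validity index with trainable logistic regression gave the best results. Below is a minimal sketch of such a trainable fusion. It assumes each base classifier outputs a score in [0, 1], with label 1 meaning a valid train. The training data are hypothetical, and this plain gradient-descent fit is a generic illustration, not the thesis's implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fused_validity(w, b, firing_score, shape_score):
    """Fused probability that a train is valid, from the two base-classifier scores."""
    return sigmoid(w[0] * firing_score + w[1] * shape_score + b)


# Hypothetical [firing-pattern score, shape score] pairs with validity labels.
X = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.7], [0.2, 0.3],
     [0.3, 0.1], [0.1, 0.2], [0.9, 0.2], [0.2, 0.9]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
w, b = fit_logistic(X, y)
print(fused_validity(w, b, 0.9, 0.9))
```

Because the fusion weights are learned, the combiner can discount whichever index is less reliable on the training data. A fixed rule such as averaging the two scores cannot do this.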
85	Apprentissage automatique de caractéristiques audio : application à la génération de listes de lecture thématiques / Machine learning algorithms applied to audio features analysis: application in the automatic generation of thematic musical playlists
Bayle, Yann 19 June 2018
This doctoral dissertation presents, discusses and proposes tools for automatic information retrieval in big musical databases, in the context of supervised music classification. The main application is the supervised classification of musical themes to generate thematic playlists. The first chapter introduces the contexts and concepts surrounding big musical databases and their consumption. The second chapter describes existing music databases used in academic audio analysis experiments. In particular, it discusses the variety and unequal proportions of the themes contained in a database, which remain difficult to handle in supervised classification. The third chapter explains the importance of extracting and developing relevant audio and musical features to better describe the content of the tracks in these databases. It explains several psychoacoustic phenomena and uses sound signal processing techniques to compute audio features. New methods of aggregating local audio features are proposed to improve song classification. The fourth chapter describes how the extracted features are used to sort songs by theme, which enables music recommendation and the automatic generation of homogeneous thematic playlists. This part uses machine learning algorithms to perform music classification tasks. The fifth chapter summarizes the contributions of this dissertation and proposes research perspectives in machine learning and the extraction of multi-scale audio features.
Automatic music annotation; Machine and deep learning; Supervised classification; Big data mining; Psychoacoustics; Digital audio signal processing; Music information retrieval
86	ANÁLISE MULTITEMPORAL DO USO DA TERRA E COBERTURA FLORESTAL COM DADOS DOS SATÉLITES LANDSAT E ALOS / MULTITEMPORAL ANALYSIS OF LAND USE AND FOREST COVERAGE WITH DATA FROM LANDSAT AND ALOS SATELLITES
Torres, Daniela Ricalde 29 July 2011
Conselho Nacional de Desenvolvimento Científico e Tecnológico
Monitoring land use and land cover is important when studying a given region, because it helps in understanding the region's environmental situation and in finding solutions to problems that may arise. This research used images from the ALOS and LANDSAT satellites. Its main objective was a multitemporal analysis of the Arroio Grande micro-watershed in the central region of Rio Grande do Sul. The specific objectives were to identify and quantify the land use classes found in this micro-watershed in 1987, 1998, 2002, 2005, 2007 and 2009. A further objective was to cross the land use information from these dates to show how forest cover changed over the 22 years analyzed. The images were classified with supervised classification in the SPRING 5.1.7 software, using the Bhattacharya classifier algorithm. The maps were then analyzed spatially with LEGAL (Spatial Language for Algebraic Geoprocessing), the programming language of the same application. The land use classes observed in the images of each year were forest, field, agriculture, irrigated agriculture, exposed soil and water surface. These classes were used in the spatial analysis of forest cover, for which three forest monitoring parameters were defined: forest maintenance, forest regeneration and deforestation. The main results were a 17.98% increase in forest cover, mainly in areas with steeper slopes, and a 16.32% reduction in field area. The spatial analysis showed forest cover to be stable within the landscape and gradually expanding. Over these 22 years, forest was maintained on 12,252.60 ha and regenerated on 4,389.12 ha, while only 1,853.82 ha were deforested.
Supervised classification; Land use; Spatial analysis; Remote sensing; Multi-temporal analysis; Micro watershed
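The forest monitoring parameters come from crossing classified maps of two dates. Below is a minimal sketch of that cross-tabulation. It assumes each map is a flat list of class labels covering the same pixels, with a fixed pixel area (0.09 ha for a 30 m Landsat pixel is an assumption). It is a generic illustration, not the LEGAL program used in SPRING.

```python
from collections import Counter


def forest_change_areas(map_before, map_after, pixel_area_ha, forest="forest"):
    """Cross two classified maps pixel by pixel and total the area of each forest change category."""
    if len(map_before) != len(map_after):
        raise ValueError("maps must cover the same pixels")
    counts = Counter()
    for before, after in zip(map_before, map_after):
        if before == forest and after == forest:
            counts["maintenance"] += 1
        elif before != forest and after == forest:
            counts["regeneration"] += 1
        elif before == forest and after != forest:
            counts["deforestation"] += 1
        # Non-forest pixels that stay non-forest do not enter the forest parameters.
    return {key: counts[key] * pixel_area_ha
            for key in ("maintenance", "regeneration", "deforestation")}


# Hypothetical six-pixel maps for two dates.
before = ["forest", "forest", "field", "agriculture", "forest", "water"]
after = ["forest", "field", "forest", "forest", "forest", "water"]
print(forest_change_areas(before, after, pixel_area_ha=0.09))
```

For a series such as 1987 to 2009, the same crossing can be applied between the first and last maps or between consecutive dates, depending on which changes are of interest.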
87	Estimation non-paramétrique du quantile conditionnel et apprentissage semi-paramétrique : applications en assurance et actuariat / Nonparametric estimation of conditional quantile and semi-parametric learning: applications on insurance and actuarial data
Knefati, Muhammad Anas 19 November 2015
The thesis consists of two parts: one on the estimation of conditional quantiles and the other on supervised learning. The "conditional quantile estimation" part is organized into three chapters. Chapter 1 introduces local linear regression and presents the methods most widely used in the literature to estimate the smoothing parameter. Chapter 2 reviews existing nonparametric methods for estimating the conditional quantile and compares them through numerical experiments on simulated and real data. Chapter 3 proposes a new conditional quantile estimator based on kernels that are asymmetric in x. Under certain hypotheses, this new estimator is shown to be more efficient than the usual estimators. The "supervised learning" part also has three chapters. Chapter 4 introduces statistical learning and recalls the basic concepts used in this part. Chapter 5 reviews conventional methods of supervised classification. Chapter 6 proposes a method for transferring a semiparametric learning model. The performance of this method is demonstrated through numerical experiments on morphometric data and credit-scoring data.
Nonparametric regression; Mean regression; Quantile; Smoothing parameter; Statistical learning; Supervised classification; Semi parametric single index models; 519.54
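Chapter 2 compares existing nonparametric conditional quantile estimators. As a point of reference, below is a minimal sketch of one standard baseline: the local-constant estimator, which inverts a kernel-weighted estimate of the conditional CDF. It assumes a Gaussian kernel and a user-chosen bandwidth h. It is not the asymmetric-kernel estimator proposed in Chapter 3.

```python
import math


def conditional_quantile(xs, ys, x0, tau, h):
    """Local-constant estimate of the tau-quantile of Y given X = x0.

    Returns the smallest observed y whose kernel-weighted conditional CDF,
    F(y | x0) = sum_i K((x0 - X_i) / h) 1{Y_i <= y} / sum_i K((x0 - X_i) / h),
    reaches tau.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    weights = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in xs]  # Gaussian kernel
    total = sum(weights)
    if total == 0.0:
        raise ValueError("bandwidth too small: no observations near x0")
    cumulative = 0.0
    for y, w in sorted(zip(ys, weights)):
        cumulative += w / total
        if cumulative >= tau:
            return y
    return max(ys)  # guards against floating-point round-off in the cumulative sum


# Hypothetical data: two well-separated groups of observations.
xs = [0, 0, 0, 0, 0, 10, 10, 10, 10, 10]
ys = [1, 2, 3, 4, 5, 100, 200, 300, 400, 500]
print(conditional_quantile(xs, ys, x0=0, tau=0.5, h=1.0))
```

The bandwidth h plays the role of the smoothing parameter discussed in Chapter 1: a small h uses only observations near x0, while a large h averages over the whole sample.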
88	Apprentissage de connaissances structurelles à partir d’images satellitaires et de données exogènes pour la cartographie dynamique de l’environnement amazonien / Structural Knowledge learning from satellite images and exogenous data for dynamic mapping of the Amazonian environment
Bayoudh, Meriam 06 December 2013
Classical methods for satellite image analysis cannot cope with the current volume of incoming data. Automating the interpretation of such images has therefore become crucial for analyzing and managing phenomena that are observable by satellite and change over time and space. This work aims to automate dynamic land cover mapping from satellite images through expressive, easily interpretable mechanisms that explicitly take into account the structural aspects of geographic information. It belongs to the object-based image analysis framework and assumes that useful contextual knowledge can be extracted from maps. First, a supervised method for parameterizing an image segmentation algorithm is proposed. Second, a supervised classification of geographic objects is presented. It combines machine learning by inductive logic programming with classification by the multi-class rule set intersection approach. These approaches are applied to mapping the coastline of French Guiana. The results demonstrate that the segmentation parameterization is feasible. They also show that it varies with the classes of the reference map and with the input data. Nevertheless, the methodological developments make an operational implementation of such an approach conceivable. The results of the object-based supervised classification show that it is possible to induce expressive classification rules. These rules convey consistent, structural information in a given application context and lead to reliable predictions, with an overall accuracy of 84.6% and a Kappa value of 0.7. This work thus contributes to automating dynamic mapping from remotely sensed images and opens original and promising perspectives.
Remote sensing; Object-based image analysis; Segmentation; Supervised classification; Machine learning; Inductive logic programming; Land cover maps; Land use; French Guiana
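The abstract reports an overall accuracy of 84.6% and a Kappa value of 0.7. Below is a minimal sketch of how both statistics are computed from a confusion matrix. Cohen's kappa compares the observed agreement with the agreement expected by chance from the row and column totals. The matrix in the example is hypothetical, not the thesis's.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] counts samples of reference class i predicted as class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: probability that reference and prediction match at random.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class confusion matrix (rows: reference, columns: predicted).
acc, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
print(acc, kappa)
```

Kappa is reported alongside overall accuracy because accuracy alone can look high when one land cover class dominates the map.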
89	Novel measures on directed graphs and applications to large-scale within-network classification Mantrach, Amin 25 October 2010 (has links) Ces dernières années, les réseaux sont devenus une source importante d’informations dans différents domaines aussi variés que les sciences sociales, la physique ou les mathématiques. De plus, la taille de ces réseaux n’a cessé de grandir de manière conséquente. Ce constat a vu émerger de nouveaux défis, comme le besoin de mesures précises et intuitives pour caractériser et analyser ces réseaux de grandes tailles en un temps raisonnable.<p>La première partie de cette thèse introduit une nouvelle mesure de similarité entre deux noeuds d’un réseau dirigé et pondéré :la covariance “sum-over-paths”. Celle-ci a une interprétation claire et précise :en dénombrant tous les chemins possibles deux noeuds sont considérés comme fortement corrélés s’ils apparaissent souvent sur un même chemin – de préférence court. Cette mesure dépend d’une distribution de probabilités, définie sur l’ensemble infini dénombrable des chemins dans le graphe, obtenue en minimisant l'espérance du coût total entre toutes les paires de noeuds du graphe sachant que l'entropie relative totale injectée dans le réseau est fixée à priori. Le paramètre d’entropie permet de biaiser la distribution de probabilité sur un large spectre :allant de marches aléatoires naturelles où tous les chemins sont équiprobables à des marches biaisées en faveur des plus courts chemins. Cette mesure est alors appliquée à des problèmes de classification semi-supervisée sur des réseaux de taille moyennes et comparée à l’état de l’art.<p>La seconde partie de la thèse introduit trois nouveaux algorithmes de classification de noeuds en sein d’un large réseau dont les noeuds sont partiellement étiquetés. Ces algorithmes ont un temps de calcul linéaire en le nombre de noeuds, de classes et d’itérations, et peuvent dés lors être appliqués sur de larges réseaux. 
Ceux-ci ont obtenus des résultats compétitifs en comparaison à l’état de l’art sur le large réseaux de citations de brevets américains et sur huit autres jeux de données. De plus, durant la thèse, nous avons collecté un nouveau jeu de données, déjà mentionné :le réseau de citations de brevets américains. Ce jeu de données est maintenant disponible pour la communauté pour la réalisation de tests comparatifs.<p>La partie finale de cette thèse concerne la combinaison d’un graphe de citations avec les informations présentes sur ses noeuds. De manière empirique, nous avons montré que des données basées sur des citations fournissent de meilleurs résultats de classification que des données basées sur des contenus textuels. Toujours de manière empirique, nous avons également montré que combiner les différentes sources d’informations (contenu et citations) doit être considéré lors d’une tâche de classification de textes. Par exemple, lorsqu’il s’agit de catégoriser des articles de revues, s’aider d’un graphe de citations extrait au préalable peut améliorer considérablement les performances. Par contre, dans un autre contexte, quand il s’agit de directement classer les noeuds du réseau de citations, s’aider des informations présentes sur les noeuds n’améliora pas nécessairement les performances.<p>La théorie, les algorithmes et les applications présentés dans cette thèse fournissent des perspectives intéressantes dans différents domaines.<p><p><p>In recent years, networks have become a major data source in various fields ranging from social sciences to mathematical and physical sciences. Moreover, the size of available networks has grow substantially as well. This has brought with it a number of new challenges, like the need for precise and intuitive measures to characterize and analyze large scale networks in a reasonable time. <p>The first part of this thesis introduces a novel measure between two nodes of a weighted directed graph: The sum-over-paths covariance. 
It has a clear and intuitive interpretation: two nodes are considered as highly correlated if they often co-occur on the same -- preferably short -- paths. This measure depends on a probability distribution over the (usually infinite) countable set of paths through the graph which is obtained by minimizing the total expected cost between all pairs of nodes while fixing the total relative entropy spread in the graph. The entropy parameter allows to bias the probability distribution over a wide spectrum: going from natural random walks (where all paths are equiprobable) to walks biased towards shortest-paths. This measure is then applied to semi-supervised classification problems on medium-size networks and compared to state-of-the-art techniques.<p>The second part introduces three novel algorithms for within-network classification in large-scale networks, i.e. classification of nodes in partially labeled graphs. The algorithms have a linear computing time in the number of edges, classes and steps and hence can be applied to large scale networks. They obtained competitive results in comparison to state-of-the-art technics on the large scale U.S.~patents citation network and on eight other data sets. Furthermore, during the thesis, we collected a novel benchmark data set: the U.S.~patents citation network. This data set is now available to the community for benchmarks purposes. <p>The final part of the thesis concerns the combination of a citation graph with information on its nodes. We show that citation-based data provide better results for classification than content-based data. We also show empirically that combining both sources of information (content-based and citation-based) should be considered when facing a text categorization problem. For instance, while classifying journal papers, considering to extract an external citation graph may considerably boost the performance. 
However, in another context, when the nodes of the citation network themselves must be classified, using the features on the nodes does not necessarily improve the results. The theory, algorithms and applications presented in this thesis provide interesting perspectives in various fields. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished General computer science Exact and natural sciences Network computers Kernel functions Graph theory -- Data processing Markov processes betweenness centrality large scale graphs semi-supervised classification graph kernels
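The sum-over-paths idea summarized above (a probability distribution over all paths, tuned by an entropy parameter between a natural random walk and shortest paths) can be illustrated with a minimal NumPy sketch. It is not the thesis's covariance measure: as a stand-in, it computes the closely related randomized-shortest-path free-energy dissimilarity, under stated assumptions: the function name `rsp_free_energy` is hypothetical, the reference random walk is the row-normalized adjacency matrix, and the inverse temperature `theta` plays the role of the entropy parameter (large values concentrate the distribution on shortest paths).

```python
import numpy as np


def rsp_free_energy(adj, cost, theta):
    """Randomized-shortest-path free-energy dissimilarity (illustrative sketch).

    adj:   (n, n) nonnegative edge weights; assumes every node has an out-edge
           and the graph is strongly connected.
    cost:  (n, n) positive edge costs (only entries where adj > 0 are used).
    theta: inverse temperature; large theta -> shortest-path behaviour.
    """
    adj = np.asarray(adj, dtype=float)
    cost = np.asarray(cost, dtype=float)
    n = adj.shape[0]
    # Reference (natural) random walk: row-normalized adjacency.
    p_ref = adj / adj.sum(axis=1, keepdims=True)
    # Boltzmann-weighted transitions W = P_ref * exp(-theta * C), zero off the edges.
    w = np.where(adj > 0, p_ref * np.exp(-theta * cost), 0.0)
    # Z = (I - W)^{-1} = sum_t W^t accumulates the weights of all paths;
    # rows of W sum to less than 1 for positive costs, so I - W is invertible.
    z = np.linalg.inv(np.eye(n) - w)
    # z_ij / z_jj keeps only paths that stop at their first visit to j.
    z_hit = z / np.diag(z)[np.newaxis, :]
    return -np.log(z_hit) / theta


if __name__ == "__main__":
    # Three-node path graph 0-1-2 with unit edge costs.
    adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    print(rsp_free_energy(adj, adj, theta=20.0).round(3))
```

With `theta=20` the value for the pair (0, 2) lies close to the shortest-path cost of 2, since almost all of the path weight sits on the direct route 0-1-2; smaller values of `theta` spread the weight over longer random-walk paths.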
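The complexity claim for within-network classification (time linear in the number of edges, classes and steps) can be illustrated with a generic iterative label-propagation scheme. This is a stand-in, not one of the three algorithms proposed in the thesis; the function name `propagate_labels`, the mixing weight `alpha` and the undirected treatment of edges are all assumptions. Each step visits every edge once per class, so the total cost is O(steps × edges × classes).

```python
import numpy as np


def propagate_labels(n_nodes, src, dst, labels, n_classes, n_steps=10, alpha=0.5):
    """Iterative label propagation on an edge list (generic sketch).

    labels: class index per node, or -1 for unlabeled nodes.
    Returns the predicted class index for every node.
    """
    labels = np.asarray(labels)
    # Assumption: edges are treated as undirected, so add both directions.
    s = np.concatenate([np.asarray(src), np.asarray(dst)])
    d = np.concatenate([np.asarray(dst), np.asarray(src)])
    deg = np.bincount(s, minlength=n_nodes).astype(float)
    deg[deg == 0] = 1.0  # avoid division by zero for isolated nodes

    # One-hot seed matrix for the labeled nodes.
    y = np.zeros((n_nodes, n_classes))
    known = labels >= 0
    y[np.flatnonzero(known), labels[known]] = 1.0

    f = y.copy()
    for _ in range(n_steps):
        # Sum neighbour scores in one pass over the edges: O(edges * classes).
        agg = np.zeros_like(f)
        np.add.at(agg, s, f[d])
        # Average over neighbours, then pull back towards the seed labels.
        f = alpha * agg / deg[:, None] + (1.0 - alpha) * y
    return f.argmax(axis=1)


if __name__ == "__main__":
    # Path graph 0-1-2-3-4 with only the two end nodes labeled.
    pred = propagate_labels(5, [0, 1, 2, 3], [1, 2, 3, 4], [0, -1, -1, -1, 1], n_classes=2)
    print(pred)
```

Working from an edge list with `np.add.at`, rather than a dense matrix, is what keeps each step proportional to the number of edges instead of the square of the number of nodes.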
90	"The Trees Act Not as Individuals"--Learning to See the Whole Picture in Biology Education and Remote Sensing Research Greenall, Rebeka A.F. 18 August 2023 (has links) (PDF) To increase equity and inclusion for underserved and excluded Indigenous students, we must make efforts to mitigate the unique barriers they face. As their knowledge systems have been historically excluded and erased in Western science, we begin by reviewing the literature on the inclusion of Traditional Ecological Knowledge (TEK) in biology education and describe best practices. Next, to better understand how Native Hawaiian and other Pacific Islander (NHPI) students integrate into the scientific community, we used Social Influence Theory as a framework to measure NHPI student science identity, self-efficacy, alignment with science values, and belonging. We also investigated how students feel their ethnic and science identities interact. We found that NHPI students do not significantly differ from non-NHPI students in these measures of integration, and that NHPI students vary in how they perceive the interaction between their ethnic and science identities. Some students experience conflict between the two identities, while others see them as mutually strengthening. Next, we describe a lesson plan created to include Hawaiian TEK in a biology class using best practices described in the literature. This is followed by an empirical study on how students were impacted by this lesson. We measured student integration into the science community using science identity, self-efficacy, alignment with science values, and belonging. We found no significant differences between NHPI and non-NHPI students. We also looked at student participation and found that all students participated more on intervention days involving TEK and other ways of knowing than on non-intervention days. Finally, we describe qualitative findings on how students were impacted by the TEK interventions.
We found that students were predominantly positively impacted by the inclusion of TEK and discuss future adjustments that could be made using their recommendations. The last chapter describes how we used remote sensing to investigate land cover in a fenced and an unfenced region of the Koʻolau Mountains on the island of Oahu. After mapping the biodiversity hotspot Management Unit of Koloa, we found slightly more bare ground, grass, and bare ground/low vegetation mix in fenced, and therefore ungulate-free, areas than in unfenced areas with ungulates. Implications of these findings and suggestions for future research are discussed. Indigenous knowledge systems Traditional Ecological Knowledge equity in education Native students Hawaiʻi NHPI students decolonizing science cultural competency culturally responsive teaching place-based learning remote sensing GIS semi-supervised classification land cover classification Life Sciences
