Global ETD Search

11	Exploring Methods for Comparing Similarity of Dimensionally Inconsistent Multivariate Numerical Data Micic, Natasha, Neagu, Daniel, Torgunov, Denis, Campean, Felician 28 June 2018 (has links) no / When developing multivariate data classification and clustering methodologies for data mining, it is clear that most literature contributions only consider data that consistently contain the same attributes. In many current big data analytics applications, however, data sets on the same topic, and even from the same source, measure differing attributes, for a multitude of reasons (whether the specific design of an experiment or poor data quality and consistency). We define this class of data as dimensionally inconsistent multivariate data, a topic that can be considered a subclass of Big Data Variety research. This paper explores some methodologies commonly used in multivariate classification and clustering tasks and considers how these traditional methodologies could be adapted to compare dimensionally inconsistent data sets. The study focuses on adapting two similarity measures, Robinson-Foulds tree distance metrics and Variation of Information, for comparing the clusterings produced by hierarchical clustering algorithms (such clusters are derived from the raw multivariate data). The results from experiments on engineering data highlight that adapting pairwise measures to exclude non-common attributes from the traditional distance metrics may not be the best method of classification. We suggest that more specialised similarity metrics are required to address the challenges presented by dimensionally inconsistent multivariate data, with specific applications for big engineering data analytics. / Jaguar Land-Rover Big data Clustering Heterogeneous data sets Classification methodologies Inconsistent multivariate data
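Result 11 names Variation of Information as one of the similarity measures it adapts. A minimal Python sketch of the standard (unadapted) measure, VI(A, B) = H(A | B) + H(B | A) in nats, assuming two flat clusterings of the same items, for example two hierarchical trees each cut at a chosen level. The function name and the flat-cut assumption are illustrative; this is not the authors' adapted version for dimensionally inconsistent data.

```python
import math
from collections import Counter


def variation_of_information(labels_a, labels_b):
    """VI between two flat clusterings of the same items, in nats.

    VI(A, B) = H(A | B) + H(B | A)
             = -sum_{a,b} p(a,b) * [log(p(a,b)/p(a)) + log(p(a,b)/p(b))]
    Label names are irrelevant: only the induced partitions matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    if not labels_a:
        raise ValueError("clusterings must not be empty")
    n = len(labels_a)
    count_a = Counter(labels_a)                # cluster sizes in A
    count_b = Counter(labels_b)                # cluster sizes in B
    joint = Counter(zip(labels_a, labels_b))   # contingency table
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        vi -= p_ab * (math.log(p_ab / p_a) + math.log(p_ab / p_b))
    return vi
```

VI is 0 for identical partitions, even when the cluster labels are permuted, and reaches H(A) + H(B) when the two clusterings are independent.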
12	Champs aléatoires de Markov cachés pour la cartographie du risque en épidémiologie / Hidden Markov random fields for risk mapping in epidemiology Azizi, Lamiae 13 December 2011 (has links) Risk mapping in epidemiology makes it possible to identify regions that are homogeneous in terms of risk, in order to better understand the etiology of diseases. We address the automatic mapping of geographical units into risk classes as a classification problem, using discrete hidden Markov models and Poisson mixture models. The proposed hidden Markov model is a variant of the Potts model in which the interaction parameter depends on the risk classes. To estimate the model parameters, we use the EM algorithm combined with a variational mean-field approach. This approach allows the EM algorithm to be applied in a spatial setting and offers an efficient alternative to Markov chain Monte Carlo (MCMC) estimation methods. We also address initialization problems, especially when risk rates are small (as with animal diseases). We propose a new initialization strategy suited to Poisson mixture models when the classes are poorly separated. To illustrate the proposed solutions, we present results on animal epidemiological datasets provided by INRA. / The analysis of the geographical variations of a disease and their representation on a map is an important step in epidemiology. The goal is to identify homogeneous regions in terms of disease risk and to gain better insight into the mechanisms underlying the spread of the disease. We recast the disease mapping problem of automatically classifying geographical units into risk classes as a clustering task, using a discrete hidden Markov model and class-dependent Poisson distributions.
The designed hidden Markov prior is non-standard and consists of a variation of the Potts model in which the interaction parameter can depend on the risk classes. The model parameters are estimated using an EM algorithm and the mean field approximation. This provides a way to deal with the intractability of standard EM in this spatial context, with a computationally efficient alternative to more intensive simulation-based Markov chain Monte Carlo (MCMC) procedures. We then focus on the issue of dealing with very low risk values and small numbers of observed cases and population sizes. We address the problem of finding good initial parameter values in this context and develop a new initialization strategy appropriate for spatial Poisson mixtures in the case of poorly separated classes, as encountered in animal disease risk analysis. We illustrate the performance of the proposed methodology on some animal epidemiological datasets provided by INRA. Classification Risk mapping Poisson mixtures Potts model Mean-field EM Classification Discrete hidden Markov random field Disease mapping Poisson mixtures Potts model Variational EM 510
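A minimal Python sketch of one mean-field E-step sweep and the Poisson rate M-step for a hidden Markov random field of the kind described in result 12. It assumes observed counts y_i, expected counts E_i (population at risk), a class-specific relative risk rate_k, and a Potts-type prior whose interaction weight beta_k depends on the class (energy beta_k for each pair of neighbours that share class k). The function names are hypothetical, beta is held fixed rather than estimated, and the thesis's initialization strategy is not reproduced.

```python
import math


def poisson_logpmf(y, mu):
    """log P(Y = y) for Y ~ Poisson(mu), with mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def mean_field_sweep(y, expected, neighbours, rates, betas, q):
    """One sequential mean-field sweep over all sites.

    q[i][k] approximates P(z_i = k | y). Each site is updated as
        q_i(k) ∝ Poisson(y_i; E_i * rate_k) * exp(beta_k * sum_{j in N(i)} q_j(k)),
    using the neighbours' current (already updated) posteriors.
    """
    q = [row[:] for row in q]              # do not mutate the caller's list
    n_classes = len(rates)
    for i in range(len(y)):
        logits = []
        for k in range(n_classes):
            field = betas[k] * sum(q[j][k] for j in neighbours[i])
            logits.append(poisson_logpmf(y[i], expected[i] * rates[k]) + field)
        top = max(logits)                  # subtract max for numerical stability
        weights = [math.exp(v - top) for v in logits]
        total = sum(weights)
        q[i] = [w / total for w in weights]
    return q


def update_rates(y, expected, q):
    """M-step for the class relative risks: rate_k = sum_i q_ik y_i / sum_i q_ik E_i."""
    n_classes = len(q[0])
    rates = []
    for k in range(n_classes):
        num = sum(q[i][k] * y[i] for i in range(len(y)))
        den = sum(q[i][k] * expected[i] for i in range(len(y)))
        rates.append(num / den)            # assumes every class keeps some posterior mass
    return rates
```

Alternating the two functions gives a simple variational EM loop; setting all betas to 0 recovers an ordinary, non-spatial Poisson mixture E-step.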
13	Traffic analysis of low and ultra-low frame-rate videos / Analyse de trafic routier à partir de vidéos à faible débit Luo, Zhiming January 2017 (has links) Abstract: Nowadays, traffic analysis relies on data collected from various traffic sensors. Among the various traffic surveillance techniques, video surveillance systems are often used for monitoring and characterizing traffic load. In this thesis, we focused on two aspects of traffic analysis in low frame-rate videos that do not use motion features: traffic density flow analysis, and vehicle detection and classification. Traffic density flow analysis: Knowing in real time when traffic is fluid or when it jams is key information that helps authorities re-route vehicles and reduce congestion. Accurate and timely traffic flow information is strongly needed by individual travelers, the business sector and government agencies. In this part, we investigated the possibility of monitoring highway traffic based on videos whose frame rate is too low to accurately estimate motion features. As we focus on analyzing traffic images and low frame-rate videos, traffic density is defined as the percentage of the road occupied by vehicles. In our previous work, we validated that traffic status is highly correlated with its texture features and that Convolutional Neural Networks (CNNs) are superior at extracting discriminative texture features. We proposed several CNN models to segment traffic images into three classes (road, car and background), to classify traffic images into different categories (empty, fluid, heavy, jam) and to predict traffic density without using any motion features. To generalize a model trained on a specific dataset to new traffic scenes, we also proposed a novel transfer learning framework for model adaptation.
Vehicle detection and classification: The detection of vehicles pictured by traffic cameras is often the very first step of video surveillance tasks such as vehicle counting, tracking and retrieval. In this part, we explore different deep learning methods applied to vehicle detection and classification. First, recognizing the importance of large datasets for traffic analysis, we built and released the largest traffic dataset in the world (MIO-TCD) for vehicle localization and classification, in collaboration with colleagues from Miovision Inc. (Waterloo, ON). With this dataset, we organized the Traffic Surveillance Workshop and Challenge in conjunction with CVPR 2017. Second, we evaluated several state-of-the-art deep learning methods for the classification and localization tasks on the MIO-TCD dataset. In light of the results, we may conclude that state-of-the-art deep learning methods are capable of localizing and recognizing vehicles from a single video frame. However, a deeper analysis of the results also identifies scenarios in which state-of-the-art methods still fail, and we propose concrete ideas for future work. Lastly, as saliency detection aims to highlight the most relevant objects in an image (e.g. vehicles in traffic scenes), we proposed a multi-resolution 4*5 grid CNN model for salient object detection. The model enables near real-time, high-performance saliency detection. We also extended this model to traffic analysis; experimental results show that our model can precisely segment foreground vehicles in traffic scenes. / Nowadays, road traffic analysis is increasingly automated and relies on data from sensors of all kinds. Video-based methods are among the approaches to road traffic analysis. They aim to identify and recognize moving objects (generally cars and pedestrians) and to understand their dynamics.
One of the most difficult challenges is analyzing video sequences with a very low number of frames per second. Yet this situation is common, since it is very difficult (if not impossible) to transmit a very large number of images from several cameras and store them on a server. In this case, state-of-the-art methods fail, because a low frame rate does not allow the extraction of the video features these methods rely on, such as optical flow, motion detection and vehicle tracking. In this thesis, we focused on road traffic analysis from video sequences with a very low frame rate. More specifically, we focused on the problems of estimating road traffic density and classifying vehicles. To this end, we proposed various models based on deep neural networks (more specifically, convolutional networks), as well as new databases for training these models. Among these databases is MIO-TCD, the largest annotated database in the world for road traffic analysis. Traffic analysis Traffic density Video surveillance Vehicle localization Vehicle classification Saliency detection Deep learning Convolutional neural networks Road traffic analysis Traffic density Video surveillance Vehicle classification Deep learning Convolutional networks
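Result 13 defines traffic density as the percentage of road occupied by vehicles, computed from images segmented into road, car and background. A minimal Python sketch of that definition applied to a per-pixel label mask, assuming the road surface is the road pixels plus the car pixels (vehicles occlude the road they stand on). The label ids and the function name are hypothetical; the thesis predicts density with CNNs rather than by this direct count.

```python
# Hypothetical class ids used in the segmentation mask.
ROAD, CAR, BACKGROUND = 0, 1, 2


def traffic_density(mask):
    """Percentage of the road surface covered by vehicles in a 2-D label mask."""
    road = sum(row.count(ROAD) for row in mask)
    cars = sum(row.count(CAR) for row in mask)
    drivable = road + cars            # assumption: car pixels sit on top of road
    if drivable == 0:
        raise ValueError("mask contains no road or car pixels")
    return 100.0 * cars / drivable
```

Background pixels are excluded from the denominator, so the value does not depend on how much of the frame the road happens to occupy.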
14	Algoritmo kNN na imputação de dados de espectros de massa do tipo MALDI-TOF: uma análise da influência da imputação com kNN sobre o desempenho de classificadores logísticos para identificação de bactérias / [kNN algorithm for the imputation of MALDI-TOF mass spectra data: an analysis of the influence of kNN imputation on the performance of logistic classifiers for bacterial identification] Santos, Fábio dos 14 September 2018 (has links) Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / The identification of bacteria related to plant growth is the subject of many studies in bioinformatics. One way to perform this identification is to use MALDI-TOF mass spectrometry data to detect the presence of ribosomal proteins in a sample, and then use classifiers to process these data and select the label with the highest probability. When mass spectra are generated for classification, it is common for some of the peaks related to ribosomal proteins not to be detected. With this in mind, this work presents a study of the use of the kNN algorithm to impute these cases. The study used logistic classifiers to identify bacteria of the species Staphylococcus aureus and of the genus Bacillus. Three data imputation techniques were tested in the experiments: imputation with zero, imputation with the mean of the missing attribute, and imputation with kNN. For the latter, two approaches were used: a mean aggregation function and a median aggregation function.
The experimental protocol made it possible to evaluate the influence of imputation on classification results under different scenarios regarding the number of missing variables. The results show that using kNN did not reduce classifier performance relative to that observed with complete data. In addition, classification of kNN-imputed data performed better than with the other methods. / The identification of plant growth-promoting bacteria is the subject of several studies in the bioinformatics area. One approach is to process a sample's ribosomal protein data, obtained by MALDI-TOF mass spectrometry, with a classifier and select the label with the highest probability. However, when the mass spectra are generated, it is common for some peaks related to ribosomal proteins not to be detected. With this in mind, this work presents a study of data imputation with the kNN algorithm. Logistic classifiers were applied to identify bacteria of the Bacillus genus and the Staphylococcus aureus species, while three data imputation techniques were tested: imputation with zero, with the mean of the missing attribute, and with the kNN algorithm. For the latter technique, two approaches were considered: a mean aggregation function and a median aggregation function. The adopted experimental protocol investigated the influence of imputation on classification results under different scenarios regarding the number of missing variables. The results show that neither kNN approach significantly reduced classifier performance compared with the complete-data approach, and that classification of kNN-imputed data outperformed the other methods considered. Imputation with kNN Mass Spectrometry Logistic Regression Bacterial Classification Imputation with kNN Mass Spectrometry Logistic Regression Bacterial Classification
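Result 14 compares kNN imputation using mean and median aggregation. A minimal Python sketch of that idea, assuming missing peak intensities are marked `None` and neighbours are ranked by root-mean-square distance over the features both samples observed. The function name, the distance choice and the default `k` are assumptions, not the author's experimental protocol.

```python
import math
from statistics import mean, median


def knn_impute(rows, k=3, aggregate="mean"):
    """Fill None entries using the k nearest samples that observed that feature.

    Distance is the root-mean-square difference over features observed in both
    samples; each missing value becomes the mean or median of the neighbours'
    values. Distances always use the original (unimputed) rows.
    """
    if aggregate not in ("mean", "median"):
        raise ValueError("aggregate must be 'mean' or 'median'")
    combine = mean if aggregate == "mean" else median
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for f, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                if j == i or other[f] is None:
                    continue          # a neighbour must have observed feature f
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                candidates.append((dist, other[f]))
            candidates.sort(key=lambda pair: pair[0])
            neighbour_values = [v for _, v in candidates[:k]]
            if neighbour_values:      # leave None if no usable neighbour exists
                imputed[i][f] = combine(neighbour_values)
    return imputed
```

The median variant is less sensitive to a single distant neighbour, which is one reason to compare both aggregation functions.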
15	Icke-triviala billigaste väg-ruttningskonflikter - klassificering och sökmetoder / Non-trivial shortest path routing conflicts - classification and search methods Morén, Björn January 2010 (has links) Within telecommunication and routing of traffic in IP networks, a protocol named "Open Shortest Path First" (OSPF) is widely used. This means that a server handles routing over a network with given weights by calculating the shortest paths to use for routing. If a desired traffic pattern is given, the problem is to find out whether it is possible to set the weights so that the desired traffic pattern is part of a shortest path graph. In this thesis we assume that each shortest path is unique. Searching for weights that solve the problem leads to a complex LP model. An alternative is to search in the LP dual under certain restrictions. These solutions to the LP dual are called conflicts, and a conflict means that no weights exist for which the desired traffic pattern is obtained. The goal of this thesis is to study, classify and search for conflicts. An algorithm has been developed that finds some kinds of conflicts in polynomial time with respect to the size of the graph. / Within telecommunication and routing of data traffic in IP networks, a protocol called "Open Shortest Path First" (OSPF) is often used. This means that a server handles routing over a network by computing, from given arc costs, the shortest paths used for routing. The question starts from a desired routing scheme: we want to find out whether arc costs can be set so that the desired routing scheme is included in a shortest path graph. In this thesis traffic is not split; instead, each shortest path between two nodes is unique. Searching for arc costs that solve the problem gives a demanding LP model, and an alternative is to start from the LP dual under certain restrictions.
These solutions to the LP dual are called conflicts, and a conflict means that no arc costs exist for which the desired routing scheme is obtained. The goal of the thesis is to study, classify and search for conflicts. An algorithm has been developed that finds certain types of such conflicts in polynomial time with respect to the size of the graph. OSPF SPR-conflicts SPRD ISPR classification search methods OSPF SPR conflicts design of SPR inverse shortest path problems classification search methods. Optimization, systems theory Optimization, systems theory
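Result 15 asks whether link weights exist that make a desired routing pattern consist of unique shortest paths; the thesis attacks this through conflicts in the LP dual. The Python sketch below covers only the easier forward check: for one candidate assignment of positive integer weights, it verifies that a desired path is the unique shortest path by counting shortest paths with Dijkstra's algorithm. The graph encoding (nested dicts) and the function names are assumptions for illustration.

```python
import heapq


def shortest_path_counts(graph, source):
    """Dijkstra that also counts shortest paths; weights must be positive integers.

    graph maps node -> {neighbour: weight}. Returns (dist, count) dicts.
    """
    dist = {source: 0}
    count = {source: 1}
    heap = [(0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue                      # stale heap entry
        settled.add(u)                    # dist[u] and count[u] are now final
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd              # strictly shorter: restart the count
                count[v] = count[u]
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                count[v] += count[u]      # another path of equal length
    return dist, count


def is_unique_shortest_path(graph, path):
    """True if `path` (a list of nodes) is the only shortest path between its end nodes."""
    source, target = path[0], path[-1]
    dist, count = shortest_path_counts(graph, source)
    length = sum(graph[a][b] for a, b in zip(path, path[1:]))  # KeyError if an arc is missing
    return dist.get(target) == length and count.get(target) == 1
```

Integer weights keep the equality test exact; with floating-point weights, ties would need a tolerance.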
16	Icke-triviala billigaste väg-ruttningskonflikter - klassificering och sökmetoder / Non-trivial shortest path routing conflicts - classification and search methods Morén, Björn January 2010 (has links) Abstract and keywords: same as result 15.
17	Champs aléatoires de Markov cachés pour la cartographie du risque en épidémiologie / [Hidden Markov random fields for risk mapping in epidemiology] Azizi, Lamiae 13 December 2011 (has links) (PDF) Abstract and keywords: same as the first abstract and keyword list of result 12.
18	Sparse Ensemble Networks for Hyperspectral Image Classification Rakesh Kumar Iyer (18424698) 23 April 2024 (has links) We explore the efficacy of sparsity and ensemble models in the classification of hyperspectral images, a pivotal task in remote sensing applications. While Convolutional Neural Networks (CNNs) and Transformer models have shown promise in this domain, each exhibits distinct limitations: CNNs excel at capturing spatial/local features but fail to capture spectral features, whereas Transformers capture spectral features at the expense of spatial features. Furthermore, training several independent CNN and Transformer networks is computationally expensive. To address these limitations, we propose a novel ensemble framework comprising pruned CNNs and Transformers, optimizing the use of both spatial and spectral features while curbing computational costs. By integrating sparsity through model pruning, our approach effectively reduces redundancy and computational complexity without compromising accuracy. Through extensive experimentation, we find that our method achieves accuracy comparable to its non-sparse counterparts while decreasing the computational cost. Our contribution advances remote sensing analytics by demonstrating the potential of sparse and ensemble models to improve the precision and computational efficiency of hyperspectral image classification. Signal processing hyperspectral image classification Deep Learning vision transformers ensemble neural networks Sparse Neural Network Remote sensing
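Result 18 combines pruned CNNs and Transformers in an ensemble, but the abstract does not state the pruning criterion or how member outputs are combined. This Python sketch therefore assumes unstructured magnitude pruning of a flat weight vector and plain averaging of per-class probabilities; both function names are hypothetical, and real networks would apply the pruning per layer tensor.

```python
def magnitude_prune(weights, sparsity):
    """Zero the fraction `sparsity` of weights with the smallest absolute value."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(sparsity * len(weights))
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in by_magnitude[:n_prune]:
        pruned[i] = 0.0
    return pruned


def ensemble_predict(member_probs):
    """Average per-class probabilities from several models.

    member_probs is a list of probability vectors, one per model.
    Returns (averaged probabilities, index of the most probable class).
    """
    n_models = len(member_probs)
    n_classes = len(member_probs[0])
    mean_probs = [sum(p[c] for p in member_probs) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: mean_probs[c])
    return mean_probs, best
```

Usage: prune each member's weights, run the sparse members (for example one CNN and one Transformer) on the same pixel or patch, then pass their probability vectors to `ensemble_predict`.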
19	Description et classification des masses mammaires pour le diagnostic du cancer du sein / [Description and classification of breast masses for breast cancer diagnosis] Cheikhrouhou, Imen 27 June 2012 (has links) (PDF) Computer-aided diagnosis of breast cancer is becoming increasingly necessary given the exponential growth in the number of mammograms performed each year. In particular, the diagnosis and classification of breast masses currently attract great interest. Indeed, the complexity of the shapes involved and the difficulty of distinguishing them require appropriate descriptors. In this work, characterization methods adapted to breast pathologies are proposed, and different classification methods are studied. To analyze mass shapes, a study of different segmentation techniques is carried out. This study led us to the level set model based on minimizing the energy of the evolving region. Once the images are segmented, the different descriptors proposed in the literature are studied. However, these proposals have certain limitations, such as sensitivity to noise, lack of invariance to geometric transformations, and a general, imprecise description of lesions. In this context, we propose a new descriptor, skeleton end points (SEP), to characterize the spiculations of the mass contour while respecting scale invariance. A second descriptor, protuberance selection (PS), is proposed. It likewise satisfies the invariance criterion and precisely describes contour roughness. However, SEP and PS are sensitive to noise. A third descriptor, the spiculated mass descriptor (SMD), which is robust to noise, is therefore proposed.
To compare the different descriptors, a comparative study of different classifiers is carried out. Support vector machines (SVMs) give the best classification result for all the descriptors considered. Finally, the proposed descriptors and others commonly used in the breast cancer field are compared to test their ability to properly characterize the contour of the masses in question. The performance of the three proposed descriptors, notably the SMD, is demonstrated through these comparisons. Breast cancer Segmentation Shape descriptors Classification
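Result 19 proposes a skeleton end points (SEP) descriptor for spiculated mass contours. The abstract does not give its exact definition or normalization, so this Python sketch only finds the end points of an already-computed binary skeleton, taking an end point to be a skeleton pixel with exactly one 8-connected skeleton neighbour. The function name is hypothetical, and the scale invariance claimed for SEP is not attempted here.

```python
def skeleton_end_points(skeleton):
    """Return (row, col) of skeleton pixels that have exactly one 8-neighbour.

    skeleton is a 2-D list of 0/1 (or bool) values, assumed already thinned.
    """
    rows, cols = len(skeleton), len(skeleton[0])
    ends = []
    for r in range(rows):
        for c in range(cols):
            if not skeleton[r][c]:
                continue
            neighbours = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue          # skip the pixel itself
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and skeleton[rr][cc]:
                        neighbours += 1
            if neighbours == 1:
                ends.append((r, c))
    return ends
```

Each spicule branch of a mass skeleton tends to terminate in one such point, so the count tracks how spiculated the contour is.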
20	Développement d'outils et de méthodes de télédétection spatiale optique et radar nécessaires à la haute résolution spatiale / [Development of optical and radar spaceborne remote sensing tools and methods required for high spatial resolution] Bombrun, Lionel 18 November 2008 (has links) (PDF) The research presented in this thesis is dedicated to the development of methods in polarimetric and interferometric radar remote sensing. Synthetic aperture radar interferometry provides information on the topography of the study area or on its deformations. We implement interferometric processing to obtain a ground displacement field. Radar polarimetry studies the interactions of the electromagnetic wave with the observed medium and provides information on the physical properties of the backscatterers. We examine in detail two models for parameterizing backscattering vectors: the alpha/beta model and the TSVM model. We then propose using the Fisher distribution to model texture in polarimetric images. Using the scalar multiplicative model, we derive the closed-form expression of the distribution of the coherency matrix and propose implementing this new distribution in a hierarchical segmentation algorithm. The methods proposed in this thesis were applied to C-band interferometric data over glaciers and to L-band polarimetric data over urban areas. Classification Interferometry Polarimetry Segmentation Remote sensing Texture
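Result 20 proposes the Fisher distribution for polarimetric texture but does not state its parameterization. This Python sketch assumes the three-parameter form p(τ) = Γ(L+M)/(Γ(L)Γ(M)) · L/(Mm) · (Lτ/(Mm))^(L−1) / (1 + Lτ/(Mm))^(L+M), with scale m > 0 and shape parameters L, M > 0, which is a scaled F distribution with 2L and 2M degrees of freedom. The function names are hypothetical, and parameter estimation is not shown.

```python
import math


def fisher_texture_logpdf(tau, m, L, M):
    """Log-density of the (assumed) three-parameter Fisher texture model at tau > 0."""
    if tau <= 0 or m <= 0 or L <= 0 or M <= 0:
        raise ValueError("tau, m, L and M must all be positive")
    x = L * tau / (M * m)                          # rescaled texture value
    log_norm = math.lgamma(L + M) - math.lgamma(L) - math.lgamma(M)
    return (log_norm
            + math.log(L / (M * m))                # Jacobian of the rescaling
            + (L - 1) * math.log(x)
            - (L + M) * math.log1p(x))             # log(1 + x), accurate for small x


def fisher_texture_pdf(tau, m, L, M):
    """Density of the (assumed) Fisher texture model at tau > 0."""
    return math.exp(fisher_texture_logpdf(tau, m, L, M))
```

Working in log space with `lgamma` avoids overflow of the gamma functions for the large shape parameters that describe weakly textured areas.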

Search results