Global ETD Search

371	Parallel itemset mining in massively distributed environments / Fouille de motifs en parallèle dans des environnements massivement distribués Salah, Saber 20 April 2016 (has links) Le volume des données ne cesse de croître. À tel point qu'on parle aujourd'hui de "Big Data". La principale raison se trouve dans les progrès des outils informatiques qui ont offert une grande flexibilité pour produire, mais aussi pour stocker des quantités toujours plus grandes. Les méthodes d'analyse de données ont toujours été confrontées à des quantités qui mettent en difficulté les capacités de traitement, ou qui les dépassent. Pour franchir les verrous technologiques associés à ces questions d'analyse, la communauté peut se tourner vers les techniques de calcul distribué. En particulier, l'extraction de motifs, qui est un des problèmes les plus abordés en fouille de données, présente encore souvent de grandes difficultés dans le contexte de la distribution massive et du parallélisme. Dans cette thèse, nous abordons deux sujets majeurs liés à l'extraction de motifs : les motifs fréquents, et les motifs informatifs (i.e., de forte entropie). / Data volumes keep growing, to the point that we now speak of "Big Data". The main reason lies in advances in computing tools, which have made it very easy to produce, and also to store, ever larger quantities of data. Data analysis methods have always been confronted with volumes that strain or exceed processing capacity. To overcome the technological barriers associated with these analysis problems, the community can turn to distributed computing techniques. In particular, pattern extraction, one of the most studied problems in data mining, still often presents great difficulties in the context of massive distribution and parallelism. In this thesis, we address two major topics related to pattern extraction: frequent patterns and informative patterns (i.e., patterns of high entropy). Extraction de motifs Données distribuées Classification Pattern Mining Data distribution Classification
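As a rough illustration of the frequent-pattern setting described in entry 371, the sketch below counts itemsets independently in each data partition and then merges the counts. It is not the thesis's algorithm: the brute-force enumeration, the function names `local_counts` and `frequent_itemsets`, and the `min_support` and `max_size` parameters are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, max_size):
    """Count every itemset of up to max_size items in one data partition."""
    counts = Counter()
    for transaction in partition:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return counts


def frequent_itemsets(partitions, min_support, max_size=2):
    """Merge per-partition counts and keep itemsets meeting min_support."""
    total = Counter()
    for part in partitions:  # each call could run on a separate worker
        total += local_counts(part, max_size)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical toy data: two partitions of shopping-basket transactions.
    parts = [[["a", "b"], ["a", "c"]], [["a", "b", "c"]]]
    print(frequent_itemsets(parts, min_support=2))
```

Because the local counts are plain additive tallies, the merge step gives the same result however the transactions are split across partitions.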
372	Approche topologique de la métrologie du mouvement pour des applications en réalité virtuelle / Topological approach of movement metrology for virtual reality applications Bensekka, Chakib 15 October 2018 (has links) Dans le domaine médical, une meilleure connaissance de la fonction motrice est susceptible de permettre d’établir des thérapies adaptées à chaque lésion motrice et des outils d’études et de dépistage dans le cas de maladies neurodégénératives. Dans le domaine de la réalité virtuelle, la reconnaissance du mouvement est un point angulaire dans l’interaction de l’avatar ou de la personne en immersion avec son environnement. Plusieurs travaux ont été menés dans le but de proposer des approches de classification du mouvement humain. L’idée principale de ces méthodes est d’extraire des invariants des données enregistrées afin de les regrouper en clusters. Cependant, l'étude du mouvement humain avec des systèmes de capture de mouvement génère une quantité de données volumineuse, avec des relations non linéaires entre elles. Les méthodes présentées dans la littérature scientifique utilisent ces données soit directement comme entrée à des algorithmes de classification, soit en appliquant une méthode de réduction dimensionnelle, comme l’analyse en composantes principales avant la classification. Ces méthodes restent extrêmement sensibles au bruit blanc pendant l'enregistrement ainsi qu’aux différences morphologiques entre les sujets. Dans notre travail, nous allons présenter une méthodologie de classification et de reconnaissance du mouvement humain, qui se base sur l’analyse topologique des données cinématiques. L’analyse topologique sera réalisée via la persistance homologique, qui est une méthode d’analyse des données volumineuses qui permet de leur associer une signature topologique. 
On combinera cette méthode d’analyse topologique avec des algorithmes d’apprentissage afin d’augmenter la précision de la reconnaissance des mouvements en réduisant l'impact des différences morphologiques entre les sujets ainsi que l’impact du bruit blanc issu pendant l'étape d’acquisition du mouvement. Par ailleurs, on combinera la méthode d’analyse topologique avec un algorithme de réseaux de neurones temporels, afin de construire une approche qui permet de prédire la suite d’un mouvement à partir d’une partie d’un intervalle d’enregistrement. Les résultats ont montré la capacité de l’approche proposée à obtenir une précision avec un taux élevé lors de la classification, ainsi que sa robustesse face au bruit blanc et aux différences morphologiques entre les personnes. Les résultats ont montré aussi le coût élevé en temps de calcul de notre approche. Nous avons proposé des méthodes (algorithme et parallélisme) de façon à réduire les temps de calculs. / In the medical field, a better knowledge of motor function is important for determining therapies adapted to each motor lesion and tools for the study and screening of neurodegenerative diseases. In the domain of virtual reality, motion recognition is a cornerstone of the interaction of the avatar or the immersed user with their environment. Several studies have been conducted with the aim of proposing approaches to the classification of human movement. The main idea of these methods is to extract invariants from the recorded data in order to group them into clusters. However, the study of human motion with motion capture systems generates a large quantity of data with nonlinear relations between them. The methods presented in the scientific literature use these data either directly as input to classification algorithms or after applying a dimensionality reduction method, such as principal component analysis, prior to classification. 
These methods remain extremely sensitive to white noise during recording as well as to morphological differences between subjects. In our work, we will present a methodology for the classification and recognition of human movement which is based on the topological analysis of kinematic data. Topological analysis will be performed via persistent homology, a method for analyzing large data sets that associates a topological signature with them. This method of topological analysis will be combined with learning algorithms to increase the accuracy of motion recognition by reducing the impact of morphological differences between subjects, as well as the impact of white noise introduced during the movement acquisition step. We will also combine the topological analysis method with a temporal neural network algorithm in order to build an approach that can predict the continuation of a movement from part of a recording interval. The results showed the ability of the proposed approach to achieve high classification accuracy, as well as its robustness against white noise and morphological differences between subjects. The results also showed the high computational cost of our approach, which we tried to reduce by modifying its steps and by rewriting the code so that it can be executed in parallel. Topologie Réalité virtuelle Neurones Classification Topology Virtual Reality Neurons Classification
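To give a concrete feel for the "topological signature" mentioned in entry 372, the sketch below computes only the simplest part of persistent homology: the lifetimes of connected components (dimension 0) of a point cloud as a distance threshold grows. It is a minimal illustration under stated assumptions, not the thesis's pipeline; the function name `h0_persistence`, the Euclidean metric, and the toy input are all chosen for this example.

```python
from itertools import combinations
from math import dist, inf


def h0_persistence(points):
    """Return (birth, death) pairs of connected components of a point cloud."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge at this scale, so one of them dies here.
            parent[root_i] = root_j
            pairs.append((0.0, length))
    if points:
        pairs.append((0.0, inf))  # the last surviving component never dies
    return pairs


if __name__ == "__main__":
    # Hypothetical 1-D samples: two close points and one distant point.
    print(h0_persistence([(0.0,), (1.0,), (5.0,)]))
```

Short-lived pairs correspond to points that merge quickly, while long-lived pairs indicate well-separated clusters; higher-dimensional features (loops, voids) need a dedicated library and are outside this sketch.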
373	Analysis of the migratory potential of cancerous cells by image preprocessing, segmentation and classification / Analyse du potentiel migratoire des cellules cancéreuses par prétraitement et segmentation d'image et classification des données Syed, Tahir Qasim 13 December 2011 (has links) Ce travail de thèse s’insère dans un projet de recherche plus global dont l’objectif est d’analyser le potentiel migratoire de cellules cancéreuses. Dans le cadre de ce doctorat, on s’intéresse à l’utilisation du traitement des images pour dénombrer et classifier les cellules présentes dans une image acquise via un microscope. Les partenaires biologistes de ce projet étudient l’influence de l’environnement sur le comportement migratoire de cellules cancéreuses à partir de cultures cellulaires pratiquées sur différentes lignées de cellules cancéreuses. Le traitement d’images biologiques a déjà donné lieu à un nombre important de publications mais, dans le cas abordé ici et dans la mesure où le protocole d’acquisition des images n'était pas figé, le défi a été de proposer une chaîne de traitements adaptatifs ne contraignant pas les biologistes dans leurs travaux de recherche. Quatre étapes sont détaillées dans ce mémoire. La première porte sur la définition des prétraitements permettant d’homogénéiser les conditions d’acquisition. Le choix d’exploiter l’image des écarts-types plutôt que la luminosité est un des résultats issus de cette première partie. La deuxième étape consiste à compter le nombre de cellules présentes dans l’image. Un filtre original, nommé filtre «halo», permettant de renforcer le centre des cellules afin d’en faciliter le comptage, a été proposé. Une étape de validation statistique de ces centres permet de fiabiliser le résultat obtenu. L’étape de segmentation des images, sans conteste la plus difficile, constitue la troisième partie de ce travail. Il s’agit ici d’extraire des «vignettes», contenant une seule cellule. 
Le choix de l’algorithme de segmentation a été celui de la «Ligne de Partage des Eaux», mais il a fallu adapter cet algorithme au contexte des images faisant l’objet de cette étude. La proposition d’utiliser une carte de probabilités comme données d’entrée a permis d’obtenir une segmentation au plus près des bords des cellules. Par contre cette méthode entraîne une sur-segmentation qu’il faut réduire afin de tendre vers l’objectif : «une région = une cellule». Pour cela un algorithme utilisant un concept de hiérarchie cumulative basée sur la morphologie mathématique a été développé. Il permet d’agréger des régions voisines en travaillant sur une représentation arborescente de ces régions et de leur niveau associé. La comparaison des résultats obtenus par cette méthode à ceux proposés par d’autres approches permettant de limiter la sur-segmentation a permis de prouver l’efficacité de l’approche proposée. L’étape ultime de ce travail consiste dans la classification des cellules. Trois classes ont été définies : cellules allongées (migration mésenchymateuse), cellules rondes «blebbantes» (migration amiboïde) et cellules rondes «lisses» (stade intermédiaire du mode de migration). Sur chaque vignette obtenue à la fin de l’étape de segmentation, des caractéristiques de luminosité, morphologiques et texturales ont été calculées. Une première analyse de ces caractéristiques a permis d’élaborer une stratégie de classification, à savoir séparer dans un premier temps les cellules rondes des cellules allongées, puis séparer les cellules rondes «lisses» des «blebbantes». Pour cela on divise les paramètres en deux jeux qui vont être utilisés successivement dans ces deux étapes de classification. 
Plusieurs algorithmes de classification ont été testés pour retenir, au final, l’utilisation de deux réseaux de neurones permettant d’obtenir plus de 80% de bonne classification entre cellules longues et cellules rondes, et près de 90% de bonne classification entre cellules rondes «lisses» et «blebbantes». / This thesis is part of a broader research project which aims to analyze the migratory potential of cancer cells. As part of this doctorate, we are interested in the use of image processing to count and classify the cells present in an image acquired using a microscope. The partner biologists of this project study the influence of the environment on the migratory behavior of cancer cells using cell cultures of different cancer cell lines. The processing of biological images has already resulted in a significant number of publications, but in the case discussed here, since the image acquisition protocol was not fixed, the challenge was to propose a chain of adaptive processing steps that does not constrain the biologists in their research. Four steps are detailed in this thesis. The first concerns the definition of pre-processing steps to homogenize the conditions of acquisition. The choice to use the image of standard deviations rather than the brightness is one of the results of this first part. The second step is to count the number of cells present in the image. An original filter, the so-called “halo” filter, which reinforces the centre of the cells in order to facilitate counting, has been proposed. A statistical validation step applied to these centres makes the result more reliable. The image segmentation stage, undoubtedly the most difficult, constitutes the third part of this work. The aim is to extract thumbnails each containing a single cell. The segmentation algorithm chosen was the “watershed”, but it was necessary to adapt this algorithm to the context of the images included in this study. 
The proposal to use a map of probabilities as input yielded a segmentation closer to the edges of cells. However, this method leads to over-segmentation, which must be reduced in order to move towards the goal: “one region = one cell”. To this end, an algorithm using the concept of a cumulative hierarchy based on mathematical morphology has been developed. It allows the aggregation of adjacent regions by working on a tree representation of these regions and their associated level. A comparison of the results obtained by this method with those of other approaches to limiting over-segmentation has allowed us to demonstrate the effectiveness of the proposed approach. The final step of this work consists in the classification of cells. Three classes were defined: elongated cells (mesenchymal migration), “blebbing” round cells (amoeboid migration) and “smooth” round cells (intermediate stage of the migration modes). On each thumbnail obtained at the end of the segmentation step, intensity, morphological and textural features were calculated. An initial analysis of these features has allowed us to develop a classification strategy, namely to first separate the round cells from the elongated cells, and then separate the “smooth” and “blebbing” round cells. For this we divide the parameters into two sets that are used successively in the two stages of classification. Several classification algorithms were tested; in the end, we retained two neural networks, which achieve over 80% correct classification between elongated cells and round cells, and nearly 90% correct classification between “smooth” and “blebbing” round cells. Data classification and machine learning
374	A Common Misconception in Multi-Label Learning Brodie, Michael Benjamin 01 November 2016 (has links) The majority of current multi-label classification research focuses on learning dependency structures among output labels. This paper provides a novel theoretical view on the purported assumption that effective multi-label classification models must exploit output dependencies. We submit that the flurry of recent dependency-exploiting, multi-label algorithms may stem from the deficiencies in existing datasets, rather than an inherent need to better model dependencies. We introduce a novel categorization of multi-label metrics, namely, evenly and unevenly weighted label metrics. We explore specific features that predispose datasets to improved classification by methods that model label dependence. Additionally, we provide an empirical analysis of 15 benchmark datasets, 1 real-life dataset, and a variety of synthetic datasets. We assert that binary relevance (BR) yields similar, if not better, results than dependency-exploiting models for metrics with evenly weighted label contributions. We qualify this claim with discussions on specific characteristics of datasets and models that render negligible the differences between BR and dependency-learning models. binary relevance multi-label classification multi-dimensional classification Computer Sciences
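The binary relevance (BR) baseline discussed in entry 374 can be sketched as follows: one independent binary model is trained per label, ignoring any dependencies between labels. The 1-nearest-neighbour base learner and all function names here are assumptions made for illustration. Hamming loss is shown as one common metric in which every label contributes equally; whether it matches the thesis's exact definition of an "evenly weighted" metric is also an assumption.

```python
def fit_1nn(X, y):
    """Hypothetical base learner: 1-nearest-neighbour for a single label."""
    def predict(x):
        nearest = min(
            range(len(X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)),
        )
        return y[nearest]
    return predict


def fit_binary_relevance(X, Y, fit_binary=fit_1nn):
    """Train one independent binary model per label column of Y."""
    n_labels = len(Y[0])
    return [fit_binary(X, [row[j] for row in Y]) for j in range(n_labels)]


def predict_binary_relevance(models, X):
    """Predict each label separately and assemble the label vectors."""
    return [[model(x) for model in models] for x in X]


def hamming_loss(Y_true, Y_pred):
    """Fraction of label slots predicted wrongly; each label counts equally."""
    n_slots = sum(len(row) for row in Y_true)
    wrong = sum(
        t != p
        for row_t, row_p in zip(Y_true, Y_pred)
        for t, p in zip(row_t, row_p)
    )
    return wrong / n_slots


if __name__ == "__main__":
    # Hypothetical toy data: two features, two labels.
    X = [(0, 0), (1, 1)]
    Y = [[1, 0], [0, 1]]
    models = fit_binary_relevance(X, Y)
    print(predict_binary_relevance(models, [(0.1, 0.2)]))
```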
375	Genome relationships among Lotus species based on random amplified polymorphic DNA (RAPD) Campos, Lázara Pereira January 1992 (has links) No description available. Lotus corniculatus. Lotus -- Classification
376	Improving Multiclass Text Classification with the Support Vector Machine Rennie, Jason D. M., Rifkin, Ryan 16 October 2001 (has links) We compare Naive Bayes and Support Vector Machines on the task of multiclass text classification. Using a variety of approaches to combine the underlying binary classifiers, we find that SVMs substantially outperform Naive Bayes. We present full multiclass results on two well-known text data sets, including the lowest error to date on both data sets. We develop a new indicator of binary performance to show that the SVM's lower multiclass error is a result of its improved binary performance. Furthermore, we demonstrate and explore the surprising result that one-vs-all classification performs favorably compared to other approaches even though it has no error-correcting properties. AI text classification support vector machine multiclass classification
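The one-vs-all scheme mentioned in entry 376 combines one binary classifier per class by choosing the class with the highest score. The sketch below shows only that decision rule; the scores are made-up numbers standing in for the outputs of trained binary classifiers, and the function name is an assumption.

```python
def one_vs_all_predict(scores):
    """Pick the class whose binary classifier gives the highest score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical decision values, one per class-versus-rest classifier.
    example_scores = {"sports": -0.3, "politics": 1.2, "tech": 0.4}
    print(one_vs_all_predict(example_scores))
```

Unlike error-correcting output codes, this rule has no redundancy: a single badly calibrated binary classifier can change the prediction, which is why the abstract calls the scheme's strong performance surprising.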
377	A comparison of automated land cover/use classification methods for a Texas bottomland hardwood system using lidar, spot-5, and ancillary data Vernon, Zachary Isaac 15 May 2009 (has links) Bottomland hardwood forests are highly productive ecosystems which perform many important ecological services. Unfortunately, many bottomland hardwood forests have been degraded or lost. Accurate land cover mapping is crucial for management decisions affecting these disappearing systems. SPOT-5 imagery from 2005 was combined with Light Detection and Ranging (LiDAR) data from 2006 and several ancillary datasets to map a portion of the bottomland hardwood system found in the Sulphur River Basin of Northeast Texas. Pixel-based classification techniques, rule-based classification techniques, and object-based classification techniques were used to distinguish nine land cover types in the area. The rule-based classification (84.41% overall accuracy) outperformed the other classification methods because it more effectively incorporated the LiDAR and ancillary datasets when needed. This output was compared to previous classifications from 1974, 1984, 1991, and 1997 to determine abundance trends in the area’s bottomland hardwood forests. The classifications from 1974-1991 were conducted using identical class definitions and input imagery (Landsat MSS 60m), and the direct comparison demonstrates an overall declining trend in bottomland hardwood abundance. The trend levels off in 1997 when medium resolution imagery was first utilized (Landsat TM 30m) and the 2005 classification also shows an increase in bottomland hardwood from 1997 to 2005, when SPOT-5 10m imagery was used. However, when the classifications are re-sampled to the same resolution (60m), the percent area of bottomland hardwood consistently decreases from 1974-2005. Additional investigation of object-oriented classification proved useful. 
A major shortcoming of object-based classification is limited justification regarding the selection of segmentation parameters. Often, segmentation parameters are arbitrarily defined using general guidelines or are determined through a large number of parameter combinations. This research justifies the selection of segmentation parameters through a process that utilizes landscape metrics and statistical techniques to determine ideal segmentation parameters. The classification resulting from these parameters outperforms the classification resulting from arbitrary parameters by approximately three to six percent in terms of overall accuracy, demonstrating that landscape metrics can be successfully linked to segmentation parameters in order to create image objects that more closely resemble real-world objects and result in a more accurate final classification. Bottomland hardwood forests object-based classification rule-based classification segmentation
378	Using sentence-level classification to predict sentiment at the document-level Hutton, Amanda Rachel 21 August 2012 (has links) This report explores various aspects of sentiment mining. The two research goals for the report were: (1) to determine useful methods for increasing recall of negative sentences and (2) to determine the best method for applying sentence-level classification to the document level. The methods in this report were applied to the Movie Reviews corpus at both the document and sentence level. The basic approach was to first identify polar and neutral sentences within the text and then classify the polar sentences as either positive or negative. The Maximum Entropy classifier was used as the baseline system on which the application of further methods was explored. Part-of-speech tagging was evaluated to determine whether its inclusion increased recall of negative sentences. It was also used to aid in the handling of negations at the sentence level. Smoothing was investigated and various metrics to describe the sentiment composition were explored to address goal (2). Negative recall was shown to increase with the adjustment of the classification threshold and was also seen to increase through the methods used to address goal (2). Overall, classifying at the sentence level using bigrams and a cutoff value of one was observed to result in the highest evaluation scores. Sentiment mining Sentence-level classification Text classification Recall
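One simple way to lift sentence-level labels to a document-level label, in the spirit of entry 378, is to discard neutral sentences and compare the share of negative sentences against a threshold. The rule below is a hypothetical aggregation chosen for illustration; the report's actual metrics are not given in the abstract, and the function name and `cutoff` parameter are assumptions.

```python
def document_sentiment(sentence_labels, cutoff=0.5):
    """Aggregate per-sentence labels into one document label.

    sentence_labels: iterable of "positive", "negative" or "neutral".
    cutoff: minimum share of negative polar sentences for a negative document.
    """
    polar = [label for label in sentence_labels if label in ("positive", "negative")]
    if not polar:
        return "neutral"  # no polar sentences to decide on
    negative_share = polar.count("negative") / len(polar)
    return "negative" if negative_share >= cutoff else "positive"


if __name__ == "__main__":
    labels = ["neutral", "positive", "negative", "positive"]
    print(document_sentiment(labels))              # default threshold
    print(document_sentiment(labels, cutoff=0.3))  # lower threshold
```

Lowering `cutoff` makes the rule label more documents as negative, which trades precision for higher negative recall.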
379	CLASSIFICATION OF PRISON INMATES ACCORDING TO PRISON RULES AND REGULATIONS (ENVIRONMENT) Stebbins, Glenn Thurston January 1985 (has links) No description available. Prisoners -- Classification. Crime -- Classification. Inmates of institutions. Prison administration. Prison psychology.
380	Genome relationships among Lotus species based on random amplified polymorphic DNA (RAPD) Campos, Lázara Pereira January 1992 (has links) The usefulness of RAPDs (Random Amplified Polymorphic DNA) to distinguish among different taxa of Lotus was evaluated. The following species were included: L. corniculatus, L. tenuis, L. alpinus, L. japonicus, and L. uliginosus. Several accessions for each species were studied. Following DNA extraction, amplification reactions were performed in a Hybaid DNA Thermal Cycler, and the product visualized according to a standard procedure. Twenty primers were used for each species/accession. Clear bands and several polymorphisms were obtained for all primers. A phenogram was drawn based on the genetic distance among the species. L. alpinus appears as the most distant species from L. corniculatus, followed by L. uliginosus, L. tenuis, and L. japonicus. With the exception of L. alpinus, these findings are in agreement with previous experimental studies in the L. corniculatus group. The use of a greater number of primers and increased number of species may provide a greater resolution of the systematics of these taxa. Lotus -- Classification Lotus corniculatus.
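Entry 380 builds a phenogram from genetic distances between species. The abstract does not say which distance measure was used; as an assumed stand-in, the sketch below computes a Jaccard distance from the sets of RAPD bands scored as present in two samples. The band identifiers and the function name are hypothetical.

```python
def jaccard_distance(bands_a, bands_b):
    """Distance between two samples from their sets of amplified bands."""
    a, b = set(bands_a), set(bands_b)
    union = a | b
    if not union:
        return 0.0  # two samples with no bands are treated as identical
    return 1 - len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical band identifiers scored as present in each sample.
    sample_1 = {1, 2, 3}
    sample_2 = {2, 3, 4}
    print(jaccard_distance(sample_1, sample_2))
```

A matrix of such pairwise distances is the usual input to a clustering method that draws the tree.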
