31 |
Traitement automatique d’informations appliqué aux ressources humaines / Automatic processing of information applied to human resources. Kessler, Rémy, 10 July 2009
Depuis les années 90, Internet est au coeur du marché du travail. D’abord mobilisée sur des métiers spécifiques, son utilisation s’étend à mesure qu’augmente le nombre d’internautes dans la population. La recherche d’emploi au travers des « bourses à l’emploi électroniques » est devenu une banalité et le e-recrutement quelque chose de courant. Cette explosion d’informations pose cependant divers problèmes dans leur traitement en raison de la grande quantité d’information difficile à gérer rapidement et efficacement pour les entreprises. Nous présentons dans ce mémoire, les travaux que nous avons développés dans le cadre du projet E-Gen, qui a pour but la création d’outils pour automatiser les flux d’informations lors d’un processus de recrutement. Nous nous intéressons en premier lieu à la problématique posée par le routage précis de courriels. La capacité d’une entreprise à gérer efficacement et à moindre coût ces flux d’informations, devient un enjeu majeur de nos jours pour la satisfaction des clients. Nous proposons l’application des méthodes d’apprentissage afin d’effectuer la classification automatique de courriels visant leur routage, en combinant techniques probabilistes et machines à vecteurs de support. Nous présentons par la suite les travaux qui ont été menés dans le cadre de l’analyse et l’intégration d’une offre d’emploi par Internet. Le temps étant un facteur déterminant dans ce domaine, nous présentons une solution capable d’intégrer une offre d’emploi d’une manière automatique ou assistée afin de pouvoir la diffuser rapidement. Basé sur une combinaison de systèmes de classifieurs pilotés par un automate de Markov, le système obtient de très bons résultats. Nous proposons également les diverses stratégies que nous avons mises en place afin de fournir une première évaluation automatisée des candidatures permettant d’assister les recruteurs. Nous avons évalué une palette de mesures de similarité afin d’effectuer un classement pertinent des candidatures. L’utilisation d’un modèle de relevance feedback a permis de surpasser nos résultats sur ce problème difficile et sujet à une grande subjectivité. / Since the 1990s, the Internet has been at the heart of the labor market. First adopted for specific occupations, its use has spread as the number of Internet users in the population has grown. Looking for a job through electronic job boards has become commonplace, and so has e-recruitment. This explosion of information nevertheless raises processing problems, because companies must handle a large amount of information quickly and efficiently. We present in this PhD thesis the work we developed within the E-Gen project, which aims to create tools to automate the flow of information during a recruitment process. We first address the problem of accurate email routing. The ability of a company to manage these information flows efficiently and at low cost has become a major issue for customer satisfaction. We propose applying machine learning methods to perform automatic classification of emails for routing purposes, combining probabilistic techniques and support vector machines. We then present the work conducted on the analysis and integration of job offers posted on the Internet. Since time is a determining factor in this domain, we present a solution able to integrate a job offer automatically or in an assisted way so that it can be published quickly. Based on a combination of classifier systems driven by a Markov automaton, the system obtains very good results.
Thereafter, we present several strategies, based on vector-space and probabilistic models, for profiling candidates according to a specific job offer and assisting recruiters. We evaluated a range of similarity measures to rank candidatures, using ROC curves. A relevance feedback approach allowed us to surpass our previous results on this difficult, diverse and highly subjective task.
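To make the candidate-ranking idea concrete, here is a minimal sketch, not the E-Gen implementation: the job text, CV texts, helper names and Rocchio weights are all invented. It scores candidatures against an offer with cosine similarity over bag-of-words vectors and then refines the offer vector with a Rocchio-style relevance feedback step.

from collections import Counter
from math import sqrt

def bow(text):
    # naive bag-of-words vector; a real system would tokenize properly and use tf-idf weights
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rocchio_update(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    # shift the query toward vectors judged relevant and away from those judged irrelevant
    new_q = Counter()
    for t in set().union(query, *relevant, *irrelevant):
        pos = sum(d[t] for d in relevant) / max(len(relevant), 1)
        neg = sum(d[t] for d in irrelevant) / max(len(irrelevant), 1)
        w = alpha * query[t] + beta * pos - gamma * neg
        if w > 0:
            new_q[t] = w
    return new_q

offer = bow("java developer with experience in web services and sql")
cvs = {"cv1": bow("senior java developer web services spring sql"),
       "cv2": bow("accountant with payroll experience"),
       "cv3": bow("java engineer sql reporting")}
ranking = sorted(cvs, key=lambda c: cosine(offer, cvs[c]), reverse=True)
# feed one (fictitious) recruiter judgement back into the offer vector and re-rank
offer = rocchio_update(offer, [cvs[ranking[0]]], [cvs[ranking[-1]]])
print(sorted(cvs, key=lambda c: cosine(offer, cvs[c]), reverse=True))

In practice the relevant and irrelevant sets would come from recruiters' judgements of earlier candidatures rather than from the top and bottom of the initial ranking.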
|
32 |
Similarity measures for scientific workflows. Starlinger, Johannes, 08 January 2016
Im Laufe der letzten zehn Jahre haben Scientific Workflows als Werkzeug zur Erstellung von reproduzierbaren, datenverarbeitenden in-silico Experimenten an Aufmerksamkeit gewonnen, in die sowohl lokale Skripte und Anwendungen, als auch Web-Services eingebunden werden können. Über spezialisierte Online-Bibliotheken, sogenannte Repositories, können solche Workflows veröffentlicht und wiederverwendet werden. Mit zunehmender Größe dieser Repositories werden Ähnlichkeitsmaße für Scientific Workflows notwendig, etwa für Duplikaterkennung, Ähnlichkeitssuche oder Clustering von funktional ähnlichen Workflows. Die vorliegende Arbeit untersucht solche Ähnlichkeitsmaße für Scientific Workflows. Als erstes untersuchen wir ähnlichkeitsrelevante Eigenschaften von Scientific Workflows und identifizieren Charakteristika der Wiederverwendung ihrer Komponenten. Als zweites analysieren und reimplementieren wir existierende Lösungen für den Vergleich von Scientific Workflows entlang definierter Teilschritte des Vergleichsprozesses. Wir erstellen einen großen Gold-Standard Corpus von Workflowähnlichkeiten, der über 2400 Bewertungen für 485 Workflowpaare enthält, die von 15 Experten aus 6 Institutionen beigetragen wurden. Zum ersten Mal erlauben diese Vorarbeiten eine umfassende, vergleichende Evaluation verschiedener Ähnlichkeitsmaße für Scientific Workflows, in der wir einige vorige Ergebnisse bestätigen, andere aber revidieren. Als drittes stellen wir eine neue Methode für das Vergleichen von Scientific Workflows vor. Unsere Evaluation zeigt, dass diese neue Methode bessere und konsistentere Ergebnisse liefert und leicht mit anderen Ansätzen kombiniert werden kann, um eine weitere Qualitätssteigerung zu erreichen. Als viertes zeigen wir, wie die Resultate aus den vorangegangenen Schritten genutzt werden können, um aus Standardkomponenten eine Suchmaschine für schnelle, qualitativ hochwertige Ähnlichkeitssuche im Repositorymaßstab zu implementieren. / Over the last decade, scientific workflows have gained attention as a valuable tool to create reproducible in-silico experiments. Specialized online repositories have emerged which allow such workflows to be shared and reused by the scientific community. With increasing size of these repositories, methods to compare scientific workflows regarding their functional similarity become a necessity. To allow duplicate detection, similarity search, or clustering, similarity measures for scientific workflows are an essential prerequisite. This thesis investigates similarity measures for scientific workflows. We carry out four consecutive research tasks: First, we closely investigate the relevant properties of scientific workflows regarding their similarity and identify characteristics of re-use of their components. Second, we review and dissect existing approaches to scientific workflow comparison into a defined set of subtasks necessary in the process of workflow comparison, and re-implement previous approaches to each subtask. We create a large gold-standard corpus of expert-ratings on workflow similarity, with more than 2400 ratings provided for 485 pairs of workflows by 15 workflow experts from 6 institutions. For the first time, this allows comprehensive, comparative evaluation of different scientific workflow similarity measures, confirming some previous findings, but rejecting others. Third, we propose and evaluate a novel method for scientific workflow comparison.
We show that this novel method provides results of both higher quality and higher consistency than previous approaches, and can easily be stacked and ensembled with other approaches for still better performance and higher speed. Fourth, we show how our findings can be leveraged to implement a search engine using off-the-shelf tools that performs fast, high quality similarity search for scientific workflows at repository-scale, a premier area of application for similarity measures for scientific workflows.
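As a rough illustration of how such measures can be defined and combined, and not the method proposed in the thesis, the sketch below compares two workflows by the Jaccard overlap of their component sets and averages it with a second signal such as a label-based similarity; the module names, the weights and the 0.8 placeholder score are invented.

def jaccard(a, b):
    # set overlap of workflow components (modules, services, scripts, ...)
    return len(a & b) / len(a | b) if a | b else 1.0

def ensemble(scores, weights=None):
    # simple weighted average of several similarity signals for one workflow pair
    weights = weights or [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

wf1 = {"FetchSequence", "BlastService", "ParseXML", "PlotResults"}
wf2 = {"FetchSequence", "BlastService", "FilterHits"}
structure_sim = jaccard(wf1, wf2)
label_sim = 0.8   # e.g. similarity of the two workflows' titles/descriptions (placeholder value)
print(ensemble([structure_sim, label_sim], weights=[0.6, 0.4]))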
|
33 |
Leveraging Sequential Nature of Conversations for Intent Classification. Gotteti, Shree, January 2021
No description available.
|
34 |
Contributions to fuzzy object comparison and applications. Similarity measures for fuzzy and heterogeneous data and their applications. Bashon, Yasmina M., January 2013
This thesis makes an original contribution to knowledge in the field of comparing data objects that are described by attributes of fuzzy or heterogeneous (numeric and symbolic) data types. Many real-world database systems and applications require information management components that provide support for managing such imperfect and heterogeneous data objects. For example, with new online information made available from various sources, in semi-structured, structured or unstructured representations, new information usage and search algorithms must allow for data collections that may contain objects/records with different types of data (fuzzy, numerical and categorical) for the same attributes. New similarity approaches have been presented in this research to support such data comparison. A generalisation of both geometric and set-theoretical similarity models has made it possible to propose the new similarity measures presented in this thesis, which handle the vagueness (fuzzy data type) within data objects. A framework of new and unified similarity measures for comparing heterogeneous objects described by numerical, categorical and fuzzy attributes has also been introduced. Examples are used to illustrate, compare and discuss the applications and efficiency of the proposed approaches to heterogeneous data comparison. / Libyan Embassy
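A minimal sketch of the kind of heterogeneous comparison described above; the attribute names, ranges and the plain averaging scheme are assumptions for illustration, not the measures proposed in the thesis. Numeric attributes are compared geometrically, categorical ones by equality, and fuzzy ones set-theoretically through their membership grades.

def num_sim(x, y, lo, hi):
    # geometric-style similarity for a numeric attribute, normalised by its range
    return 1.0 - abs(x - y) / (hi - lo) if hi > lo else 1.0

def cat_sim(x, y):
    # categorical attributes: exact match or nothing
    return 1.0 if x == y else 0.0

def fuzzy_sim(mu_a, mu_b):
    # set-theoretic similarity of two fuzzy sets given membership grades over a shared domain
    inter = sum(min(a, b) for a, b in zip(mu_a, mu_b))
    union = sum(max(a, b) for a, b in zip(mu_a, mu_b))
    return inter / union if union else 1.0

def object_sim(o1, o2):
    scores = [num_sim(o1["age"], o2["age"], 0, 100),
              cat_sim(o1["city"], o2["city"]),
              fuzzy_sim(o1["income"], o2["income"])]
    return sum(scores) / len(scores)

# "income" holds membership grades in the fuzzy terms {low, medium, high}
p1 = {"age": 34, "city": "Bradford", "income": [0.1, 0.7, 0.2]}
p2 = {"age": 29, "city": "Bradford", "income": [0.0, 0.6, 0.4]}
print(object_sim(p1, p2))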
|
35 |
Role Mining With Hierarchical Clustering and Binary Similarity Measures / Role mining med hierarkisk klustring och binära likhetsmått. Olsson, Magnus, January 2023
Role engineering, a critical task in role-based access control systems, is the process of identifying a complete set of roles that accurately reflect the structure of an organization. Role mining, a data-driven approach, utilizes data mining techniques on user-permission assignments represented as binary data to automatically derive these roles. However, relying solely on data-driven methods often leads to the generation of a large set of roles lacking interpretability. To address this limitation, this thesis presents a role mining algorithm, whose results can be viewed as an initial step in the role engineering process, in order to streamline the task of defining semantically meaningful roles, where human analysis is an inevitable post-processing step. The algorithm is based on hierarchical clustering analysis, and its main objective is identifying a sufficiently small set of roles that cover as large a proportion of the user-permission assignments as possible. To evaluate the performance of the algorithm, multiple real-world data sets representing diverse access control scenarios are utilized. The evaluation focuses on comparing various binary similarity measures, with the goal of determining the most suitable characteristics of a binary similarity measure to be used for role mining. The evaluation of different binary similarity measures provides insights into their effectiveness in achieving accurate role definitions to be used as a foundation for constructing meaningful roles. Ultimately, this research contributes to the advancement of role mining methodologies, facilitating improved access control systems that align with organizational needs and enhance security and efficiency. / Role engineering går ut på att identifiera en komplett uppsättning roller som återspeglar strukturen i en organisation och är en viktig uppgift när organisationer övergår till rollbaserad åtkomstkontroll. Role mining är en datadriven metod som använder data mining-tekniker på användarnas behörighetstilldelningar för att automatiskt härleda dessa roller. Dessa tilldelningar kan representeras som binär data. Att enbart förlita sig på datadrivna metoder leder dock ofta till att en stor uppsättning svårtolkade roller genereras. För att adressera denna begränsning har en role mining-algoritm utvecklas i det här arbetet. Genom att applicera algoritmen på den binära tilldelningsdatan kan de erhållna resultaten betraktas som ett inledande steg i role engineering-processen. Syftet är att effektivisera arbetet med att definiera semantiskt meningsfulla roller, där mänsklig analys är en oundviklig fas. Algoritmen är baserad på hierarkisk klustring och har som huvudsyfte att identifiera en lagom stor uppsättning roller som täcker så stor del av behörighetstilldelningarna som möjligt. För att utvärdera algoritmens prestanda appliceras den på flertalet datamängder insamlade från varierande verkliga åtkomstkontrollsystem. Utvärderingen fokuserar på att jämföra olika binära likhetsmått med målet att bestämma de mest lämpliga egenskaperna för ett binärt likhetsmått som ska användas för role mining. Utvärderingen av olika binära likhetsmått ger insikter i deras effektivitet att uppnå korrekta rolldefinitioner som kan användas som grund för att konstruera meningsfulla roller. 
Denna forskning bidrar till framsteg inom role mining och syftar till att underlätta övergången till rollbaserad åtkomstkontroll samt förbättra metoderna för att identifiera roller som överensstämmer med organisationsbehov och förbättrar säkerhet och effektivitet.
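A toy sketch of the general approach, not the algorithm developed in the thesis: users are clustered agglomeratively by the Jaccard similarity of their permission sets, and each resulting cluster yields a candidate role. The user names, permissions and merge threshold are invented.

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def mine_roles(assignments, threshold=0.4):
    # assignments: user -> set of permissions (a user-permission binary matrix in set form)
    clusters = [[u] for u in assignments]
    def sim(c1, c2):
        # average pairwise Jaccard similarity between two clusters of users
        pairs = [(u, v) for u in c1 for v in c2]
        return sum(jaccard(assignments[u], assignments[v]) for u, v in pairs) / len(pairs)
    while len(clusters) > 1:
        (i, j), best = max((((i, j), sim(clusters[i], clusters[j]))
                            for i in range(len(clusters))
                            for j in range(i + 1, len(clusters))),
                           key=lambda t: t[1])
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    # a candidate role = the permissions shared by every user in a cluster
    return [set.intersection(*(assignments[u] for u in c)) for c in clusters]

upa = {"alice": {"p1", "p2", "p3"}, "bob": {"p1", "p2"}, "carol": {"p4", "p5"}}
print(mine_roles(upa))

The choice of binary similarity measure (Jaccard here) is exactly the kind of design decision the thesis evaluates; swapping in another measure only requires replacing the jaccard function.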
|
36 |
Extraction d'arguments de relations n-aires dans les textes guidée par une RTO de domaine / Extraction of arguments in N-ary relations in texts guided by a domain OTR. Berrahou, Soumia Lilia, 29 September 2015
Aujourd'hui, la communauté scientifique a l'opportunité de partager des connaissances et d'accéder à de nouvelles informations à travers les documents publiés et stockés dans les bases en ligne du web. Dans ce contexte, la valorisation des données disponibles reste un défi majeur pour permettre aux experts de les réutiliser et les analyser afin de produire de la connaissance du domaine. Pour être valorisées, les données pertinentes doivent être extraites des documents puis structurées. Nos travaux s'inscrivent dans la problématique de la capitalisation des données expérimentales issues des articles scientifiques, sélectionnés dans des bases en ligne, afin de les réutiliser dans des outils d'aide à la décision. Les mesures expérimentales (par exemple, la perméabilité à l'oxygène d'un emballage ou le broyage d'une biomasse) réalisées sur différents objets d'études (par exemple, emballage ou procédé de bioraffinerie) sont représentées sous forme de relations n-aires dans une Ressource Termino-Ontologique (RTO). La RTO est modélisée pour représenter les relations n-aires en associant une partie terminologique et/ou linguistique aux ontologies afin d'établir une distinction claire entre la manifestation linguistique (le terme) et la notion qu'elle dénote (le concept). La thèse a pour objectif de proposer une contribution méthodologique d'extraction automatique ou semi-automatique d'arguments de relations n-aires provenant de documents textuels afin de peupler la RTO avec de nouvelles instances. Les méthodologies proposées exploitent et adaptent conjointement des approches de Traitement automatique de la Langue (TAL) et de fouille de données, le tout s'appuyant sur le support sémantique apporté par la RTO de domaine. De manière précise, nous cherchons, dans un premier temps, à extraire des termes, dénotant les concepts d'unités de mesure, réputés difficiles à identifier du fait de leur forte variation typographique dans les textes. Après la localisation de ces derniers par des méthodes de classification automatique, les variants d'unités sont identifiés en utilisant des mesures d'édition originales. La seconde contribution méthodologique de nos travaux repose sur l'adaptation et la combinaison de méthodes de fouille de données (extraction de motifs et règles séquentiels) et d'analyse syntaxique pour identifier les instances d'arguments de la relation n-aire recherchée. / Today, a huge amount of data is made available to the research community through several web-based libraries. Enhancing data collected from scientific documents is a major challenge in order to analyze and reuse efficiently domain knowledge. To be enhanced, data need to be extracted from documents and structured in a common representation using a controlled vocabulary as in ontologies. Our research deals with knowledge engineering issues of experimental data, extracted from scientific articles, in order to reuse them in decision support systems. Experimental data can be represented by n-ary relations which link a studied object (e.g. food packaging, transformation process) with its features (e.g. oxygen permeability in packaging, biomass grinding) and capitalized in an Ontological and Terminological Ressource (OTR). An OTR associates an ontology with a terminological and/or a linguistic part in order to establish a clear distinction between the term and the notion it denotes (the concept). Our work focuses on n-ary relation extraction from scientific documents in order to populate a domain OTR with new instances. 
Our contributions are based on Natural Language Processing (NLP) together with data mining approaches, guided by the domain OTR. More precisely, we first focus on the extraction of units of measure, which are known to be difficult to identify because of their typographic variations. We propose to rely on automatic classification of texts, using supervised learning methods, to reduce the search space of unit variants, and we then propose a new similarity measure that identifies them, taking their syntactic properties into account. Secondly, we adapt and combine data mining methods (sequential pattern and rule mining) and syntactic analysis in order to overcome the challenging process of identifying and extracting n-ary relation instances drowned in unstructured texts.
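For illustration only, variants of a unit of measure can be matched with a normalised edit distance after light normalisation. This is plain Levenshtein similarity, a stand-in for the syntax-aware measure proposed in the thesis, and the example strings are invented.

def levenshtein(a, b):
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def unit_similarity(u, v):
    # normalised edit similarity between two unit-of-measure strings
    u, v = u.lower().replace(" ", ""), v.lower().replace(" ", "")
    return 1.0 - levenshtein(u, v) / max(len(u), len(v), 1)

print(unit_similarity("cm3.um/(m2.day.kPa)", "cm3 um m-2 d-1 kPa-1"))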
|
37 |
Estructura computacional i aplicacions de la semblança molecular quàntica / Computational structure and applications of molecular quantum similarity. Amat Barnés, Lluís, 02 June 2003
La tesis tracta diferents aspectes relacionats amb el càlcul de la semblança quàntica, així com la seva aplicació en la racionalització i predicció de l'activitat de fàrmacs. Es poden destacar dos progressos importants en el desenvolupament de noves metodologies que faciliten el càlcul de les mesures de semblança quàntica. En primer lloc, la descripció de les molècules mitjançant les funciones densitat aproximades PASA (Promolecular Atomic Shell Approximation) ha permès descriure amb suficient precisió la densitat electrònica dels sistemes moleculars analitzats, reduint substancialment el temps de càlcul de les mesures de semblança. En segon lloc, el desenvolupament de tècniques de superposició molecular específiques de les mesures de semblança quàntica ha permès resoldre el problema de l'alineament en l'espai dels compostos comparats. El perfeccionament d'aquests nous procediments i algoritmes matemàtics associats a les mesures de semblança molecular quàntica, ha estat essencial per poder progressar en diferents disciplines de la química computacional, sobretot les relacionades amb les anàlisis quantitatives entre les estructures moleculars i les seves activitats biològiques, conegudes amb les sigles angleses QSAR (Quantitative Structure-Activity Relationships). Precisament en l'àrea de les relacions estructura-activitat s'han presentat dues aproximacions fonamentades en la semblança molecular quàntica que s'originen a partir de dues representacions diferents de les molècules. La primera descripció considera la densitat electrònica global de les molècules i és important, entre altres, la disposició dels objectes comparats en l'espai i la seva conformació tridimensional. El resultat és una matriu de semblança amb les mesures de semblança de tots els parells de compostos que formen el conjunt estudiat. La segona descripció es fonamenta en la partició de la densitat global de les molècules en fragments. S'utilitzen mesures d'autosemblança per analitzar els requeriments bàsics d'una determinada activitat des del punt de vista de la semblança quàntica. El procés permet la detecció de les regions moleculars que són responsables d'una alta resposta biològica. Això permet obtenir un patró amb les regions actives que és d'evident interès per als propòsits del disseny de fàrmacs. En definitiva, s'ha comprovat que mitjançant la simulació i manipulació informàtica de les molècules en tres dimensions es pot obtenir una informació essencial en l'estudi de la interacció entre els fàrmacs i els seus receptors macromoleculars. / There is probably no other concept that contributed to the development of chemistry so remarkably as the ill-defined, qualitative concept of similarity. From the intuitively understood meaning of similarity arises also one of the most powerful chemical principles - the principle of analogy - which in early days of chemistry served as the basis for the classification and systematization of molecules and reactions. The same principle underlies also the widely used idea that similar structures have similar properties which, in turn, is the basis for the existence of various empirical relations between the structure and activity known as QSAR relationships. Because of the fundamental role which similarity plays in so many different situations it is not surprising that its systematic investigation has become the focus of intense scientific interest. Main attention in this respect was devoted to the design of new quantitative measures of molecular similarity. 
The philosophy underlying the development of quantitative similarity measures based on quantum theory arises from the idea that the properties of molecules, whether chemical, physical or biological, are predetermined by the molecular structure. The rationalization of empirical structure-activity relationships is to a considerable extent connected with recent efforts in the design of new molecular descriptors based on quantum theory. The simplest such quantity is the electron density function, and most theoretical molecular descriptors are derived from this quantity. Among them, a privileged place belongs to the so-called Molecular Quantum Similarity Measures (MQSM). These measures are generally based on the pairwise comparison of the electron density functions of the corresponding molecules. This contribution presents an up-to-date review of Quantum Similarity concepts and their application to QSAR. The general form of MQSM is introduced, and the concrete definitions for practical implementations are specified. Two important topics related to the application of MQSM are discussed: first, the Promolecular Atomic Shell Approximation (PASA), a method for fitting first-order molecular density functions for a fast and efficient calculation of the MQSM; afterwards, a possible solution to the problem of molecular alignment, a decisive step in all 3D QSAR methodologies. Finally, the application of Quantum Similarity to QSAR is discussed in detail. Two kinds of descriptors derived from molecular quantum similarity theory were used to construct QSAR models: molecular quantum similarity matrices and fragment quantum self-similarity measures. The practical implementation of these ideas has led to the publication of several papers and, finally, to the present work.
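For reference, the general form of these measures, as usually written in the molecular quantum similarity literature (not reproduced from this thesis), is an integral of the two electron densities weighted by a positive definite operator, with the Carbó index as its normalised, cosine-like variant:

Z_{AB}(\Omega) = \iint \rho_A(\mathbf{r}_1)\,\Omega(\mathbf{r}_1,\mathbf{r}_2)\,\rho_B(\mathbf{r}_2)\,d\mathbf{r}_1\,d\mathbf{r}_2,
\qquad
C_{AB} = \frac{Z_{AB}}{\sqrt{Z_{AA}\,Z_{BB}}}.

The overlap-like measure Z_{AB} = \int \rho_A(\mathbf{r})\,\rho_B(\mathbf{r})\,d\mathbf{r} is recovered when \Omega is the Dirac delta \delta(\mathbf{r}_1 - \mathbf{r}_2); the PASA fitted densities mentioned above are what make these integrals cheap enough to evaluate over large sets of molecules.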
|
38 |
Estudi de mètodes de classificació borrosa i la seva aplicació a l'agrupació de zones geogràfiques en base a diverses característiques incertes / Study of fuzzy classification methods and their application to grouping geographical areas on the basis of several uncertain characteristics. Clara i Lloret, Narcís, 22 July 2004
Aquesta memòria està estructurada en sis capítols amb l'objectiu final de fonamentar i desenvolupar les eines matemàtiques necessàries per a la classificació de conjunts de subconjunts borrosos. El nucli teòric del treball el formen els capítols 3, 4 i 5; els dos primers són dos capítols de caire més general, i l'últim és una aplicació dels anteriors a la classificació dels països de la Unió Europea en funció de determinades característiques borroses. En el capítol 1 s'analitzen les diferents connectives borroses posant una especial atenció en aquells aspectes que en altres capítols tindran una aplicació específica. És per aquest motiu que s'estudien les ordenacions de famílies de t-normes, donada la seva importància en la transitivitat de les relacions borroses. La verificació del principi del terç exclòs és necessària per assegurar que un conjunt significatiu de mesures borroses generalitzades, introduïdes en el capítol 3, siguin reflexives. Estudiem per a quines t-normes es verifica aquesta propietat i introduïm un nou conjunt de t-normes que verifiquen aquest principi. En el capítol 2 es fa un recorregut general per les relacions borroses centrant-nos en l'estudi de la clausura transitiva per a qualsevol t-norma, el càlcul de la qual és en molts casos fonamental per portar a terme el procés de classificació. Al final del capítol s'exposa un procediment pràctic per al càlcul d'una relació borrosa amb l'ajuda d'experts i de sèries estadístiques. El capítol 3 és un monogràfic sobre mesures borroses. El primer objectiu és relacionar les mesures (o distàncies) usualment utilitzades en les aplicacions borroses amb les mesures conjuntistes crisp. Es tracta d'un enfocament diferent del tradicional enfocament geomètric. El principal resultat és la introducció d'una família parametritzada de mesures que verifiquen unes propietats de caràcter conjuntista prou satisfactòries. L'estudi de la verificació del principi del terç exclòs té aquí la seva aplicació sobre la reflexivitat d'aquestes mesures, que són estudiades amb una certa profunditat en alguns casos particulars. El capítol 4 és, d'entrada, un repàs dels principals resultats i mètodes borrosos per a la classificació dels elements d'un mateix conjunt de subconjunts borrosos. És aquí on s'apliquen els resultats sobre les ordenacions de les famílies de t-normes i t-conormes estudiades en el capítol 1. S'introdueix un nou mètode de clusterització, canviant la matriu de la relació borrosa cada vegada que s'obté un nou clúster. Aquest mètode permet homogeneïtzar la metodologia del càlcul de la relació borrosa amb el mètode de clusterització. El capítol 5 tracta sobre l'agrupació d'objectes de diferent naturalesa; és a dir, subconjunts borrosos que pertanyen a diferents conjunts. Aquesta teoria ja ha estat desenvolupada en el cas binari; aquí, el que es presenta és la seva generalització al cas n-ari. Més endavant s'estudien certs aspectes de les projeccions de la relació sobre un cert espai i el recíproc, l'estudi de cilindres de relacions predeterminades. Una aplicació sobre l'agrupació de les comarques gironines en funció de certes variables borroses es presenta al final del capítol. L'últim capítol és eminentment pràctic, ja que s'aplica allò estudiat principalment en els capítols 3 i 4 a la classificació dels països de la Unió Europea en funció de determinades característiques borroses.
Per tal de fer previsions per a anys venidors s'han utilitzat sèries temporals i xarxes neuronals. S'han emprat diverses mesures i mètodes de clusterització per tal de poder comparar els diversos dendogrames que resulten del procés de clusterització. Finalment, als annexos es poden consultar les sèries estadístiques utilitzades, la seva extrapolació, els càlculs per a la construcció de les matrius de les relacions borroses, les matrius de mesura i les seves clausures. / This thesis is organized in six chapters, with the final goal of laying the foundations for and developing the mathematical tools needed to classify sets of fuzzy sets. The theoretical core of the work is formed by chapters 3, 4 and 5; the first two are more general, and the last is an application of the preceding ones to the classification of the European Union countries according to several vague attributes. In the first chapter we analyze the different fuzzy logic connectives, paying special attention to those aspects that will have a specific application in later chapters. It is for this reason that we study the orderings of families of t-norms, given their importance for the transitivity of fuzzy relations. The verification of the excluded-middle principle is necessary to ensure that a significant set of generalized fuzzy measures, introduced in chapter 3, are reflexive. We study for which t-norms this property holds and introduce a new set of t-norms that satisfy this principle. In the second chapter we survey fuzzy relations in a general way, paying special attention to the transitive closure for any t-norm, whose computation is in many cases essential for carrying out the classification process. At the end of this chapter we describe a practical method for building a fuzzy relation with the help of experts and statistical series. The third chapter is a monograph on fuzzy measures. The first goal is to relate the measures (or distances) usually used in fuzzy applications to crisp set-theoretic measures. The aim is to replace the traditional geometric point of view with a fully fuzzy one. The main result is the introduction of a parametrized family of measures that satisfy a reasonably satisfactory set of set-theoretic properties. The study of the excluded-middle principle finds its application here in the reflexivity of these measures, which are studied in some depth for a few particular cases. The fourth chapter begins with a review of the main fuzzy results and methods for classifying the elements of a single set of fuzzy sets. It is here that we apply the results on the orderings of t-norms and t-conorms studied in the first chapter. We introduce a new fuzzy clustering method in which the fuzzy relation matrix is changed each time a new cluster is obtained. This method makes it possible to unify the methodology for computing the fuzzy relation with the clustering method. The fifth chapter deals with grouping objects of different natures, that is, fuzzy subsets that belong to different sets. This theory had already been developed for the binary case; here we present its generalization to the n-ary case. We then study certain aspects of the projections of a fuzzy relation onto a given space and, conversely, the cylindrical extensions of predetermined relations. An application to grouping the comarques of Girona according to several uncertain attributes closes the chapter. The last chapter is eminently practical: we apply the material developed mainly in chapters 3 and 4 to classify the European Union countries according to several fuzzy attributes.
To make forecasts for coming years, time series and neural networks have been used. Several measures and clustering methods have been employed in order to compare the dendrograms resulting from the clustering process. Finally, the appendices contain the statistical series used, their extrapolation, the calculations for constructing the fuzzy relation matrices, and the measure matrices and their closures.
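As an illustration of the transitive closure mentioned above, the sketch below computes the max-min closure of a small fuzzy relation by iterating R := max(R, R∘R) until a fixed point is reached; the matrix values are invented, and only the minimum t-norm is shown, whereas the thesis treats arbitrary t-norms.

def maxmin_compose(r, s):
    # sup-min composition (minimum t-norm); other t-norms would replace min here
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)] for i in range(n)]

def transitive_closure(r):
    # repeat R := max(R, R o R) elementwise until nothing changes
    while True:
        comp = maxmin_compose(r, r)
        new = [[max(r[i][j], comp[i][j]) for j in range(len(r))] for i in range(len(r))]
        if new == r:
            return r
        r = new

R = [[1.0, 0.8, 0.4],
     [0.8, 1.0, 0.5],
     [0.4, 0.5, 1.0]]
print(transitive_closure(R))

Once the closure is reflexive, symmetric and max-min transitive, cutting it at decreasing alpha levels yields the nested partitions (a dendrogram) used for classification.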
|
39 |
Modelagem de sistemas dinamicos não lineares utilizando sistemas fuzzy, algoritmos geneticos e funções de base ortonormal / Modeling of nonlinear dynamic systems using fuzzy systems, genetic algorithms and orthonormal basis functions. Medeiros, Anderson Vinicius de, 23 January 2006
Orientadores: Wagner Caradori do Amaral, Ricardo Jose Gabrielli Barreto Campello / Dissertação (mestrado) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação
Previous issue date: 2006 / Resumo: Esta dissertação apresenta uma metodologia para a geração e otimização de modelos fuzzy Takagi-Sugeno (TS) com Funções de Base Ortonormal (FBO) para sistemas dinâmicos não lineares utilizando um algoritmo genético. Funções de base ortonormal têm sido utilizadas por proporcionarem aos modelos propriedades como ausência de recursão da saída e possibilidade de se alcançar uma razoável capacidade de representação com poucos parâmetros. Modelos fuzzy TS agregam a essas propriedades as características de interpretabilidade e facilidade de representação do conhecimento. Enfim, os algoritmos genéticos se apresentam como um método bem estabelecido na literatura na tarefa de sintonia de parâmetros de modelos fuzzy TS. Diante disso, desenvolveu-se um algoritmo genético para a otimização de duas arquiteturas, o modelo fuzzy TS FBO e sua extensão, o modelo fuzzy TS FBO Generalizado. Foram analisados modelos locais lineares e não lineares nos conseqüentes das regras fuzzy, assim como a diferença entre a estimação local e a global (utilizando o estimador de mínimos quadrados) dos parâmetros desses modelos locais. No algoritmo genético, cada arquitetura contou com uma representação cromossômica específica. Elaborou-se para ambas uma função de fitness baseada no critério de Akaike. Em relação aos operadores de reprodução, no operador de crossover aritmético foi introduzida uma alteração para a manutenção da diversidade da população e no operador de mutação gaussiana adotou-se uma distribuição variável ao longo das gerações e diferenciada para cada gene. Introduziu-se ainda um método de simplificação de soluções através de medidas de similaridade para a primeira arquitetura citada. A metodologia foi avaliada na tarefa de modelagem de dois sistemas dinâmicos não lineares: um processo de polimerização e um levitador magnético / Abstract: This work introduces a methodology for the generation and optimization of Takagi-Sugeno (TS) fuzzy models with Orthonormal Basis Functions (OBF) for nonlinear dynamic systems based on a genetic algorithm. Orthonormal basis functions have been used because they provide models with properties like absence of output feedback and the possibility to reach a reasonable approximation capability with just a few parameters. TS fuzzy models aggregate to these properties the characteristics of interpretability and easiness to knowledge representation in a linguistic manner. Genetic algorithms appear as a well-established method for tuning parameters of TS fuzzy models. In this context, it was developed a genetic algorithm for the optimization of two architectures, the OBF TS fuzzy model and its extension, the Generalized OBF TS fuzzy model. Local linear and nonlinear models in the consequent of the fuzzy rules were analyzed, as well as the difference between local and global estimation (using least squares estimation) of the parameters of these local models. Each architecture had a specific chromosome representation in the genetic algorithm. It was developed a fitness function based on the Akaike information criterion. With respect to the genetic operators, the arithmetic crossover was modified in order to maintain the population diversity and the Gaussian mutation had its distribution varied along the generations and differentiated for each gene. Besides, it was used, in the first architecture presented, a method for simplifying the solutions by using similarity measures. 
The whole methodology was evaluated in modeling two nonlinear dynamic systems, a polymerization process and a magnetic levitator / Mestrado / Automação / Mestre em Engenharia Elétrica
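A simplified sketch of the global least-squares estimation of consequent parameters mentioned in the abstract. It is illustrative only: the regressors are generic, the orthonormal-basis-function filtering stage of the actual models is omitted, and all data are toy values. The normalised rule firing degrees weight each local linear model, so all consequent parameters can be estimated in one linear least-squares problem.

import numpy as np

def ts_output(x_rows, memberships, thetas):
    # Takagi-Sugeno output: sum of local linear models weighted by normalised firing degrees
    w = memberships / memberships.sum(axis=1, keepdims=True)
    local_outputs = x_rows @ thetas.T          # column i = output of local model i
    return np.sum(w * local_outputs, axis=1)

def global_least_squares(x_rows, memberships, y):
    # stack the weighted regressors so every rule's parameters are estimated jointly
    w = memberships / memberships.sum(axis=1, keepdims=True)
    n_rules = memberships.shape[1]
    big_phi = np.hstack([w[:, [i]] * x_rows for i in range(n_rules)])
    theta, *_ = np.linalg.lstsq(big_phi, y, rcond=None)
    return theta.reshape(n_rules, x_rows.shape[1])

X = np.array([[0.1, 1.0], [0.5, 1.0], [0.9, 1.0]])   # regressor rows [u, 1]
M = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])   # rule firing degrees
y = np.array([0.2, 0.7, 1.3])
theta = global_least_squares(X, M, y)
print(ts_output(X, M, theta))

Local estimation would instead fit each rule's parameters on its own weighted data, trading some global fit for more interpretable local models, a contrast the dissertation analyses.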
|
40 |
De l'usage de la sémantique dans la classification supervisée de textes : application au domaine médical / On the use of semantics in supervised text classification: application in the medical domain. Albitar, Shereen, 12 December 2013
Cette thèse porte sur l’impact de l’usage de la sémantique dans le processus de la classification supervisée de textes. Cet impact est évalué au travers d’une étude expérimentale sur des documents issus du domaine médical et en utilisant UMLS (Unified Medical Language System) en tant que ressource sémantique. Cette évaluation est faite selon quatre scénarii expérimentaux d’ajout de sémantique à plusieurs niveaux du processus de classification. Le premier scénario correspond à la conceptualisation où le texte est enrichi avant indexation par des concepts correspondant dans UMLS ; le deuxième et le troisième scénario concernent l’enrichissement des vecteurs représentant les textes après indexation dans un sac de concepts (BOC – bag of concepts) par des concepts similaires. Enfin le dernier scénario utilise la sémantique au niveau de la prédiction des classes, où les concepts ainsi que les relations entre eux, sont impliqués dans la prise de décision. Le premier scénario est testé en utilisant trois des méthodes de classification: Rocchio, NB et SVM. Les trois autres scénarii sont uniquement testés en utilisant Rocchio qui est le mieux à même d’accueillir les modifications nécessaires. Au travers de ces différentes expérimentations nous avons tout d’abord montré que des améliorations significatives pouvaient être obtenues avec la conceptualisation du texte avant l’indexation. Ensuite, à partir de représentations vectorielles conceptualisées, nous avons constaté des améliorations plus modérées avec d’une part l’enrichissement sémantique de cette représentation vectorielle après indexation, et d’autre part l’usage de mesures de similarité sémantique en prédiction. / The main interest of this research is the effect of using semantics in the process of supervised text classification. This effect is evaluated through an experimental study on documents related to the medical domain, using the UMLS (Unified Medical Language System) as a semantic resource. This evaluation follows four scenarios involving semantics at different steps of the classification process: the first scenario incorporates the conceptualization step, where text is enriched with corresponding concepts from UMLS; both the second and the third scenarios concern enriching the vectors that represent text as a Bag of Concepts (BOC) with similar concepts; the last scenario considers using semantics during class prediction, where concepts, as well as the relations between them, are involved in decision making. We test the first scenario using three popular classification techniques: Rocchio, NB and SVM. We choose Rocchio for the other scenarios for its extendibility with semantics. Experimental results demonstrated significant improvement in classification performance when using conceptualization before indexing. Moderate improvements are reported when using a conceptualized text representation with semantic enrichment after indexing, or with text-to-text semantic similarity measures for prediction.
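A minimal sketch of the Rocchio-over-bag-of-concepts idea, not the thesis implementation: the concept identifiers, the similarity links and the enrichment weight are invented. Documents are concept-count vectors, optionally enriched by propagating part of each concept's weight to similar concepts (in the spirit of the second and third scenarios), and each class is represented by the centroid of its training vectors.

import numpy as np

def enrich(vec, concept_ids, related, weight=0.5):
    # propagate part of a concept's weight to concepts deemed similar to it
    out = vec.astype(float).copy()
    for i, cid in enumerate(concept_ids):
        for other in related.get(cid, []):
            out[concept_ids.index(other)] += weight * vec[i]
    return out

def rocchio_train(X, labels):
    # one centroid per class over the (possibly enriched) bag-of-concepts vectors
    return {c: X[labels == c].mean(axis=0) for c in set(labels)}

def rocchio_predict(x, centroids):
    cos = lambda u, v: float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    return max(centroids, key=lambda c: cos(x, centroids[c]))

concepts = ["C0011849", "C0020538", "C0027051"]    # UMLS-style concept identifiers (illustrative)
related = {"C0011849": ["C0020538"]}               # invented similarity link between two concepts
X = np.array([[3, 0, 1], [2, 1, 0], [0, 0, 4]], dtype=float)
labels = np.array(["diabetes", "diabetes", "cardio"])
X = np.vstack([enrich(row, concepts, related) for row in X])
model = rocchio_train(X, labels)
print(rocchio_predict(enrich(np.array([1.0, 0.0, 0.0]), concepts, related), model))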
|