251

Représentations vectorielles et apprentissage automatique pour l'alignement d'entités textuelles et de concepts d'ontologie : application à la biologie / Vector Representations and Machine Learning for Alignment of Text Entities with Ontology Concepts: Application to Biology

Ferré, Arnaud 24 May 2019 (has links)
The considerable increase in the quantity of textual data makes it difficult today to analyze it without the assistance of tools. However, a text written in natural language is unstructured data, i.e. it is not directly interpretable by a specialized computer program; without such a program, the information in the texts remains largely under-exploited. Among tools for automatic information extraction, we are interested in automatic text-interpretation methods for the entity normalization task, which consists in automatically matching entity mentions in text to concepts in a reference terminology. To accomplish this task, we propose a new approach based on aligning two types of vector representations of entities, each capturing part of their meaning: word embeddings for textual mentions and "ontological embeddings" for concepts, designed specifically for this work. The alignment between the two is learned by supervised training. The developed methods were evaluated on a reference dataset from the biological domain and now represent the state of the art for this dataset. They are integrated into a natural language processing software suite, and the code is freely shared.
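A minimal sketch of the alignment idea, assuming the mention and concept embeddings are already computed; the linear least-squares map, the toy dimensions, and the nearest-neighbor lookup are illustrative assumptions, not the thesis's actual implementation:

```python
import numpy as np

# Toy setup (illustrative dimensions): mention embeddings in R^100,
# concept ("ontological") embeddings in R^50, 200 labeled training pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))       # embeddings of training mentions
Y = rng.normal(size=(200, 50))        # embeddings of their gold concepts
concepts = rng.normal(size=(30, 50))  # embeddings of all ontology concepts

# Supervised alignment: fit a linear map W minimizing ||XW - Y||^2.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def normalize_mention(x):
    """Project a mention embedding and return the nearest concept index."""
    y = x @ W
    # Cosine similarity against every concept embedding.
    sims = (concepts @ y) / (np.linalg.norm(concepts, axis=1) * np.linalg.norm(y))
    return int(np.argmax(sims))

print(normalize_mention(rng.normal(size=100)))
```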
252

Stack Number, Track Number, and Layered Pathwidth

Yelle, Céline 09 April 2020 (has links)
In this thesis, we consider three parameters associated with graphs: stack number, track number, and layered pathwidth. Our first result shows that the stack number of any graph is at most 4 times its layered pathwidth. This complements an existing result of Dujmovic et al., who showed that the queue number of a graph is at most 3 times its layered pathwidth minus one (Dujmovic, Morin, and Wood [SIAM J. Comput., 553–579, 2005]). Our second result shows that graphs of track number at most 3 have layered pathwidth at most 4. This answers an open question posed by Bannister et al. (Bannister, Devanny, Dujmovic, Eppstein, and Wood [GD 2016, 499–510, 2016; Algorithmica, 1–23, 2018]).
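In symbols, writing sn(G), qn(G), tn(G), and lpw(G) for the stack number, queue number, track number, and layered pathwidth of a graph G (the notation is ours; the inequalities are exactly those stated above):

```latex
\[
  \mathrm{sn}(G) \le 4\,\mathrm{lpw}(G), \qquad
  \mathrm{qn}(G) \le 3\,\mathrm{lpw}(G) - 1, \qquad
  \mathrm{tn}(G) \le 3 \;\Longrightarrow\; \mathrm{lpw}(G) \le 4 .
\]
```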
253

Apprentissage et exploitation de représentations sémantiques pour la classification et la recherche d'images / Learning and exploiting semantic representations for image classification and retrieval

Bucher, Maxime 27 November 2018 (has links)
In this thesis, we examine some practical difficulties of deep learning models. Despite their promising results in computer vision, deploying these algorithms in certain real-world use cases remains difficult. A first difficulty, for image classification tasks spanning thousands of categories, is gathering enough training data for each class. We therefore propose two new approaches suited to this learning scenario, called "zero-shot classification". Using semantic information to model classes makes it possible to define models by description, as opposed to modelling from a set of examples, and enables modelling without reference data. The core idea of the first chapter is to obtain an optimal attribute distribution through metric learning, capable of both selecting and transforming the distribution of the original data. In the following chapter, unlike standard approaches in the literature that rely on learning a common embedding space, we propose to generate visual features with a conditional generator. Once generated, these artificial examples can be used together with real data to train a discriminative classifier. In the second part of this manuscript, we address the intelligibility of the computation for computer vision tasks. Because of the many complex transformations applied by deep algorithms, it is difficult for a user to interpret the returned result. Our proposal is to introduce a "semantic bottleneck" into the processing pipeline: a crossing point at which the representation of the image is expressed entirely in natural language, while retaining the efficiency of numerical representations. The intelligibility of this representation lets a user examine the basis on which the inference was made, and thus accept or reject the decision according to their own knowledge and experience.
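A minimal sketch of the feature-generation step under simplifying assumptions: the "generator" here is a plain MLP trained with a reconstruction loss (a stand-in for the conditional generative model of the thesis), and all dimensions, class attributes, and data are toy values:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
D_ATTR, D_NOISE, D_FEAT = 10, 5, 32   # toy dimensions (assumptions)

# Toy "seen class" data: class attributes and real visual features.
attrs_seen = torch.randn(500, D_ATTR)
feats_seen = torch.randn(500, D_FEAT)

# Generator: (class attribute, noise) -> synthetic visual feature.
gen = nn.Sequential(nn.Linear(D_ATTR + D_NOISE, 64), nn.ReLU(),
                    nn.Linear(64, D_FEAT))
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

# Simplified training: regress real features from attributes plus noise.
for _ in range(200):
    z = torch.randn(500, D_NOISE)
    fake = gen(torch.cat([attrs_seen, z], dim=1))
    loss = ((fake - feats_seen) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Zero-shot step: generate features for an *unseen* class from its
# attribute vector alone; these synthetic features can then be mixed
# with real seen-class features to train any discriminative classifier.
attr_unseen = torch.randn(1, D_ATTR)
synthetic = gen(torch.cat([attr_unseen.repeat(100, 1),
                           torch.randn(100, D_NOISE)], dim=1))
print(synthetic.shape)  # torch.Size([100, 32])
```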
254

Properties of Sobolev Mappings

Roskovec, Tomáš January 2017 (has links)
We study the properties of Sobolev functions and mappings, especially the failure of certain properties. In the first part we study the Sobolev Embedding Theorem, which guarantees W^{1,p}(Ω) ⊂ L^{p*}(Ω) for some parameter p* = p*(p, n, Ω). We show that for a general domain this relation need not be smooth as a function of p, nor even continuous, and we give an example of such a domain. In the second part we study Cesari's counterexample of a continuous mapping in W^{1,n}([−1, 1]^n, R^n) violating the Lusin (N) condition. We show that this example can be constructed as a gradient mapping. In the third part we generalize Cesari's counterexample and Ponomarev's counterexample to higher-order Sobolev spaces W^{k,p}(Ω, R^n) and characterize the validity of the Lusin (N) condition in dependence on the parameters k and p and the dimension n.
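For context, the classical statements (standard facts, not results of this thesis) read: for a sufficiently regular bounded domain Ω ⊂ R^n and 1 ≤ p < n, the critical exponent is p* = np/(n−p), and the Lusin (N) condition asks that sets of measure zero be mapped to sets of measure zero:

```latex
\[
  W^{1,p}(\Omega) \hookrightarrow L^{p^*}(\Omega),
  \qquad p^* = \frac{np}{n-p}, \quad 1 \le p < n,
\]
\[
  f \text{ satisfies condition (N)} \iff
  \bigl( |E| = 0 \implies |f(E)| = 0 \bigr).
\]
```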
255

A Pure Embedding of Roles: Exploring 4-dimensional Dispatch for Roles in Structured Contexts

Leuthäuser, Max 15 August 2017 (has links)
Present-day software systems have to fulfill an increasing number of requirements, which makes them more and more complex. Many systems need to anticipate changing contexts or adapt to changing business rules or requirements. The challenge of 21st-century software development will be to cope with these aspects. We believe that the role concept offers a simple way to adapt an object-oriented program to its changing context. In a role-based application, an object plays multiple roles during its lifetime. If the contexts are represented as first-class entities, they provide dynamic views onto the object-oriented program; if a context changes, the dynamic views can be switched easily, and the software system adapts automatically. However, although the concepts of roles and dynamic contexts have been discussed for a long time in many areas of computer science, their use in an existing object-oriented language has so far required a specific runtime environment. Moreover, classical object-oriented languages and their runtime systems cannot cope with essential role-specific features, such as true delegation or dynamic binding of roles. In addition, contexts and views are important in software development: the traditional code-oriented approach to software engineering is becoming less and less satisfactory, and support for multiple views of a software system scales much better to the needs of today's systems. It relies, however, on programming languages providing roles for the construction of views. As a solution, this thesis presents an implementation pattern for role-playing objects that does not require a specific runtime system, the SCala ROles Language (SCROLL). Via this library approach, roles are embedded in a statically typed base language as dynamically evolving objects. The approach is pure in the sense that no additional compiler or tooling is needed. The implementation pattern is demonstrated on the basis of the Scala language. As technical support from Scala, the pattern requires dynamic mixins, compiler-translated function calls, and implicit conversions. The details of how roles are implemented are hidden in a Scala library and therefore transparent to SCROLL programmers. The SCROLL library supports roles embedded in structured contexts. Additionally, a four-dimensional, context-aware dispatch at runtime is presented, which overcomes the subtle ambiguities introduced by the rich semantics of role-playing objects. SCROLL is written in Scala, which blends modern object-oriented and functional programming. The size of the library is below 1400 lines of code, so it can be considered minimalistic in design and easy to maintain. Our approach solves several practical problems arising in the area of dynamic extensibility and adaptation.
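SCROLL itself is a Scala library; purely as an illustration of the underlying idea (an object playing context-dependent roles, with calls dispatched through the currently bound role), here is a hypothetical Python sketch that mirrors the pattern, not SCROLL's API:

```python
class Person:
    def __init__(self, name):
        self.name = name

class Customer:
    """A role: behavior an object acquires only inside a context."""
    def greet(self, player):
        return f"{player.name} (as customer)"

class Employee:
    def greet(self, player):
        return f"{player.name} (as employee)"

class Context:
    """A first-class context binding players to roles."""
    def __init__(self):
        self._roles = {}

    def bind(self, player, role):
        self._roles[id(player)] = role

    def dispatch(self, player, method, *args):
        # Context-aware dispatch: the bound role answers first, so
        # switching contexts switches the behavior automatically.
        role = self._roles.get(id(player))
        if role is not None and hasattr(role, method):
            return getattr(role, method)(player, *args)
        return getattr(player, method)(*args)

alice = Person("Alice")
bank, work = Context(), Context()
bank.bind(alice, Customer())
work.bind(alice, Employee())
print(bank.dispatch(alice, "greet"))  # Alice (as customer)
print(work.dispatch(alice, "greet"))  # Alice (as employee)
```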
256

Knowledge Integration and Representation for Biomedical Analysis

Alachram, Halima 04 February 2021 (has links)
No description available.
257

Text ranking based on semantic meaning of sentences / Textrankning baserad på semantisk betydelse hos meningar

Stigeborn, Olivia January 2021 (has links)
Finding a suitable candidate-to-client match is an important part of a consulting company's work. It takes a lot of time and effort for the company's recruiters to read possibly hundreds of resumes to find a suitable candidate. Natural language processing can perform a ranking task whose goal is to rank the resumes so that the most suitable candidates appear highest. This ensures that recruiters only need to look at the top-ranked resumes and can quickly get candidates out in the field. Prior research has used methods that count specific keywords in resumes and can decide whether or not a candidate has a given experience. The main goal of this thesis is to use the semantic meaning of the text in the resumes to get a deeper understanding of a candidate's level of experience. It also evaluates whether the model can run on-device and whether the database can contain a mix of English and Swedish resumes. An algorithm was created that uses the word embedding model DistilRoBERTa, which is capable of capturing the semantic meaning of text. The algorithm was evaluated by generating job descriptions from the resumes, each created as a summary of a resume. The run time, the memory usage, and the rank achieved by the wanted candidate were documented and used to analyze the results. The classification was considered correct when the candidate used to generate the job description was ranked in the top 10. With this method, an accuracy of 68.3% was achieved. The results show that the algorithm is capable of ranking resumes, both Swedish and English ones, with an accuracy of 67.7% for Swedish resumes and 74.7% for English. The run time, averaging 578 ms, was fast enough, but the memory usage was too large to make it possible to run the algorithm on-device. In conclusion, the semantic meaning of resumes can be used to rank them; possible future work would be to combine this method with a keyword-counting method to investigate whether the accuracy would increase.
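A minimal sketch of such a ranking step using the sentence-transformers library; the specific checkpoint name ("all-distilroberta-v1", a DistilRoBERTa-based sentence-embedding model) and the toy resume texts are assumptions, not the thesis's exact setup:

```python
from sentence_transformers import SentenceTransformer, util

# DistilRoBERTa-based sentence-embedding model (assumed checkpoint name).
model = SentenceTransformer("all-distilroberta-v1")

resumes = [
    "Five years of backend development in Java and Kotlin.",
    "Data scientist experienced with NLP and Python.",
    "Frontend engineer, TypeScript and React.",
]
job_description = "Looking for an NLP engineer with strong Python skills."

# Embed the job description and all resumes, then rank by cosine similarity.
job_emb = model.encode(job_description, convert_to_tensor=True)
resume_embs = model.encode(resumes, convert_to_tensor=True)
scores = util.cos_sim(job_emb, resume_embs)[0]

for rank, idx in enumerate(scores.argsort(descending=True).tolist(), start=1):
    print(rank, round(float(scores[idx]), 3), resumes[idx])
```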
258

A Comparative study of Knowledge Graph Embedding Models for use in Fake News Detection

Frimodig, Matilda, Lanhed Sivertsson, Tom January 2021 (has links)
During the past few years online misinformation, generally referred to as fake news, has been identified as an increasingly dangerous threat. As the spread of misinformation online has increased, fake news detection has become an active line of research. One approach is to use knowledge graphs for automated fake news detection. While large-scale knowledge graphs are openly available, they are rarely up to date and often miss the relevant information needed for fake news detection. Creating new knowledge graphs from online sources is one way to obtain the missing information, but extracting information from unstructured text is far from straightforward. Using natural language processing techniques, we developed a pre-processing pipeline for extracting information from text for the purpose of creating knowledge graphs. In order to classify news as fake or not with the use of knowledge graphs, these need to be converted into a machine-understandable format called knowledge graph embeddings. These embeddings also allow new information to be inferred or classified based on the information already in the knowledge graph. Only one knowledge graph embedding model has previously been used for fake news detection, while several new models have recently been developed. We compare the performance of three embedding models, each relying on a different fundamental architecture, in the specific context of fake news detection: the geometric model TransE, the tensor decomposition model ComplEx, and the deep learning model ConvKB. The results of this study show that, of the three models, ConvKB performs best. However, aspects other than performance need to be considered, so these results do not necessarily mean that a deep learning approach is the most suitable for real-world fake news detection.
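For reference, the scoring functions of two of the compared models are compact enough to sketch. This is a generic illustration of the published formulas (TransE scores a triple by −‖h + r − t‖, ComplEx by Re⟨h, r, t̄⟩), using toy random embeddings rather than trained ones:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy embedding dimension

# TransE: relations are translations in R^d; plausible triples
# satisfy h + r ≈ t, so the score is the negative distance.
h, r, t = rng.normal(size=(3, d))
transe_score = -np.linalg.norm(h + r - t)

# ComplEx: complex-valued embeddings; the score is the real part
# of the trilinear product <h, r, conj(t)>.
hc, rc, tc = rng.normal(size=(3, d)) + 1j * rng.normal(size=(3, d))
complex_score = np.real(np.sum(hc * rc * np.conj(tc)))

print(transe_score, complex_score)
```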
259

Generative modelling and inverse problem solving for networks in hyperbolic space

Muscoloni, Alessandro 12 August 2019 (has links)
The investigation of the latent geometric space behind complex network topologies is an active topic in current network science, and the hyperbolic space is one of the most studied because it seems associated with the structural organization of many real complex systems. The popularity-similarity-optimization (PSO) generative model can grow random geometric graphs in the hyperbolic space with realistic properties such as clustering, small-worldness, scale-freeness and rich-clubness. However, it fails to reproduce an important feature of real complex systems: community organization. Here, we introduce the nonuniform PSO (nPSO) generative model, a generalization of the PSO model with a tailored community structure, and we provide an efficient algorithmic implementation with O(EN) time complexity, where N is the number of nodes and E the number of edges. In recent years, the inverse problem has also gained increasing attention: given a network topology, how to accurately map it into its latent geometric space. Unlike previous attempts based on a computationally expensive maximum likelihood optimization (whose time complexity is between O(N^3) and O(N^4)), here we show that a class of methods based on nonlinear dimensionality reduction can solve the problem with higher precision while reducing the time complexity to O(N^2).
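The geometry underlying both the generative and the inverse direction is the hyperbolic disk, where each node has polar coordinates (r, θ). As a generic illustration (the standard distance formula of the hyperbolic plane, not code from the thesis):

```python
import math

def hyperbolic_distance(r1, theta1, r2, theta2):
    """Distance between two points of the hyperbolic plane given in
    native polar coordinates (radius, angle)."""
    # Wrap the angle difference into [0, pi].
    dtheta = math.pi - abs(math.pi - abs(theta1 - theta2))
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    return math.acosh(max(x, 1.0))  # clamp guards against rounding below 1

# In PSO-style models, a new node typically links to the nodes at the
# smallest hyperbolic distance, mixing popularity (radius) and
# similarity (angle).
print(hyperbolic_distance(2.0, 0.1, 2.5, 1.2))
```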
260

Capturing Knowledge of Emerging Entities from the Extended Search Snippets

Ngwobia, Sunday C. January 2019 (has links)
No description available.
