Global ETD Search

11	Linguistic Knowledge Transfer for Enriching Vector Representations Kim, Joo-Kyung 12 December 2017 (has links) No description available. Computer Science Artificial Intelligence Transfer Learning Word Embedding Intent Detection Slot Filling POS Tagging Adversarial Training
12	Knowledge-based support for surgical workflow analysis and recognition / Assistance fondée sur les connaissances pour l'analyse et la reconnaissance du flux de travail chirurgical Dergachyova, Olga 28 November 2017 (has links) Computer assistance has become an indispensable part of modern surgical procedures. The desire to create a new generation of intelligent operating rooms has prompted researchers to explore the problems of automatic perception and understanding of surgical situations. Situation awareness includes automatic recognition of surgical workflow. Great progress has been made in the recognition of surgical phases and gestures. Yet there is still a gap between these two granularity levels in the hierarchy of the surgical process. Very little research focuses on surgical activities, which carry semantic information vital for understanding the situation. Two important factors impede progress. First, automatic recognition and prediction of surgical activities is a highly challenging task because of the short duration of activities, their great number, and a very complex workflow with a multitude of possible execution and sequencing paths. Second, the very limited amount of clinical data does not provide enough information for successful learning and accurate recognition. In our opinion, before recognizing surgical activities, a careful analysis of the elements that compose an activity is necessary in order to choose the right signals and sensors to facilitate recognition.
We used a deep learning approach to assess the impact of different semantic elements of an activity on its recognition. Through an in-depth study we determined a minimal set of elements sufficient for accurate recognition. Information about the operated anatomical structure and the surgical instrument was shown to be the most important. We also addressed the problem of data deficiency by proposing methods for transferring knowledge from other domains or surgeries. Word embedding and transfer learning methods were proposed. They demonstrated their effectiveness on the task of next-activity prediction, offering a 22% increase in accuracy. In addition, pertinent observations about surgical practice were made during the study. In this work, we also addressed the problem of insufficient and improper validation of recognition methods. We proposed new validation metrics and approaches for assessing performance that connect methods to targeted applications and better characterize the capacities of a method. The work described in this thesis aims at clearing obstacles that block the progress of the domain and proposes a new perspective on the problem of surgical workflow recognition. Low-level surgical activities Surgical activity recognition Semantic analysis Word embedding Transfer learning Validation metrics
13	Modelos composicionais: análise e aplicação em previsões no mercado de ações / Compositional models: analysis and application to stock market prediction Souza, Diego Falcão de 10 July 2017 (has links) FAPEAM - Fundação de Amparo à Pesquisa do Estado do Amazonas / Among the several textual representation techniques in the literature, distributed word representations (word embeddings) have recently stood out in many natural language processing tasks. They are based on dense vectors of 𝑑 dimensions that can capture syntactic and semantic information about words. Words that are syntactically and semantically similar are therefore expected to lie closer to each other in the vector space. However, while this representation has proved effective for isolated words, there is no consensus in the literature on the best way to represent more complex structures, such as phrases and sentences. The trend in recent years is to use compositional models, which represent these complex structures by composing the representations of their constituent structures with some combination function.
However, the results obtained by compositional models are known to depend directly on the domain in which they are applied. In this work, we analyzed several compositional models applied to the domain of stock price prediction in order to identify which of these models best represents financial news titles for various machine learning methods that predict the polarity of the S&P 500 stock index. Distributed representation of words Word embedding Compositional models Machine learning
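As an illustration of the compositional models described in entry 13, the following is a minimal sketch of two common combination functions, element-wise sum and mean of word vectors. It is not taken from the dissertation: the toy vectors, the names `TOY_VECTORS`, `compose_sum` and `compose_mean`, and the choice to skip out-of-vocabulary words are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical 3-dimensional word vectors; the values are invented for illustration.
TOY_VECTORS: Dict[str, List[float]] = {
    "stocks": [1.0, 0.0, 2.0],
    "rally": [3.0, 2.0, 0.0],
    "fall": [-1.0, 4.0, 1.0],
}


def compose_sum(tokens: List[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Additive composition: element-wise sum of the vectors of known tokens."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for token in tokens:
        vec = vectors.get(token)
        if vec is None:
            continue  # assumption: out-of-vocabulary words are ignored
        total = [a + b for a, b in zip(total, vec)]
    return total


def compose_mean(tokens: List[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Average composition: the sum divided by the number of known tokens."""
    known = [t for t in tokens if t in vectors]
    summed = compose_sum(known, vectors)
    if not known:
        return summed  # all-zero vector when no token is known
    return [x / len(known) for x in summed]


title = "stocks rally today".split()
print(compose_sum(title, TOY_VECTORS))   # [4.0, 2.0, 2.0]
print(compose_mean(title, TOY_VECTORS))  # [2.0, 1.0, 1.0]
```

The resulting fixed-length vector for a news title could then be passed to any classifier; which composition works best is exactly the kind of domain-dependent question the abstract raises.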
14	Représentations vectorielles et apprentissage automatique pour l’alignement d’entités textuelles et de concepts d’ontologie : application à la biologie / Vector Representations and Machine Learning for Alignment of Text Entities with Ontology Concepts : Application to Biology Ferré, Arnaud 24 May 2019 (has links) The considerable increase in the quantity of textual data makes it difficult today to analyze it without the assistance of tools. However, a text written in natural language is unstructured data, i.e. it cannot be interpreted by a specialized computer program, without which the information in the texts remains largely under-exploited. Among the tools for automatic extraction of information from text, we are interested in automatic text interpretation methods for the entity normalization task, which consists of automatically matching mentions of entities in text to concepts in a reference terminology. To accomplish this task, we propose a new approach that aligns two types of vector representations of entities, each capturing part of their meaning: word embeddings for text mentions and "ontological embeddings" for concepts, designed specifically for this work. The alignment between the two is learned through supervised learning. The methods developed were evaluated on a reference dataset from the biological domain, and they now represent the state of the art for this dataset. These methods are integrated into a natural language processing software suite and the code is freely shared. Information extraction Normalization Word embedding Artificial intelligence Natural language processing
15	Knowledge Integration and Representation for Biomedical Analysis Alachram, Halima 04 February 2021 (has links) No description available. 510 Data integration Knowledge representation Biomedical ontologies Text mining Word embedding Machine learning Biomedical analysis Informatik (PPN619939052)
16	Text ranking based on semantic meaning of sentences / Textrankning baserad på semantisk betydelse hos meningar Stigeborn, Olivia January 2021 (has links) Finding a suitable match between candidate and client is an important part of a consulting company's work. It takes a lot of time and effort for the company's recruiters to read possibly hundreds of resumes to find a suitable candidate. Natural language processing can perform a ranking task in which the goal is to rank the resumes so that the most suitable candidates are ranked highest. This ensures that recruiters only need to look at the top-ranked resumes and can quickly get candidates out in the field. Previous research has used methods that count specific keywords in resumes and can decide whether or not a candidate has a given experience. The main goal of this thesis is to use the semantic meaning of the text in the resumes to gain a deeper understanding of a candidate's level of experience. It also evaluates whether the model can run on-device and whether the database can contain a mix of English and Swedish resumes. An algorithm was created that uses the word embedding model DistilRoBERTa, which is capable of capturing the semantic meaning of text. The algorithm was evaluated by generating job descriptions from the resumes, creating a summary of each resume. The run time, the memory usage and the rank achieved by the wanted candidate were documented and used to analyze the results. When the candidate whose resume was used to generate the job description was ranked in the top 10, the classification was considered correct. The accuracy was calculated using this method, and an accuracy of 68.3% was achieved. The results show that the algorithm is capable of ranking resumes. The algorithm is able to rank both Swedish and English resumes, with an accuracy of 67.7% for Swedish resumes and 74.7% for English.
The run time was fast enough, at an average of 578 ms, but the memory usage was too large for the algorithm to be used on-device. In conclusion, the semantic meaning of resumes can be used to rank resumes, and possible future work would be to combine this method with a method that counts keywords to investigate whether the accuracy would increase. Natural language processing Word Embedding Resume Ranking Semantic meaning Computer Sciences
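The top-10 evaluation described in entry 16 can be sketched as follows. This is only an illustrative reading of the abstract, not the thesis code: it assumes each resume and each generated job description has already been turned into a vector (by DistilRoBERTa in the thesis, by hand here), uses cosine similarity as the ranking score, and the names `cosine`, `rank_resumes` and `top_k_accuracy` as well as the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_resumes(query: Sequence[float], resumes: Dict[str, Sequence[float]]) -> List[str]:
    """Return resume ids ordered from most to least similar to the query vector."""
    return sorted(resumes, key=lambda rid: cosine(query, resumes[rid]), reverse=True)


def top_k_accuracy(
    queries: List[Tuple[Sequence[float], str]],
    resumes: Dict[str, Sequence[float]],
    k: int,
) -> float:
    """Fraction of queries whose source resume appears among the top k results."""
    hits = sum(1 for vec, expected in queries if expected in rank_resumes(vec, resumes)[:k])
    return hits / len(queries)


# Invented 2-dimensional embeddings standing in for real model output.
resumes = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
# Each query is (job-description vector, id of the resume it was generated from).
queries = [([1.0, 0.1], "a"), ([0.1, 1.0], "b"), ([1.0, 0.0], "c")]

print(rank_resumes([1.0, 0.0], resumes))    # ['a', 'c', 'b']
print(top_k_accuracy(queries, resumes, 2))  # 1.0
```

In the thesis the cut-off is k = 10; the toy data uses smaller values only because it contains three resumes.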
17	Addressing Semantic Interoperability and Text Annotations. Concerns in Electronic Health Records using Word Embedding, Ontology and Analogy Naveed, Arjmand January 2021 (has links) Electronic Health Records (EHRs) generate a huge number of databases that are updated dynamically. A major goal of interoperability in healthcare is to facilitate the seamless exchange of healthcare-related data and to provide an environment that supports interoperability and the secure transfer of data. Healthcare organisations face difficulties in exchanging patients' healthcare information, laboratory reports and similar records because of a lack of semantic interoperability. Hence, semantic web technologies are needed to address healthcare interoperability problems by enabling the various healthcare standards used by different healthcare entities (doctors, clinics, hospitals etc.) to exchange data and its semantics in a form that can be understood by both machines and humans. Thus, a framework with a similarity analyser that deals with semantic interoperability is proposed in the thesis. While dealing with semantic interoperability, another consideration was the use of word embedding and ontology for knowledge discovery. In the medical domain, the main challenge for a medical information extraction system is to find the required information by considering explicit and implicit clinical context with a high degree of precision and accuracy. For semantic similarity of medical text at different levels (conceptual, sentence and document level), different methods and techniques have been widely presented, but I made sure that the semantic content of a text that is presented includes the correct meaning of words and sentences. A comparative analysis of two approaches, ontology followed by word embedding and the reverse order, was carried out to determine which gives better results in terms of higher semantic similarity.
Selecting the Kidney Cancer dataset as a use case, I concluded that each approach works better in different circumstances. However, the approach in which ontology is followed by word embedding, so that the data is enriched first, showed better results. Apart from enriching the EHR, extracting relevant information is also challenging. To address this challenge, the concept of analogy was applied to explain similarities between two different contents, as analogies play a significant role in understanding new concepts. Analogy helps healthcare professionals communicate with patients effectively and helps patients understand their disease and treatment. I therefore used analogies in this thesis to support the extraction of relevant information from medical text. Since accessing EHRs is difficult, tweet text was used as an alternative, as social media has emerged as a relevant data source in recent years. An algorithm was proposed to analyse medical tweets based on analogous words. The results were used to validate the proposed methods. Two experts from the medical domain gave their views on the proposed methods in comparison with a similar method named SemDeep. The quantitative and qualitative results show that the proposed analogy-based method brings diversity and is helpful in analysing a specific disease or in text classification. Electronic Health Record (EHR) Semantic annotations Word embedding Ontology Analogy Artificial intelligence (AI) Knowledge discovery Semantic interoperability
18	Word Clustering in an Interactive Text Analysis Tool / Klustring av ord i ett interaktivt textanalysverktyg Gränsbo, Gustav January 2019 (has links) A central operation of users of the text analysis tool Gavagai Explorer is to look through a list of words and arrange them in groups. This thesis explores the use of word clustering to automatically arrange the words in groups intended to help users. A new word clustering algorithm is introduced, which attempts to produce word clusters tailored to be small enough for a user to quickly grasp the common theme of the words. The proposed algorithm computes similarities among words using word embeddings, and clusters them using hierarchical graph clustering. Multiple variants of the algorithm are evaluated in an unsupervised manner by analysing the clusters they produce when applied to 110 data sets previously analysed by users of Gavagai Explorer. A supervised evaluation is performed to compare clusters to the groups of words previously created by users of Gavagai Explorer. Results show that it was possible to choose a set of hyperparameters deemed to perform well across most data sets in the unsupervised evaluation. These hyperparameters also performed among the best on the supervised evaluation. It was concluded that the choice of word embedding and graph clustering algorithm had little impact on the behaviour of the algorithm. Rather, limiting the maximum size of clusters and filtering out similarities between words had a much larger impact on behaviour. word clustering word embedding distributional semantics hierarchical clustering text analytics language technology natural language processing gavagai
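Entry 18 reports that filtering out weak similarities and capping the cluster size mattered more than the particular embedding or graph clustering algorithm. The sketch below illustrates those two controls only. It is not the thesis algorithm: it swaps the hierarchical graph clustering for a simple greedy merge over a thresholded similarity graph, and the function names, the parameters `min_similarity` and `max_size`, and the toy vectors are all assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cluster_words(
    vectors: Dict[str, Sequence[float]], min_similarity: float, max_size: int
) -> List[List[str]]:
    """Greedy size-capped merging: join the most similar pairs first,
    ignoring pairs below min_similarity and merges that exceed max_size."""
    parent = {w: w for w in vectors}
    size = {w: 1 for w in vectors}

    def find(w: str) -> str:
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    # Keep only edges that pass the similarity filter.
    edges = []
    for u, v in combinations(sorted(vectors), 2):
        sim = cosine(vectors[u], vectors[v])
        if sim >= min_similarity:
            edges.append((sim, u, v))
    edges.sort(key=lambda e: (-e[0], e[1], e[2]))  # strongest edges first

    for _, u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v and size[root_u] + size[root_v] <= max_size:
            parent[root_v] = root_u
            size[root_u] += size[root_v]

    clusters: Dict[str, List[str]] = {}
    for w in sorted(vectors):
        clusters.setdefault(find(w), []).append(w)
    return sorted(clusters.values())


# Invented 2-dimensional embeddings for five words.
toy = {
    "good": [1.0, 0.1], "great": [1.0, 0.2], "nice": [1.0, 0.0],
    "slow": [0.0, 1.0], "laggy": [0.1, 1.0],
}
print(cluster_words(toy, min_similarity=0.9, max_size=3))
# [['good', 'great', 'nice'], ['laggy', 'slow']]
print(cluster_words(toy, min_similarity=0.9, max_size=2))
# [['good', 'great'], ['laggy', 'slow'], ['nice']]
```

Lowering `max_size` splits the first group even though all three words are highly similar, which shows how a size cap keeps clusters small enough to scan at a glance.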
19	Parallel Algorithms for Machine Learning Moon, Gordon Euhyun 02 October 2019 (has links) No description available. Computer Science Parallel Machine Learning Parallel Topic Modeling Parallel Latent Dirichlet Allocation Parallel Word2Vec Dimension Reduction Word Embedding
20	Multilabel text classification of public procurements using deep learning intent detection / Textklassificering av offentliga upphandlingar med djupa artificiella neuronnät och avsåtsdetektering Suta, Adin January 2019 (has links) Textual data is one of the most widespread forms of data, and the amount of such data available in the world is increasing at a rapid rate. Text can be understood as a sequence of either characters or words, the latter being by far the most common approach. With the breakthroughs in applied artificial intelligence in recent years, more and more text-related tasks are aided by automatic text processing. The models introduced in this thesis rely on deep-learning sequence processing of text and use regression to predict the subject category that the input text refers to. We investigate and compare the performance of several model architectures along with different hyperparameters. The data set was provided by e-Avrop, a Swedish company that hosts a web platform for publishing and bidding on public procurements. It consists of titles and descriptions of Swedish public procurements posted on the e-Avrop website, along with the category or categories of each text. When the texts are described by several categories (the multi-label case), we suggest a deep learning sequence-processing regression algorithm in which a set of deep learning classifiers is used. Each model uses one of the several labels in the multi-label case, along with the text input, to produce a set of text-label observation pairs. The goal is to investigate whether these classifiers can exhibit different levels of intent, which should in theory be imposed by the different training data sets used by each of the individual deep learning classifiers. Natural language processing text classification deep learning applied mathematics recurrent neural network word embedding Machine learning artificial neural networks Probability Theory and Statistics
