11

Classification of User Stories using an NLP and Deep Learning Based Approach

Kandikari, Bhavesh January 2023 (has links)
No description available.
12

Topic discovery and document similarity via pre-trained word embeddings

Chen, Simin January 2018 (has links)
Throughout history, humans have generated an ever-growing volume of documents about a wide range of topics, and we now rely on computer programs to automatically process these vast collections in various applications. Many applications require a quantitative measure of document similarity. Traditional methods first learn a vector representation for each document using a large corpus, and then compute the distance between two document vectors as the document similarity. In contrast to this corpus-based approach, we propose a straightforward model that directly discovers the topics of a document by clustering its words, without the need for a corpus. We define a vector representation called normalized bag-of-topic-embeddings (nBTE) to encapsulate these discovered topics, and compute the soft cosine similarity between two nBTE vectors as the document similarity. In addition, we propose a logistic word importance function that assigns words different importance weights based on their relative discriminating power. Our model is efficient in terms of average time complexity, and the nBTE representation is interpretable, as it allows for topic discovery within the document. On three labeled public data sets, our model achieved k-nearest-neighbor classification accuracy comparable to five state-of-the-art baseline models. Furthermore, from these three data sets we derived four multi-topic data sets in which each label refers to a set of topics; our model consistently outperforms the state-of-the-art baselines by a large margin on these four challenging sets. Together, these results answer the research question of this thesis: can we construct an interpretable document representation by clustering the words in a document, and effectively and efficiently estimate document similarity?
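The clustering-based construction of nBTE vectors is straightforward to sketch. In the toy example below, random vectors stand in for real word embeddings, and the cluster count, word-importance weighting, and exact soft-cosine formulation are simplifications of what the thesis describes: a document's word vectors are clustered into topic centroids, and two documents are scored by their best mutual topic matches.

```python
# Minimal sketch of the nBTE idea: topics as normalized clusters of a
# document's word vectors, compared by a soft matching of topic centroids.
import numpy as np
from sklearn.cluster import KMeans

def nbte(word_vectors, n_topics=3):
    """Cluster word vectors and return L2-normalized topic centroids."""
    km = KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit(word_vectors)
    centroids = km.cluster_centers_
    return centroids / np.linalg.norm(centroids, axis=1, keepdims=True)

def soft_similarity(topics_a, topics_b):
    """Average each topic's best cosine match in the other document."""
    sims = topics_a @ topics_b.T  # rows/cols are unit vectors -> cosine matrix
    return 0.5 * (sims.max(axis=1).mean() + sims.max(axis=0).mean())

# Random "embeddings" standing in for the word vectors of two documents.
rng = np.random.default_rng(0)
doc1, doc2 = rng.normal(size=(40, 50)), rng.normal(size=(35, 50))
print(soft_similarity(nbte(doc1), nbte(doc2)))
```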
13

Linguistic Knowledge Transfer for Enriching Vector Representations

Kim, Joo-Kyung 12 December 2017 (has links)
No description available.
14

Knowledge-based support for surgical workflow analysis and recognition / Assistance fondée sur les connaissances pour l'analyse et la reconnaissance du flux de travail chirurgical

Dergachyova, Olga 28 November 2017 (has links)
Computer assistance has become an indispensable part of modern surgical procedures. The desire to create a new generation of intelligent operating rooms has led researchers to explore the problems of automatic perception and understanding of surgical situations. Situation awareness includes automatic recognition of the surgical workflow. Great progress has been achieved in the recognition of surgical phases and gestures, yet there is still a gap between these two granularity levels in the hierarchy of the surgical process. Very little research focuses on surgical activities, which carry semantic information vital for situation understanding. Two factors impede progress. First, automatic recognition and prediction of surgical activities is highly challenging due to the short duration of activities, their great number, and a very complex workflow with a multitude of possible execution and sequencing orders. Second, the very limited amount of clinical data does not provide enough information for successful learning and accurate recognition. 
In our opinion, before recognizing surgical activities, a careful analysis of the elements that compose an activity is necessary in order to choose the right signals and sensors to facilitate recognition. We used a deep learning approach to assess the impact of different semantic elements of an activity on its recognition. Through an in-depth study we determined a minimal set of elements sufficient for accurate recognition; information about the operated anatomical structure and the surgical instrument proved the most important. We also addressed the problem of data deficiency by proposing methods for transferring knowledge from other domains or surgeries, based on word embeddings and transfer learning. These methods demonstrated their effectiveness on the task of next-activity prediction, offering a 22% increase in accuracy. In addition, pertinent observations about surgical practice were made during the study. We also addressed the problem of insufficient and improper validation of recognition methods, proposing new validation metrics and approaches that connect methods to their targeted applications and better characterize their capacities. The work described in this thesis aims at clearing obstacles that block the progress of the domain and proposes a new perspective on the problem of surgical workflow recognition.
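As a rough illustration of the embedding-based transfer idea, one can treat each recorded intervention as a "sentence" of activity tokens and learn activity embeddings on a data-rich surgery type for reuse on a data-poor one. The activity tokens below are hypothetical and the thesis's actual models are more elaborate; this only shows the mechanics.

```python
# Sketch: activity embeddings learned from sequences of surgical activities,
# usable to warm-start a recognizer for a surgery type with little data.
from gensim.models import Word2Vec

# Each "sentence" is one recorded intervention as a sequence of activity tokens
# (invented tokens; repeated so the toy model has something to fit).
source_surgeries = [
    ["incise_skin", "coagulate_vessel", "retract_tissue", "suture_skin"],
    ["incise_skin", "retract_tissue", "coagulate_vessel", "suture_skin"],
] * 50

model = Word2Vec(source_surgeries, vector_size=16, window=2,
                 min_count=1, epochs=20)

# The learned vectors place activities that occur in similar contexts close
# together, which is what makes them reusable across related surgeries.
print(model.wv.most_similar("retract_tissue", topn=2))
```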
15

Modelos composicionais: análise e aplicação em previsões no mercado de ações

Souza, Diego Falcão de 10 July 2017 (has links)
FAPEAM - Fundação de Amparo à Pesquisa do Estado do Amazonas / Among the many textual representation techniques in the literature, the distributed representation of words (word embeddings) has recently stood out in many natural language processing tasks through dense d-dimensional vectors that capture syntactic and semantic information about words. Words that are syntactically and semantically similar are therefore expected to be close to each other in the vector space. However, while this representation has proven effective for isolated words, there is no consensus in the literature on the best way to represent more complex structures such as phrases and sentences. The trend of recent years is to use compositional models, which represent these complex structures by composing the representations of their constituent structures with some combination function. It is known, however, that the results obtained by compositional models depend directly on the domain in which they are applied. In this work, we analyze several compositional models applied to the domain of stock price prediction, in order to identify which of them best represents financial news titles for various machine learning methods that predict the polarity of the S&P 500 index.
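The simplest combination functions studied in this line of work are easy to sketch. The example below (random vectors in place of trained embeddings) shows additive, averaging, and element-wise multiplicative composition of word vectors into one fixed-size headline representation for a downstream classifier; which function works best is exactly the kind of domain-dependent question the dissertation investigates.

```python
# Sketch: three classic compositional models over word embeddings.
import numpy as np

def compose_sum(vectors):   # additive composition
    return np.sum(vectors, axis=0)

def compose_mean(vectors):  # averaging composition
    return np.mean(vectors, axis=0)

def compose_mult(vectors):  # element-wise multiplicative composition
    out = np.ones(vectors.shape[1])
    for v in vectors:
        out *= v
    return out

# A headline becomes one fixed-size feature vector for a polarity classifier.
rng = np.random.default_rng(1)
headline = rng.normal(size=(7, 50))  # 7 words, 50-dim embeddings
features = compose_mean(headline)
print(features.shape)  # (50,)
```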
16

Représentations vectorielles et apprentissage automatique pour l’alignement d’entités textuelles et de concepts d’ontologie : application à la biologie / Vector Representations and Machine Learning for Alignment of Text Entities with Ontology Concepts : Application to Biology

Ferré, Arnaud 24 May 2019 (has links)
The considerable increase in the quantity of textual data makes it difficult today to analyze it without the assistance of tools. A text written in natural language is unstructured data, i.e. it cannot be interpreted by a specialized computer program, without which the information in the texts remains largely under-exploited. Among tools for automatic information extraction, we are interested in automatic text interpretation methods for the entity normalization task, which consists in automatically matching entity mentions in text to concepts in a reference terminology. To accomplish this task, we propose a new approach that aligns two types of vector representations of entities, each capturing part of their meaning: word embeddings for textual mentions and "ontological embeddings" for concepts, designed specifically for this work. The alignment between the two is learned by supervised learning. The methods developed have been evaluated on a reference dataset from the biological domain, on which they now represent the state of the art. They are integrated into a natural language processing software suite and the code is freely shared.
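A minimal sketch of the supervised alignment step, under the assumption of synthetic mention and concept embeddings: the thesis's ontological embeddings are built from the concept hierarchy itself, which this toy setup does not model, so only the shape of the approach is shown.

```python
# Sketch: entity normalization as a learned linear map from the mention
# embedding space into the concept embedding space, then nearest-concept lookup.
import numpy as np

rng = np.random.default_rng(2)
mentions = rng.normal(size=(200, 100))        # mention (word-embedding) space
true_map = rng.normal(size=(100, 50))
paired_concepts = mentions @ true_map         # supervised training pairs

# Supervised alignment: least-squares fit of a projection matrix W.
W, *_ = np.linalg.lstsq(mentions, paired_concepts, rcond=None)

# Normalization = nearest concept embedding to the projected mention.
concept_catalogue = rng.normal(size=(30, 50))  # one row per ontology concept
projected = mentions[0] @ W
cosines = (concept_catalogue @ projected /
           (np.linalg.norm(concept_catalogue, axis=1) * np.linalg.norm(projected)))
print("predicted concept id:", np.argmax(cosines))
```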
17

Knowledge Integration and Representation for Biomedical Analysis

Alachram, Halima 04 February 2021 (has links)
No description available.
18

Text ranking based on semantic meaning of sentences / Textrankning baserad på semantisk betydelse hos meningar

Stigeborn, Olivia January 2021 (has links)
Finding a suitable candidate-to-client match is an important part of consultant companies' work. It takes a lot of time and effort for the recruiters at a company to read possibly hundreds of resumes to find a suitable candidate. Natural language processing can perform a ranking task where the goal is to rank the resumes so that the most suitable candidates rank highest. This ensures that recruiters only need to look at the top-ranked resumes and can quickly get candidates out in the field. Earlier work has used methods that count specific keywords in resumes to decide whether a candidate has a given experience. The main goal of this thesis is to use the semantic meaning of the text in the resumes to get a deeper understanding of a candidate's level of experience. It also evaluates whether the model can run on-device and whether the database can contain a mix of English and Swedish resumes. An algorithm was created around the embedding model DistilRoBERTa, which is capable of capturing the semantic meaning of text. The algorithm was evaluated by generating job descriptions from the resumes, creating a summary of each resume. The run time, memory usage, and the rank achieved by the wanted candidate were documented and used to analyze the results. The classification was considered correct when the candidate used to generate the job description was ranked in the top 10. Using this method, an accuracy of 68.3% was achieved. The results show that the algorithm is capable of ranking resumes, handling both Swedish and English resumes with an accuracy of 67.7% for Swedish and 74.7% for English. The run time was fast enough at an average of 578 ms, but the memory usage was too large to run the algorithm on-device. In conclusion, the semantic meaning of resumes can be used to rank them; possible future work would be to combine this method with keyword counting to investigate whether accuracy increases.
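A minimal sketch of this kind of embedding-based ranking, using the sentence-transformers library. The abstract does not name an exact checkpoint, so the model name "all-distilroberta-v1" is an assumption, and the job description and resumes below are invented.

```python
# Sketch: rank resumes by cosine similarity of sentence embeddings to a job
# description; a candidate counts as found if ranked in the top 10.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-distilroberta-v1")  # assumed checkpoint

job_description = "Senior backend developer, 5+ years of Java and cloud services."
resumes = [
    "Java developer with 7 years building AWS microservices.",
    "Graphic designer specializing in branding and print.",
    "Fullstack engineer, Kotlin/Java backend, GCP deployments.",
]

job_vec = model.encode(job_description, convert_to_tensor=True)
resume_vecs = model.encode(resumes, convert_to_tensor=True)
scores = util.cos_sim(job_vec, resume_vecs)[0]

# Highest similarity first.
for idx in scores.argsort(descending=True):
    print(f"{scores[idx]:.3f}  {resumes[idx]}")
```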
19

Addressing Semantic Interoperability and Text Annotations Concerns in Electronic Health Records using Word Embedding, Ontology and Analogy

Naveed, Arjmand January 2021 (has links)
Electronic Health Records (EHRs) create a huge number of databases which are updated dynamically. A major goal of interoperability in healthcare is to facilitate the seamless exchange of healthcare-related data in an environment that supports interoperable and secure transfer of data. Healthcare organisations face difficulties in exchanging patients' healthcare information, laboratory reports, etc. due to a lack of semantic interoperability. Hence, there is a need for semantic web technologies to address healthcare interoperability problems by enabling healthcare entities (doctors, clinics, hospitals, etc.) following various healthcare standards to exchange data together with its semantics, in a form understandable to both machines and humans. A framework with a similarity analyser that deals with semantic interoperability is therefore proposed in this thesis. A further consideration was the use of word embedding and ontology for knowledge discovery. In the medical domain, the main challenge for a medical information extraction system is to find the required information by considering the explicit and implicit clinical context with a high degree of precision and accuracy. For semantic similarity of medical text at different levels (concept, sentence and document level), many methods and techniques have been presented; here, care was taken that the semantic content of a text reflects the correct meaning of its words and sentences. A comparative analysis of approaches in which ontology is followed by word embedding, or vice versa, was conducted to determine which gives higher semantic similarity. Selecting the Kidney Cancer dataset as a use case, I concluded that each approach works better in different circumstances; however, the approach in which ontology is followed by word embedding, enriching the data first, showed better results. Apart from enriching the EHR, extracting relevant information is also challenging. To address this, the concept of analogy was applied, since analogies explain similarities between two different contents and play a significant role in understanding new concepts. Analogies help healthcare professionals communicate with patients effectively and help patients understand their disease and treatment, so this thesis uses analogies to support the extraction of relevant information from medical text. Since accessing EHRs has been challenging, tweet text is used as an alternative, social media having emerged as a relevant data source in recent years. An algorithm is proposed to analyse medical tweets based on analogous words, and the results are used to validate the proposed methods. Two medical-domain experts gave their views on the proposed methods in comparison with a similar method named SemDeep. The quantitative and qualitative results show that the proposed analogy-based method brings diversity and is helpful in analysing a specific disease and in text classification.
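The analogy mechanism underlying the proposed tweet analysis can be illustrated with standard word-vector arithmetic. The toy corpus below is invented and may be too small for the analogy to resolve cleanly; it only demonstrates the query form.

```python
# Sketch: word-vector analogy ("nephrectomy is to kidney as ? is to lung")
# of the kind used to find analogous words in medical tweet text.
from gensim.models import Word2Vec

tweets = [
    ["kidney", "cancer", "treated", "with", "nephrectomy"],
    ["lung", "cancer", "treated", "with", "lobectomy"],
    ["kidney", "tumor", "removed", "by", "nephrectomy"],
    ["lung", "tumor", "removed", "by", "lobectomy"],
] * 100  # repeated toy data so the model has something to fit

model = Word2Vec(tweets, vector_size=32, window=3, min_count=1, epochs=30)

# Vector arithmetic: nephrectomy - kidney + lung ≈ ?
print(model.wv.most_similar(positive=["nephrectomy", "lung"],
                            negative=["kidney"], topn=1))
```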
20

Modeling Customers and Products with Word Embeddings from Receipt Data

Woltmann, Lucas, Thiele, Maik, Lehner, Wolfgang 15 September 2022 (has links)
For many tasks in market research it is important to model customers and products as comparable instances. Usually, integrating customers and products into one model is not trivial. In this paper, we detail an approach for a combined vector space of customers and products based on word embeddings learned from receipt data. To highlight the strengths of this approach we propose four different applications: recommender systems, customer and product segmentation, and purchase prediction. Experimental results on a real-world dataset with 200M order receipts for 2M customers show that our word embedding approach is promising and helps to improve quality in these application scenarios.
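A minimal sketch of the receipt-as-sentence idea with hypothetical product tokens. The abstract does not specify how customer vectors are derived, so the mean-of-purchased-products customer vector below is an assumption made for illustration.

```python
# Sketch: product embeddings from receipts, treating each basket as a
# "sentence"; customers and products then share one comparable vector space.
import numpy as np
from gensim.models import Word2Vec

receipts = [
    ["milk", "bread", "butter"],
    ["milk", "cereal"],
    ["beer", "chips", "salsa"],
    ["bread", "butter", "jam"],
] * 100  # repeated toy baskets so the model has something to fit

model = Word2Vec(receipts, vector_size=16, window=5, min_count=1, epochs=20)

# Assumed customer representation: mean of the product vectors they bought.
customer = np.mean([model.wv["milk"], model.wv["bread"], model.wv["butter"]],
                   axis=0)
# Nearest products to this customer vector can drive recommendations.
print(model.wv.similar_by_vector(customer, topn=3))
```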
