261.
Text ranking based on semantic meaning of sentences. Stigeborn, Olivia, January 2021.
Matching a suitable candidate to a client is an important part of a consulting company's work. It takes a lot of time and effort for the company's recruiters to read what may be hundreds of resumes to find a suitable candidate. Natural language processing can frame this as a ranking task in which the resumes of the most suitable candidates are ranked highest. Recruiters then only need to review the top-ranked resumes and can quickly get candidates out into the field. Previous research has used methods that count specific keywords in resumes to decide whether a candidate has a given experience. The main goal of this thesis is to use the semantic meaning of the text in the resumes to get a deeper understanding of a candidate's level of experience. It also evaluates whether the model can run on-device and whether the database can contain a mix of English and Swedish resumes. An algorithm was created that uses the word embedding model DistilRoBERTa, which is capable of capturing the semantic meaning of text. The algorithm was evaluated by generating job descriptions from the resumes, creating a summary of each resume. The run time, memory usage, and the rank achieved by the intended candidate were documented and used to analyze the results. A classification was considered correct when the candidate used to generate the job description was ranked in the top 10. Using this method, an overall accuracy of 68.3% was achieved. The results show that the algorithm is capable of ranking resumes, both Swedish and English, with an accuracy of 67.7% for Swedish resumes and 74.7% for English ones. The run time, averaging 578 ms, was fast enough, but the memory usage was too large to make on-device use possible. In conclusion, the semantic meaning of resumes can be used to rank them, and possible future work would be to combine this method with keyword counting to investigate whether accuracy increases.
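As an illustration of the ranking approach described above, the following sketch scores resumes against a job description by cosine similarity of sentence embeddings. It assumes the sentence-transformers library and its all-distilroberta-v1 model as a stand-in for the thesis's DistilRoBERTa setup; the texts are invented.

```python
# Minimal sketch: rank resumes by semantic similarity to a job description.
# Assumes the sentence-transformers library; the model name and texts are
# illustrative, not the thesis's exact configuration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-distilroberta-v1")

job_description = "Senior backend developer with Python and cloud experience."
resumes = [
    "Five years building Python microservices on AWS.",
    "Graphic designer specializing in print media.",
]

# Embed the job description and every resume into the same vector space.
job_vec = model.encode(job_description, convert_to_tensor=True)
resume_vecs = model.encode(resumes, convert_to_tensor=True)

# Rank resumes by cosine similarity to the job description.
scores = util.cos_sim(job_vec, resume_vecs)[0]
for score, text in sorted(zip(scores.tolist(), resumes), reverse=True):
    print(f"{score:.3f}  {text}")
```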
262.
A Comparative Study of Knowledge Graph Embedding Models for Use in Fake News Detection. Frimodig, Matilda; Lanhed Sivertsson, Tom, January 2021.
During the past few years, online misinformation, generally referred to as fake news, has been identified as an increasingly dangerous threat. As the spread of misinformation online has grown, fake news detection has become an active line of research. One approach is to use knowledge graphs for automated fake news detection. While large-scale knowledge graphs are openly available, they are rarely up to date and often miss the information relevant to fake news detection. Creating new knowledge graphs from online sources is one way to obtain the missing information, but extracting information from unstructured text is far from straightforward. Using natural language processing techniques, we developed a pre-processing pipeline for extracting information from text for the purpose of creating knowledge graphs. To classify news as fake or not with the use of knowledge graphs, these need to be converted into a machine-understandable format called knowledge graph embeddings. These embeddings also allow new information to be inferred or classified based on the information already in the knowledge graph. Only one knowledge graph embedding model has previously been used for fake news detection, while several new models have recently been developed. We compare the performance of three embedding models, each relying on a different fundamental architecture, in the specific context of fake news detection: the geometric model TransE, the tensor decomposition model ComplEx, and the deep learning model ConvKB. The results of this study show that, of the three models, ConvKB is the best performing. However, aspects other than performance need to be considered, so these results do not necessarily mean that a deep learning approach is the most suitable for real-world fake news detection.
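As background on the geometric model compared above, the following minimal sketch shows the TransE scoring idea: a true triple (head, relation, tail) should satisfy h + r ≈ t, so plausibility is measured by the distance ||h + r − t||. The embeddings below are toy vectors constructed for illustration, not trained ones.

```python
# Toy illustration of TransE scoring; vectors are random, not trained.
import numpy as np

def transe_score(h, r, t):
    # TransE models a true triple (head, relation, tail) as h + r ≈ t,
    # so a lower L2 distance means a more plausible triple.
    return np.linalg.norm(h + r - t)

rng = np.random.default_rng(0)
dim = 50
obama = rng.normal(size=dim)
usa = rng.normal(size=dim)
# Construct the relation so the true triple scores well (illustrative only).
president_of = usa - obama + rng.normal(scale=0.01, size=dim)

print(transe_score(obama, president_of, usa))   # small distance: plausible
print(transe_score(usa, president_of, obama))   # large distance: implausible
```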
263.
Generative modelling and inverse problem solving for networks in hyperbolic space. Muscoloni, Alessandro, 12 August 2019.
The investigation of the latent geometrical space behind complex network topologies is an active topic in current network science, and the hyperbolic space is one of the most studied because it appears to be associated with the structural organization of many real complex systems. The popularity-similarity-optimization (PSO) generative model is able to grow random geometric graphs in the hyperbolic space with realistic properties such as clustering, small-worldness, scale-freeness and rich-clubness. However, it fails to reproduce an important feature of real complex systems: community organization. Here, we introduce the nonuniform PSO (nPSO) generative model, a generalization of the PSO model with a tailored community structure, and we provide an efficient algorithmic implementation with O(EN) time complexity, where N is the number of nodes and E the number of edges. Meanwhile, in recent years the inverse problem has also gained increasing attention: given a network topology, how to provide an accurate mapping into its latent geometrical space. Unlike previous attempts based on a computationally expensive maximum likelihood optimization (whose time complexity is between O(N^3) and O(N^4)), here we show that a class of methods based on nonlinear dimensionality reduction can solve the problem with higher precision while reducing the time complexity to O(N^2).
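The following toy sketch illustrates the flavor of PSO-style growth in the native hyperbolic disk: each new node receives a radial coordinate encoding popularity and a random angular coordinate encoding similarity, then links to its hyperbolically nearest neighbors. It omits parts of the real model (popularity fading, probabilistic connection, the nPSO community structure), so it is a simplified illustration, not the author's algorithm.

```python
# Simplified PSO-style growth in the native hyperbolic disk (curvature -1).
import numpy as np

def hyperbolic_distance(r1, th1, r2, th2):
    # Hyperbolic law of cosines; angular difference wrapped to [0, pi].
    dtheta = np.pi - abs(np.pi - abs(th1 - th2))
    arg = np.cosh(r1) * np.cosh(r2) - np.sinh(r1) * np.sinh(r2) * np.cos(dtheta)
    return np.arccosh(np.maximum(arg, 1.0))

def pso_like_graph(n, m, rng=np.random.default_rng(0)):
    # Node i arrives with radius 2*ln(i) (popularity) and a random angle
    # (similarity), then links to the m hyperbolically closest nodes.
    radii, angles, edges = [], [], []
    for i in range(1, n + 1):
        r, th = 2 * np.log(i), rng.uniform(0, 2 * np.pi)
        if len(radii) > m:
            d = [hyperbolic_distance(r, th, rj, tj)
                 for rj, tj in zip(radii, angles)]
            edges.extend((i - 1, int(j)) for j in np.argsort(d)[:m])
        else:
            edges.extend((i - 1, j) for j in range(len(radii)))
        radii.append(r); angles.append(th)
    return edges

print(len(pso_like_graph(100, 2)))  # number of edges grown
```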
264.
Capturing Knowledge of Emerging Entities from the Extended Search Snippets. Ngwobia, Sunday C., January 2019.
No description available.
265.
Embedded Corporate Sustainability as a Driver for Competitiveness. Nordin, Neda, January 2015.
Sustainability is increasingly requested by society due to rising global issues. However, the majority of companies, particularly large and hierarchical ones, face huge challenges in properly integrating sustainability into their business and, most importantly, in understanding the opportunities sustainability offers for both their competitiveness and shared value creation. The purpose of the thesis is to holistically define and explain the major aspects that are critical for embedding win-win corporate sustainability (CS) into large established companies. First, a framework for CS embedding has been developed, supported by a simple three-stage CS model for assessing the stages and processes of CS integration in a company. The framework focuses in particular on three major CS aspects: strategic and operational integration, innovation, and organisational culture. Second, the model is applied in a case study of the large power company Vattenfall AB to assess its overall CS implementation, the challenges it faces, and the stage of its CS practices. The analysis resulted in structured findings and a list of major strategic recommendations to advise the company on advancing CS. The outcomes of the study can be applied as learning material in other large conservative companies of similar complexity that struggle with sustainability performance. The research has contributed to filling the knowledge gap in understanding how CS embedding works, its major aspects, and the challenges and opportunities it provides. The framework developed for embedding CS, used in conjunction with the CS three-stage model, could support further empirical research or be applied in practice by companies themselves.
266.
A graph representation of event intervals for efficient clustering and classification. Lee, Zed Heeje, January 2020.
Sequences of event intervals occur in several application domains, while their inherent complexity hinders scalable solutions to tasks such as clustering and classification. In this thesis, we propose a novel spectral embedding representation of event interval sequences that relies on bipartite graphs. More concretely, each event interval sequence is represented by a bipartite graph by following three main steps: (1) creating a hash table that can quickly convert a collection of event interval sequences into a bipartite graph representation, (2) creating and regularizing a bi-adjacency matrix corresponding to the bipartite graph, and (3) defining a spectral embedding mapping on the bi-adjacency matrix. In addition, we show that substantial improvements in classification performance can be achieved by pruning parameters that capture the nature of the relations formed by the event intervals. We demonstrate through extensive experimental evaluation on five real-world datasets that our approach can obtain runtime speedups of up to two orders of magnitude compared to other state-of-the-art methods, with similar or better clustering and classification performance.
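A small sketch of steps (2) and (3) above, assuming a toy bi-adjacency matrix: the matrix is degree-regularized and a truncated SVD provides the spectral embedding of each sequence. The regularizer shown is a common spectral-clustering choice and may differ from the one used in the thesis.

```python
# Sketch of regularized spectral embedding of a bi-adjacency matrix.
import numpy as np

# Toy bi-adjacency matrix: rows are event-interval sequences, columns are
# relation features (e.g., "A overlaps B"); entries count occurrences.
B = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 3., 1.]])

# Regularized degree normalization (a common choice; the thesis's exact
# regularization may differ).
tau = B.sum() / B.size
d_row = B.sum(axis=1) + tau
d_col = B.sum(axis=0) + tau
B_norm = B / np.sqrt(d_row)[:, None] / np.sqrt(d_col)[None, :]

# Spectral embedding: the top-k left singular vectors embed each sequence.
k = 2
U, s, Vt = np.linalg.svd(B_norm)
embedding = U[:, :k] * s[:k]
print(embedding)  # one k-dimensional vector per sequence
```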
267.
Addressing Semantic Interoperability and Text Annotations Concerns in Electronic Health Records using Word Embedding, Ontology and Analogy. Naveed, Arjmand, January 2021.
Electronic Health Records (EHRs) create a huge number of databases that are updated dynamically. A major goal of interoperability in healthcare is to facilitate the seamless exchange of healthcare-related data in an environment that supports interoperability and the secure transfer of data. Healthcare organisations face difficulties in exchanging patients' health information, laboratory reports, and similar data due to a lack of semantic interoperability. Hence, semantic web technologies are needed to address healthcare interoperability problems by enabling the various standards of different healthcare entities (doctors, clinics, hospitals, etc.) to exchange data and its semantics in a form understood by both machines and humans. Thus, a framework with a similarity analyser that deals with semantic interoperability is proposed in this thesis.

While dealing with semantic interoperability, another consideration was the use of word embedding and ontology for knowledge discovery. In the medical domain, the main challenge for a medical information extraction system is to find the required information by considering explicit and implicit clinical context with a high degree of precision and accuracy. For semantic similarity of medical text at different levels (concept, sentence, and document level), various methods and techniques have been presented, but I made sure that the semantic content of a text is represented by the correct meaning of its words and sentences. A comparative analysis of approaches in which ontology is followed by word embedding, or vice versa, was carried out to determine which approach yields higher semantic similarity. Selecting a Kidney Cancer dataset as a use case, I concluded that each approach works better in different circumstances; however, the approach in which ontology is followed by word embedding, enriching the data first, showed better results.

Apart from enriching the EHR, extracting relevant information is also challenging. To address this, the concept of analogy has been applied to explain similarities between two different contents, as analogies play a significant role in understanding new concepts. Analogies help healthcare professionals communicate with patients effectively and help patients understand their disease and treatment. I therefore utilised analogies in this thesis to support the extraction of relevant information from medical text. Since accessing EHRs has been challenging, tweet text is used as an alternative, as social media has emerged as a relevant data source in recent years. An algorithm is proposed to analyse medical tweets based on analogous words, and the results have been used to validate the proposed methods. Two experts from the medical domain gave their views on the proposed methods in comparison with a similar method named SemDeep. The quantitative and qualitative results show that the proposed analogy-based methods bring diversity and are helpful in analysing a specific disease and in text classification.
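To make the analogy operation concrete, the following sketch applies the standard vector-offset analogy and a plain similarity query with gensim. The pretrained glove-wiki-gigaword-50 vectors are a stand-in for the clinical embeddings used in the thesis, and the example terms are illustrative only.

```python
# Vector-offset analogy and term similarity with pretrained embeddings.
# glove-wiki-gigaword-50 is a stand-in for domain-specific clinical vectors.
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")  # downloads ~66 MB on first use

# Analogy: insulin is to diabetes as chemotherapy is to ... ?
result = wv.most_similar(positive=["chemotherapy", "diabetes"],
                         negative=["insulin"], topn=3)
print(result)

# Plain semantic similarity between two medical terms.
print(wv.similarity("kidney", "renal"))
```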
268.
Drug Repositioning through the Development of Diverse Computational Methods using Machine Learning, Deep Learning, and Graph Mining. Thafar, Maha A., 30 June 2022.
The rapidly increasing number of existing drugs with genomic, biomedical, and pharmacological data makes computational analyses possible, reducing the search space for drugs and facilitating drug repositioning (DR). Thus, artificial intelligence, machine learning, and data mining have been used to identify biological interactions such as drug-target interactions (DTIs), drug-disease associations, and drug response. Predicting these biological interactions is seen as a critical phase in making drug development more sustainable. Furthermore, late-stage drug development failures are usually a consequence of ineffective targets, so proper target identification is needed. In this dissertation, we address three crucial problems in the DR pipeline and present several novel computational methods developed for DR.
First, we developed three network-based DTI prediction methods using machine learning, graph embedding, and graph mining. These methods significantly improved prediction performance, and the best-performing method reduces the error rate by more than 33% across all datasets compared to the best state-of-the-art method. Second, because it is more insightful to predict continuous values that indicate how tightly a drug binds to a specific target, we conducted a comparison study of current regression-based methods that predict drug-target binding affinities (DTBA). We discussed how to develop more robust DTBA methods and subsequently developed Affinity2Vec, the first regression-based method that formulates the entire task as a graph-based problem, combining computational techniques from feature representation learning, graph mining, and machine learning, with no 3D structural data of proteins required. Affinity2Vec outperforms the state-of-the-art methods. Finally, since drug development failure is associated with sub-optimal target identification, we developed the first deep-learning-based computational method (OncoRTT) to identify cancer-specific therapeutic targets for the ten most common cancers worldwide. Implementing our approach required creating a suitable dataset of oncology-related DTIs, so we created the OncologyTT datasets to build and evaluate OncoRTT. Our methods demonstrated their efficiency by achieving high prediction performance and identifying therapeutic targets for several cancer types.
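The following simplified sketch shows the general shape of regression-based DTBA prediction in the spirit of Affinity2Vec: each drug-target pair is represented by the concatenation of a drug embedding and a target embedding, and a regressor predicts the affinity. Random vectors and a synthetic affinity label stand in for the graph-derived features of the real method.

```python
# Simplified regression-based binding-affinity prediction: random vectors
# stand in for graph-derived drug/target embeddings, and the affinity
# label is synthetic, for illustration only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_drugs, n_targets, dim = 30, 20, 16
drug_emb = rng.normal(size=(n_drugs, dim))
target_emb = rng.normal(size=(n_targets, dim))

# Features: concatenated pair embeddings. Label: a smooth pair function.
pairs = [(d, t) for d in range(n_drugs) for t in range(n_targets)]
X = np.array([np.concatenate([drug_emb[d], target_emb[t]]) for d, t in pairs])
y = np.array([drug_emb[d] @ target_emb[t] for d, t in pairs])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
print("R^2 on held-out pairs:", round(model.score(X_te, y_te), 3))
```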
Overall, in this dissertation, we developed several computational methods to solve biomedical-domain problems, specifically drug repositioning, and demonstrated their effectiveness and capabilities.
269.
Modeling Customers and Products with Word Embeddings from Receipt Data. Woltmann, Lucas; Thiele, Maik; Lehner, Wolfgang, 15 September 2022.
For many tasks in market research it is important to model customers and products as comparable instances, yet integrating customers and products into one model is usually not trivial. In this paper, we detail an approach that builds a combined vector space of customers and products from word embeddings learned on receipt data. To highlight the strengths of this approach we propose four different applications: recommender systems, customer segmentation, product segmentation, and purchase prediction. Experimental results on a real-world dataset with 200M order receipts for 2M customers show that our word embedding approach is promising and helps to improve quality in these application scenarios.
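A minimal sketch of the receipt-embedding idea, assuming gensim's Word2Vec: each receipt is treated as a "sentence" of product identifiers, and a customer is placed in the same space as the mean of their purchased products' vectors. The product names and hyperparameters here are invented for illustration.

```python
# Receipts as "sentences" of products; customers as mean product vectors.
import numpy as np
from gensim.models import Word2Vec

receipts = [
    ["milk", "bread", "butter"],
    ["milk", "cereal"],
    ["beer", "chips", "salsa"],
    ["bread", "butter", "jam"],
]
model = Word2Vec(receipts, vector_size=32, window=5, min_count=1, seed=0)

def customer_vector(purchased_products):
    # Customers live in the same space as products: the mean product vector.
    return np.mean([model.wv[p] for p in purchased_products], axis=0)

cust = customer_vector(["milk", "bread"])
# Products most similar to this customer vector -> a simple recommender.
print(model.wv.similar_by_vector(cust, topn=3))
```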
270.
Physically-Based Realizable Modeling and Network Synthesis of Subscriber Loops Utilized in DSL Technology. Yoho, Jason Jon, 07 December 2001.
Performance analysis of Digital Subscriber Line (DSL) technologies, which are implemented on existing telephone subscriber loops, is of vital importance to DSL service providers. This type of analysis requires accurate prediction of the local loop structure and precise identification of the cable parameters. These cables, the main components of the loop, are typically multi-conductor twisted-pair cables of the type currently used on existing telephone subscriber loops. This system identification problem was investigated through the application of single-port measurements, with preference given to measurements taken from the service provider's end of the loop under investigation. Once the cabling system has been identified, the performance analysis of the loop is obtained through simulation.
Accurate modeling is an important aspect of any system identification solution; therefore, the modeling of twisted-pair cables was thoroughly investigated in this research. Early modeling of twisted-pair cabling systems for use with DSL technology was not rigorously pursued, owing to the difficulty of obtaining the wideband physical data necessary for the task as well as the limitations of simulators in accurately modeling the skin effect of the conductors. Models are developed in this research that produce a wideband representation of twisted-pair cables using data measured over a high-frequency spectrum.
The twisted-pair cable models were then applied to the system identification problem through a de-embedding approach. The identification process accurately characterizes the sections of the subscriber loop closest to the measurement node; these identified sections are then modeled and de-embedded from the system measurement in a layer-removing, or "peeling", process. After each identified section is de-embedded from the system measurement, the process is repeated until the entire system has been identified.
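The peeling step can be illustrated with ideal ABCD (transmission) matrices: a cascade of two-port sections multiplies their matrices, so an identified near-end section is removed by left-multiplying the measured cascade with its inverse. This is a sketch of the principle at a single frequency, with invented line parameters, not the dissertation's full identification procedure.

```python
# One "peeling" (de-embedding) step with ideal ABCD matrices at a single
# frequency; line parameters are hypothetical.
import numpy as np

def abcd_line(gamma_l, z0):
    # ABCD matrix of a uniform transmission-line section, given the product
    # of propagation constant and length (gamma_l) and characteristic
    # impedance z0.
    return np.array([[np.cosh(gamma_l),      z0 * np.sinh(gamma_l)],
                     [np.sinh(gamma_l) / z0, np.cosh(gamma_l)]])

# Identified near-end section and the unknown remainder of the loop.
first = abcd_line(0.1 + 0.5j, 100.0)
rest = abcd_line(0.3 + 1.2j, 120.0)
measured = first @ rest  # the cascade seen from the provider's end

# De-embed: strip the identified section from the measurement.
recovered = np.linalg.inv(first) @ measured
print(np.allclose(recovered, rest))  # True: the remainder is exposed
```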
Upon completion of the system identification process, the resulting system model was simulated between the central office (CO) and the identified customer nodes for performance analysis. This analysis allows providers to identify the points where DSL technology is feasible and, where it is, the data transfer rates to those nodes that can be expected.