Global ETD Search

1	Semantic Text Matching Using Convolutional Neural Networks
Wang, Run Fen. January 2018
Semantic text matching is a fundamental task for many applications in Natural Language Processing (NLP). Traditional methods that use term frequency-inverse document frequency (TF-IDF) to match exact words in documents have one strong drawback: TF-IDF cannot capture semantic relations between closely related words, which leads to disappointing matching results. Neural networks have recently been used for various applications in NLP and have achieved state-of-the-art performance on many tasks. Recurrent Neural Networks (RNNs) have been tested on text classification and text matching but did not achieve remarkable results, because RNNs work more effectively on short texts than on long documents. In this paper, Convolutional Neural Networks (CNNs) are applied to match texts semantically. The model takes word embedding representations of two texts as inputs to the CNN, extracts the semantic features between the two texts, and outputs a score indicating how certain the model is that they match. The results show that, after some tuning of the parameters, the CNN model could produce accuracy, precision, recall and F1-scores all over 80%. This is a great improvement over the previous TF-IDF results, and further improvements could be made by using dynamic word vectors, pre-processing the data better, generating larger and more feature-rich data sets, and tuning the parameters further.
Keywords: Text matching, CNN, TF-IDF, Word embedding, Word2vec, NLP
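The exact-word limitation of TF-IDF that this abstract describes can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy sentences, the function names `tfidf_vectors` and `cosine_similarity`, and the smoothed IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for whitespace-tokenized documents.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1; this particular
    weighting is an assumption, not taken from the thesis.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy documents: the first two are near-synonymous
# but share no words; the first and third share the word "car".
vecs = tfidf_vectors(["fast car", "quick automobile", "red car"])
print(cosine_similarity(vecs[0], vecs[1]))  # 0.0: no shared terms
print(cosine_similarity(vecs[0], vecs[2]))  # positive: "car" is shared
```

Because the score depends only on shared tokens, "fast car" and "quick automobile" score zero even though they mean nearly the same thing. This is the gap that embedding-based models such as the CNN in this thesis aim to close.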
2	Automatiserad matchning av relaterad data från olika datakällor / Automated matching of related data from different data sources
Harch, Gais; Ullström, Robin. January 2014
Social media today contains a great deal of information that can add considerable value to applications and products by improving the user experience. In some cases, such information cannot be obtained without first matching data from one or more data sources through data fusion. Eniro Initiatives AB wants to explore the possibility of automated data fusion by linking companies from its own API to the corresponding companies on social media. The problem is that the only fully reliable key for matching all Swedish companies is the organisation number (organisationsnummer), which is not available in APIs from foreign companies. The aim was to explore ways of automatically matching related data from different data sources. In this thesis, a prototype was developed that matches companies from Eniro's API with company pages from Facebook's API. Tests of this prototype revealed some shortcomings, however: in some cases redundant information led the prototype to approve unofficial pages connected to the relevant company, which was not desirable.
Keywords: data fusion, text recognition, text matching, company data
Subject: Computer Engineering
3	An Extension of The Berry-Ravindran Algorithm for protein and DNA data
Riekkola, Jesper. January 2022
String matching algorithms search through different types of text for a given pattern. Many of these algorithms achieve their impressive performance by analysing the pattern and saving that information, which is then used continuously during the search phase to determine which parts of the text can be skipped. One such algorithm is Berry-Ravindran, which examines the two characters immediately past the current attempt and checks whether those characters occur in the pattern. This thesis compares the Berry-Ravindran algorithm with new versions of itself that check three and four characters instead of two, and with the Boyer-Moore algorithm. Checking more characters increases the amount of text that can be skipped, reducing the number of attempts needed, but exponentially increases the pre-processing time. Fewer attempts therefore do not necessarily mean a faster run-time. The variable with the biggest impact on pre-processing time is the size of the alphabet used by the text. This is investigated by testing the algorithms with patterns ranging from 4 to 100 characters long on two data sets: protein data, with an alphabet size of 27, and DNA data, with an alphabet size of 4.
Keywords: Berry-Ravindran, text matching, string matching, exact-matching, string algorithms, pattern matching
Subject: Engineering and Technology
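The two-character lookahead described in this abstract can be sketched as follows. This is an illustrative version of the standard two-character Berry-Ravindran search, not the thesis's code or its three- and four-character extensions. The function name `berry_ravindran_search` is hypothetical, and storing shifts in a dictionary rather than a full alphabet-by-alphabet table is an assumption made for brevity.

```python
def berry_ravindran_search(pattern: str, text: str) -> list[int]:
    """Return the start indices of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Pre-processing: for each adjacent pair pattern[i], pattern[i + 1],
    # record the shift m - i that aligns that pair with the two text
    # characters just past the window. Later i overwrites earlier i,
    # so the smallest (safe) shift is kept.
    pair_shift = {}
    for i in range(m - 1):
        pair_shift[(pattern[i], pattern[i + 1])] = m - i
    first, last = pattern[0], pattern[-1]

    def shift(a, b):
        # a and b are the two characters after the current window
        # (None when past the end of the text).
        if a == last:
            return 1                   # a may align with the last pattern char
        if (a, b) in pair_shift:
            return pair_shift[(a, b)]  # align the pair "ab" inside the pattern
        if b == first:
            return m + 1               # only b can start a new match
        return m + 2                   # neither character helps: skip both

    matches = []
    j = 0
    while j <= n - m:
        if text[j:j + m] == pattern:
            matches.append(j)
        a = text[j + m] if j + m < n else None
        b = text[j + m + 1] if j + m + 1 < n else None
        j += shift(a, b)
    return matches


# Example on a short DNA-like string (alphabet size 4).
print(berry_ravindran_search("GCAGAGAG", "GCATCGCAGAGAGTATACAGTACG"))  # [5]
```

A dense shift table for k lookahead characters over an alphabet of size σ has σ^k entries, which is one way to see why pre-processing cost grows so quickly with both k and the alphabet: for k = 2, 3, 4 that is 16, 64 and 256 entries for DNA (σ = 4), against 729, 19,683 and 531,441 for protein data (σ = 27).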
