11

Weighting Edit Distance to Improve Spelling Correction in Music Entity Search

Samuelsson, Axel January 2017 (has links)
This master's thesis project investigated whether the established Damerau-Levenshtein edit distance between two strings could be made more useful for detecting and correcting misspellings in a search query. The idea was to exploit the fact that many users type their queries on the QWERTY keyboard layout, and to weight the edit distance so that misspellings caused by confusing nearby keys become cheaper to correct. Two weighting approaches were tested: one spread the substitution costs linearly from 2/9 to 2 depending on keyboard distance, and the other preferred neighboring keys over non-neighbors (at either half the cost or no cost at all). Both were tested against an unweighted baseline, as well as against inverted versions of themselves (nearer keys more expensive to substitute), on a dataset of 1,162,145 searches. No significant improvement in the retrieval of search results was observed compared to the baseline. However, each of the weightings performed better than its corresponding inversion at the p < 0.05 significance level. This means that while the weighted edit distance did not outperform the baseline, the data still clearly points toward a correlation between the physical position of keys on the keyboard and the spelling mistakes that are made.
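The neighbor-preference weighting described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the QWERTY coordinate table, the grid-based neighbor test, and the half-cost substitution rule are assumptions drawn from the abstract.

```python
# Minimal sketch of a keyboard-weighted Damerau-Levenshtein distance.
# The layout table and the half-cost rule for neighboring keys are
# assumptions illustrating the approach described in the abstract.

QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {c: (r, col) for r, row in enumerate(QWERTY_ROWS)
           for col, c in enumerate(row)}

def sub_cost(a, b):
    """Substitution cost: half price when the keys are QWERTY neighbors."""
    if a == b:
        return 0.0
    pa, pb = KEY_POS.get(a), KEY_POS.get(b)
    if pa and pb and abs(pa[0] - pb[0]) <= 1 and abs(pa[1] - pb[1]) <= 1:
        return 0.5  # neighboring keys: likely a slip of the finger
    return 1.0

def weighted_dl(s, t):
    """Damerau-Levenshtein distance with keyboard-weighted substitutions."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = float(i)
    for j in range(n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1.0,                       # deletion
                          d[i][j - 1] + 1.0,                       # insertion
                          d[i - 1][j - 1] + sub_cost(s[i-1], t[j-1]))
            if i > 1 and j > 1 and s[i-1] == t[j-2] and s[i-2] == t[j-1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1.0)      # transposition
    return d[m][n]

print(weighted_dl("beatles", "bwatles"))  # e/w are QWERTY neighbors -> 0.5
print(weighted_dl("beatles", "bmatles"))  # e/m are not neighbors    -> 1.0
```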
12

A Performance Analysis Framework for Coreference Resolution Algorithms

Patel, Chandankumar Johakhim 29 August 2016 (has links)
No description available.
13

Efficient number similarity check

Simonsson, David January 2024 (has links)
Efficiency in algorithms is important, especially in terms of execution time, since it directly impacts user experience. For example, when a customer visits a website, even a one-second delay can significantly reduce their patience and increase the likelihood that they abandon the site. The same principle applies to search algorithms. This project implements a time-efficient tree-based search algorithm that focuses on finding similarities between the search input and stored data. The objective is an execution time as close to O(1) as possible, regardless of the data size. The implemented algorithm is compared with a linear search algorithm, whose execution time grows with the data size. By measuring the execution times of both search methods, the project aims to demonstrate the superiority of the tree-based search algorithm in terms of time efficiency.
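The abstract does not specify the tree structure, but a digit trie is one plausible reading of a tree-based search whose lookup cost depends on the length of the query rather than the size of the dataset. A minimal sketch, with the stored numbers and the prefix-matching heuristic as assumptions:

```python
# Sketch of a digit trie for number similarity lookup. Lookup cost is
# proportional to the number's length, not the dataset size, which
# approximates the O(1)-like goal the abstract describes.

class TrieNode:
    def __init__(self):
        self.children = {}   # digit -> TrieNode
        self.terminal = False

class DigitTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, number: str):
        node = self.root
        for d in number:
            node = node.children.setdefault(d, TrieNode())
        node.terminal = True

    def longest_prefix_match(self, query: str) -> str:
        """Return the longest stored prefix shared with the query:
        a simple similarity signal between input and stored data."""
        node, matched = self.root, []
        for d in query:
            if d not in node.children:
                break
            node = node.children[d]
            matched.append(d)
        return "".join(matched)

trie = DigitTrie()
for num in ["0701234567", "0701234999", "0760000000"]:
    trie.insert(num)
print(trie.longest_prefix_match("0701234000"))  # -> "0701234"
```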
14

Concept Based Knowledge Discovery from Biomedical Literature.

Radovanovic, Aleksandar. January 2009 (has links)
This thesis describes and introduces novel methods for knowledge discovery and presents a software system that is able to extract information from biomedical literature, review interesting connections between various biomedical concepts and, in so doing, generate new hypotheses. The experimental results obtained by using the methods described in this thesis are compared to currently published results obtained by other methods, and a number of case studies are described. This thesis shows how the technology presented can be integrated with the researchers' own knowledge, experimentation and observations for optimal progression of scientific research.
16

Cell Formation: A Real Life Application

Uyanik, Basar 01 September 2005 (has links) (PDF)
In this study, the plant layout problem of a worldwide Printed Circuit Board (PCB) manufacturer is analyzed. Machines are grouped into cells using three grouping methodologies: the Tabular Algorithm, K-means clustering, and hierarchical grouping with Levenshtein distances. The plant layouts produced by the different techniques are evaluated using technical and economic indicators.
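Hierarchical grouping with Levenshtein distances, one of the three methodologies named above, might look like the following sketch. The encoding of each part's machine routing as a string, the sample routings, and the merge threshold are illustrative assumptions, not data from the study.

```python
# Sketch of hierarchical cell formation using Levenshtein distances.
# Each part's routing is a string, one letter per machine visited;
# routings within `threshold` edits of each other end up in one cell.

def levenshtein(s, t):
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def single_linkage(routings, threshold):
    """Agglomeratively merge clusters whose closest members are
    within `threshold` edits of each other (single linkage)."""
    clusters = [[r] for r in routings]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(levenshtein(a, b)
                        for a in clusters[i] for b in clusters[j])
                if d <= threshold:
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters

# Parts routed over machines A-F; similar routings form one cell.
routings = ["ABCD", "ABCE", "ABDE", "FEDC", "FEDB"]
print(single_linkage(routings, threshold=1))
# -> [['ABCD', 'ABCE', 'ABDE'], ['FEDC', 'FEDB']]
```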
17

Concept Based Knowledge Discovery from Biomedical Literature

Radovanovic, Aleksandar. January 2009 (has links)
Philosophiae Doctor - PhD / This thesis describes and introduces novel methods for knowledge discovery and presents a software system that is able to extract information from biomedical literature, review interesting connections between various biomedical concepts and, in so doing, generate new hypotheses. The experimental results obtained by using the methods described in this thesis are compared to currently published results obtained by other methods, and a number of case studies are described. This thesis shows how the technology presented can be integrated with the researchers' own knowledge, experimentation and observations for optimal progression of scientific research. / South Africa
18

Concept Based Knowledge Discovery From Biomedical Literature

Radovanovic, Aleksandar January 2009 (has links)
Philosophiae Doctor - PhD / The advancement of biomedical research and the continuous growth of scientific literature available in electronic form call for innovative methods and tools for information management, knowledge discovery, and data integration. Many biomedical fields such as genomics, proteomics, metabolomics, genetics, and emerging disciplines like systems biology and conceptual biology require synergy between experimental, computational, data mining and text mining technologies. The large amount of biomedical information available in various repositories, such as the US National Library of Medicine Bibliographic Database, emerges as a potential source of textual data for knowledge discovery. Text mining, the application of natural language processing and machine learning technologies to problems of knowledge discovery, is one of the most challenging fields in bioinformatics. This thesis describes and introduces novel methods for knowledge discovery and presents a software system that is able to extract information from biomedical literature, review interesting connections between various biomedical concepts and, in so doing, generate new hypotheses. The experimental results obtained by using the methods described in this thesis are compared to currently published results obtained by other methods, and a number of case studies are described. This thesis shows how the technology presented can be integrated with the researchers' own knowledge, experimentation and observations for optimal progression of scientific research.
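One classic way to "review interesting connections and generate new hypotheses" from literature is co-occurrence-based discovery (Swanson's A-B-C pattern). The abstract does not confirm that this is the thesis's method, so the sketch below is only an illustrative reading, with toy documents as assumptions.

```python
# Illustrative sketch of literature-based hypothesis generation via
# concept co-occurrence. The toy documents (echoing Swanson's fish oil /
# Raynaud's example) are assumptions, not data from the thesis.

from collections import defaultdict
from itertools import combinations

# Each document is reduced to the set of biomedical concepts it mentions.
docs = [
    {"fish oil", "blood viscosity"},
    {"blood viscosity", "raynaud's disease"},
    {"fish oil", "platelet aggregation"},
    {"platelet aggregation", "raynaud's disease"},
]

cooccur = defaultdict(set)
for concepts in docs:
    for a, b in combinations(sorted(concepts), 2):
        cooccur[a].add(b)
        cooccur[b].add(a)

def hypotheses(a, c):
    """Concepts B linking A and C when A and C never co-occur directly:
    each such bridge suggests a hypothesis worth reviewing."""
    if c in cooccur[a]:
        return set()  # already directly connected in the literature
    return cooccur[a] & cooccur[c]

print(hypotheses("fish oil", "raynaud's disease"))
# -> {'blood viscosity', 'platelet aggregation'}
```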
19

A Rule-Based Normalization System for Greek Noisy User-Generated Text

Toska, Marsida January 2020 (has links)
The ever-growing usage of social media platforms generates vast amounts of textual data every day, which could potentially serve as a great source of information. Mining user-generated data for commercial, academic, or other purposes has therefore already attracted the interest of the research community. However, the informal writing that often characterizes online user-generated texts poses a challenge for automatic text processing with Natural Language Processing (NLP) tools. To mitigate the effect of noise in these texts, lexical normalization has been proposed as a preprocessing method; in short, it is the task of converting non-standard word forms into a canonical form. The present work aims to contribute to this field by developing a rule-based normalization system for Greek tweets. We perform an analysis of the categories of out-of-vocabulary (OOV) word forms identified in the dataset and define hand-crafted rules, which we combine with edit distance (the Levenshtein distance approach) to tackle noise in the cases under scope. To evaluate the performance of the system, we perform both an intrinsic and an extrinsic evaluation, the latter exploring the effect of normalization on part-of-speech tagging. The results of the intrinsic evaluation suggest that our system has an accuracy of approximately 95%, compared to approximately 81% for the baseline. In the extrinsic evaluation, a boost of approximately 8% in tagging performance is observed when the text has been preprocessed through lexical normalization.
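A minimal sketch of the rule-plus-edit-distance combination the abstract describes follows. The rules, the tiny lexicon, and the sample tokens are assumptions (and in Latin script rather than Greek, for readability).

```python
# Sketch of rule-based lexical normalization backed by a Levenshtein
# fallback. Rules, lexicon, and tokens are illustrative assumptions.

import re

LEXICON = {"hello", "tomorrow", "see", "you"}

def levenshtein(s, t):
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def apply_rules(token):
    """Hand-crafted rules for common noise: trailing punctuation spam
    and expressive character lengthening ("tomorrowww")."""
    token = token.strip("!.?")
    token = re.sub(r"(.)\1{2,}", r"\1", token)  # 3+ repeats -> 1
    return token

def normalize(token, max_dist=2):
    token = token.lower()
    if token in LEXICON:
        return token            # in-vocabulary: leave untouched
    candidate = apply_rules(token)
    if candidate in LEXICON:
        return candidate        # rules alone resolved it
    # Edit-distance fallback: nearest lexicon entry within max_dist.
    best = min(LEXICON, key=lambda w: levenshtein(candidate, w))
    return best if levenshtein(candidate, best) <= max_dist else token

print([normalize(t) for t in ["Helllooo!!!", "tomorrowww", "seeee", "u"]])
# -> ['hello', 'tomorrow', 'see', 'you']  ('u' resolved by the fallback)
```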
20

Spelling Normalisation and Linguistic Analysis of Historical Text for Information Extraction

Pettersson, Eva January 2016 (has links)
Historical text constitutes a rich source of information for historians and other researchers in the humanities. Many texts are, however, not available in electronic format, and even when they are, there is a lack of NLP tools designed to handle historical text. In my thesis, I aim to provide a generic workflow for automatic linguistic analysis and information extraction from historical text, with spelling normalisation as a core component of the pipeline. In the spelling normalisation step, the historical input text is automatically normalised to a more modern spelling, enabling the use of existing taggers and parsers trained on modern language data in the succeeding linguistic analysis step. In the final information extraction step, certain linguistic structures are identified based on the annotation labels given by the NLP tools and ranked in accordance with the specific information need expressed by the user. An important consideration in my implementation is that the pipeline should be applicable to different languages, time periods, genres, and information needs by simply substituting the language resources used in each module. Furthermore, the reuse of existing NLP tools developed for the modern language is crucial: the lack of linguistically annotated historical data, combined with the high variability in historical text, makes it hard to train NLP tools specifically aimed at analysing historical text. In my evaluation, I show that spelling normalisation can be a very useful technique for easy access to historical information content, even in cases where little (or no) annotated historical training data is available. For the specific information extraction task of automatically identifying verb phrases describing work in Early Modern Swedish text, 91 of the 100 top-ranked instances are true positives in the best setting.
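The three-step pipeline (spelling normalisation, linguistic analysis with modern tools, information extraction) can be sketched schematically. The substitution lexicon, the toy tagger, and the verb-phrase pattern below are all illustrative assumptions, not the thesis's actual resources.

```python
# Schematic sketch of the normalise -> analyse -> extract pipeline.
# All resources here are toy assumptions for illustration.

# Step 1: spelling normalisation (a simple substitution lexicon here;
# the thesis explores more sophisticated, trainable approaches).
NORM = {"hafver": "har", "warit": "varit", "dreng": "dräng"}

def normalise(tokens):
    return [NORM.get(t.lower(), t) for t in tokens]

# Step 2: linguistic analysis with a tagger trained on *modern* data.
# A real pipeline would call an off-the-shelf POS tagger here.
MODERN_TAGS = {"har": "VERB", "varit": "VERB", "dräng": "NOUN",
               "han": "PRON", "hos": "ADP", "prästen": "NOUN"}

def tag(tokens):
    return [(t, MODERN_TAGS.get(t, "X")) for t in tokens]

# Step 3: information extraction over the annotation, e.g. verbs
# followed by a noun or verb as rough "work-description" candidates.
def extract_work_phrases(tagged):
    hits = []
    for i in range(len(tagged) - 1):
        if tagged[i][1] == "VERB" and tagged[i + 1][1] in ("NOUN", "VERB"):
            hits.append(" ".join(t for t, _ in tagged[i:i + 2]))
    return hits

tokens = "han hafver warit dreng hos prästen".split()
tagged = tag(normalise(tokens))
print(tagged)
print(extract_work_phrases(tagged))  # -> ['har varit', 'varit dräng']
```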
