771 |
Concept Based Knowledge Discovery from Biomedical Literature. Radovanovic, Aleksandar. January 2009 (has links)
This thesis introduces novel methods for knowledge discovery and presents a software system that extracts information from biomedical literature, reviews interesting connections between various biomedical concepts and, in so doing, generates new hypotheses. The experimental results obtained using the methods described in this thesis are compared to currently published results obtained by other methods, and a number of case studies are described. This thesis shows how the technology presented can be integrated with the researchers' own knowledge, experimentation and observations for optimal progression of scientific research.
|
772 |
Konzeption eines dreistufigen Transfers für die maschinelle Übersetzung natürlicher Sprachen [Design of a three-stage transfer for the machine translation of natural languages]. Laube, Annett; Karl, Hans-Ulrich 14 December 2012 (has links) (PDF)
0 FOREWORD
The analysis and synthesis algorithms required for translating programming languages have for some time been expressible in a relatively language-independent form. This is reflected, among other things, in the multitude of generators that automate the translation process wholly or in part. The syntax of the language to be processed is usually available in data form (graphs, lists) on the basis of formal description tools (e.g. BNF). In the field of natural language translation, the separation of language and processing algorithms has, if at all, only just begun. The reasons are obvious: natural languages are more powerful, and their formal representation is difficult. If translation is also to cover spoken communication, that is, to replace the human interpreter at an international conference or on the telephone with a partner who speaks a different language, real-time requirements arise that will force the pursuit of highly parallel approaches.
Even when no real-time requirements apply, the translation process is extraordinarily complex. Solutions are sought by means of the interlingua and transfer approaches. Increasingly, formal description tools from relatively well-researched subfields of computer science are being employed (operations on decorated trees, tree-to-tree translation strategies), in the hope that their results will carry further than the spectacular prototypes already on the market, which are often derived from heuristic approaches.
[...]
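As a toy illustration of the tree-to-tree operations on decorated trees mentioned above (not the thesis's actual three-stage transfer), the sketch below rewrites a verb-final German toy parse into English word order. The tree encoding, the single structural rule, and the lexical rules are all invented for the example:

```python
# Each node: (category, decorations, children); leaves are plain words.
def transfer(node, rules):
    """Recursively rewrite a decorated source tree into a target tree."""
    if isinstance(node, str):
        return rules.get(node, node)           # lexical transfer
    cat, deco, children = node
    children = [transfer(c, rules) for c in children]
    if cat == "S" and deco.get("verb_final"):  # structural transfer rule
        subj, obj, verb = children
        children = [subj, verb, obj]           # verb-final -> SVO
        deco = {**deco, "verb_final": False}
    return (cat, deco, children)

rules = {"Hund": "dog", "sieht": "sees", "Katze": "cat",
         "der": "the", "die": "the"}
src = ("S", {"verb_final": True}, [
    ("NP", {}, ["der", "Hund"]),
    ("NP", {}, ["die", "Katze"]),
    ("V", {}, ["sieht"]),
])
print(transfer(src, rules))  # subject, verb, object order in the output
```

A real transfer component would, of course, operate over full morphosyntactic decorations and many competing rules; the point here is only the recursive tree-to-tree rewriting shape.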
|
773 |
A constraint-based hypergraph partitioning approach to coreference resolution. Sapena Masip, Emili 16 May 2012 (has links)
The objectives of this thesis are focused on research in machine learning for coreference resolution. Coreference resolution is a natural language processing task that consists of determining the expressions in a discourse that mention or refer to the same entity.
The main contributions of this thesis are (i) a new approach to coreference resolution based on constraint satisfaction, using a hypergraph to represent the problem and solving it by relaxation labeling; and (ii) research towards improving coreference resolution performance using world knowledge extracted from Wikipedia.
The developed approach is able to use an entity-mention classification model with more expressiveness than pair-based ones, and to overcome the weaknesses of previous state-of-the-art approaches, such as linking contradictions, classification without context, and lack of information when evaluating pairs. Furthermore, the approach allows the incorporation of new information by adding constraints, and research has been done on using world knowledge to improve performance.
RelaxCor, the implementation of the approach, achieved state-of-the-art results and participated in the international competitions SemEval-2010 and CoNLL-2011, achieving second position in CoNLL-2011. / Coreference resolution is a natural language processing task that consists of determining the expressions in a discourse that refer to the same real-world entity. The task has a direct effect on text mining as well as on many natural language tasks that require discourse interpretation, such as summarization, question answering or machine translation. Resolving coreference is essential in order to "understand" a text or a discourse.
The objectives of this thesis centre on research into coreference resolution with machine learning. Concretely, the research objectives focus on the following areas:
+ Classification models: The most common classification models in the state of the art are based on the independent classification of pairs of mentions. More recently, models that classify groups of mentions have appeared. One objective of the thesis is to incorporate the entity-mention model into the developed approach.
+ Problem representation: There is as yet no definitive representation of the problem. This thesis presents a hypergraph representation.
+ Resolution algorithms: Depending on the problem representation and the classification model, resolution algorithms can be very diverse. One objective of this thesis is to find a resolution algorithm capable of using the classification models within the hypergraph representation.
+ Knowledge representation: In order to manage knowledge from diverse sources, a symbolic and expressive representation of that knowledge is needed. This thesis proposes the use of constraints.
+ Incorporation of world knowledge: Some coreferences cannot be resolved with linguistic information alone. Common sense and world knowledge are often required to resolve them. This thesis proposes a method for extracting world knowledge from Wikipedia and incorporating it into the resolution system.
The main contributions of this thesis are (i) a new approach to the coreference resolution problem based on constraint satisfaction, using a hypergraph to represent the problem and solving it with the relaxation labeling algorithm; and (ii) research into improving the results by adding world knowledge extracted from Wikipedia.
The presented approach can use the mention-pair and entity-mention models in combination, thereby avoiding problems encountered by many other state-of-the-art approaches, such as contradictions between independent classifications, lack of context and lack of information. Moreover, the presented approach allows new information to be incorporated by adding constraints, and research has been carried out on adding world knowledge that improves the results.
RelaxCor, the system implemented during the thesis to experiment with the proposed approach, has achieved results comparable to the best in the state of the art. It participated in the international competitions SemEval-2010 and CoNLL-2011, where it obtained second position at CoNLL-2011.
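A minimal sketch of the relaxation labeling idea: every mention holds a probability distribution over candidate entity labels, and repeated updates shift probability toward the labels supported by compatible mentions. The full approach operates over a hypergraph with constraint-derived weights; here it is reduced to pairwise edges, and the mentions and weights are invented:

```python
def relaxation_labeling(n_mentions, n_entities, compat, iters=100):
    """Relaxation labeling over pairwise compatibilities.

    compat[(i, j)] in [-1, 1]: positive if mentions i and j likely corefer,
    negative if they likely do not.
    """
    # Near-uniform start with a tiny deterministic bias to break symmetry.
    p = []
    for i in range(n_mentions):
        dist = [1.0] * n_entities
        dist[i % n_entities] += 0.1
        total = sum(dist)
        p.append([d / total for d in dist])

    for _ in range(iters):
        for i in range(n_mentions):
            scores = []
            for lab in range(n_entities):
                # Support for label `lab` from all mentions linked to i.
                s = sum(w * p[j][lab] for (a, j), w in compat.items() if a == i)
                s += sum(w * p[a][lab] for (a, j), w in compat.items() if j == i)
                scores.append(max(p[i][lab] * (1.0 + s), 1e-9))
            total = sum(scores)
            p[i] = [sc / total for sc in scores]
    return [max(range(n_entities), key=lambda lab: dist[lab]) for dist in p]

# Mentions: 0 = "Obama", 1 = "he", 2 = "the car"; weights are invented.
compat = {(0, 1): 0.9, (0, 2): -0.9, (1, 2): -0.9}
labels = relaxation_labeling(3, 2, compat)
print(labels)  # mentions 0 and 1 share a label; mention 2 gets the other one
```

The attraction between mentions 0 and 1 and the repulsion from mention 2 drive the distributions to a consistent partition, which is the intuition behind solving coreference as constraint satisfaction.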
|
774 |
Identifying Architectural Concerns From Non-functional Requirements Using Support Vector Machine. Gokyer, Gokhan 01 August 2008 (has links) (PDF)
There is no common consensus on how to identify problem-domain concerns in the architectural modeling of software systems, nor is there a commonly accepted method for effectively modeling the Non-Functional Requirements (NFRs) associated with architectural aspects in the solution domain. This thesis introduces the use of a Machine Learning (ML) method based on Support Vector Machines to relate NFRs to classified "architectural concerns" in an automated way. The method uses Natural Language Processing techniques to fragment plain NFR texts under the supervision of domain experts. The contribution of this approach lies in continuously applying ML techniques to previously discovered "NFR - architectural concern" associations to improve the intelligence of repositories for requirements engineering. The study illustrates a charted roadmap and demonstrates an automated requirements engineering toolset for this roadmap. It also validates the approach and the effectiveness of the toolset on a snapshot of a real-life project.
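As a rough, self-contained sketch of the classification step: the thesis's actual features and toolset are not described here, so the mini NFR corpus, the two concern classes, and the bag-of-words features below are invented, and Pegasos-style sub-gradient training stands in for whatever SVM implementation was used:

```python
import re
from collections import defaultdict

def bow(text):
    """Bag-of-words feature vector keyed by token."""
    vec = defaultdict(float)
    for tok in re.findall(r"[a-z]+", text.lower()):
        vec[tok] += 1.0
    return vec

def train_svm(texts, labels, epochs=200, lam=0.01):
    """Linear SVM trained with Pegasos-style sub-gradient descent.
    Labels must be +1 / -1."""
    samples = [bow(t) for t in texts]
    w, t = defaultdict(float), 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(w[f] * v for f, v in x.items())
            for f in list(w):               # L2 regularisation shrink
                w[f] *= 1.0 - eta * lam
            if margin < 1:                  # hinge-loss sub-gradient step
                for f, v in x.items():
                    w[f] += eta * y * v
    return w

def predict(w, text):
    return 1 if sum(w[f] * v for f, v in bow(text).items()) >= 0 else -1

# Hypothetical mini NFR corpus: +1 = performance concern, -1 = security concern.
texts = [
    "the system shall respond within two seconds",
    "response time must not exceed one second",
    "all passwords shall be encrypted",
    "access requires user authentication",
]
labels = [1, 1, -1, -1]
w = train_svm(texts, labels)
print(predict(w, "system response time in seconds"))  # 1 (performance)
```

In the thesis's setting the fragments would come from NLP preprocessing of real NFR documents and the classes from the expert-curated architectural-concern taxonomy; the sketch only shows the supervised mapping itself.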
|
775 |
Processing Turkish Radiology Reports. Hadimli, Kerem 01 May 2011 (has links) (PDF)
Radiology departments utilize various techniques for visualizing patients' bodies, and narrative free-text reports describing the findings in these visualizations are written by medical doctors. The information within these narrative reports needs to be extracted for medical information systems. Turkish is a highly agglutinative language, and this poses problems for information retrieval and extraction from Turkish free texts.
In this thesis, two alternative methods for information retrieval and structured information extraction from Turkish radiology reports are presented: one rule-based and one data-driven. Contrary to previous studies on medical NLP systems, neither of these methods utilizes a medical lexicon or ontology.
Information extraction is performed at the level of extracting medically related phrases from the sentence. The aim is to measure the baseline performance the Turkish language can provide for medical information extraction and retrieval, in isolation from other factors.
|
776 |
Ensembles of Semantic Spaces : On Combining Models of Distributional Semantics with Applications in Healthcare. Henriksson, Aron January 2015 (has links)
Distributional semantics allows models of linguistic meaning to be derived from observations of language use in large amounts of text. By modeling the meaning of words in semantic (vector) space on the basis of co-occurrence information, distributional semantics permits a quantitative interpretation of (relative) word meaning in an unsupervised setting, i.e., human annotations are not required. The ability to obtain inexpensive word representations in this manner helps to alleviate the bottleneck of fully supervised approaches to natural language processing, especially since models of distributional semantics are data-driven and hence agnostic to both language and domain. All that is required to obtain distributed word representations is a sizeable corpus; however, the composition of the semantic space is not only affected by the underlying data but also by certain model hyperparameters. While these can be optimized for a specific downstream task, there are currently limitations to the extent to which the many aspects of semantics can be captured in a single model.
This dissertation investigates the possibility of capturing multiple aspects of lexical semantics by adopting the ensemble methodology within a distributional semantic framework to create ensembles of semantic spaces. To that end, various strategies for creating the constituent semantic spaces, as well as for combining them, are explored in a number of studies. The notion of semantic space ensembles is generalizable across languages and domains; however, the use of unsupervised methods is particularly valuable in low-resource settings, in particular when annotated corpora are scarce, as in the domain of Swedish healthcare. The semantic space ensembles are here empirically evaluated for tasks that have promising applications in healthcare.
It is shown that semantic space ensembles – created by exploiting various corpora and data types, as well as by adjusting model hyperparameters such as the size of the context window and the strategy for handling word order within the context window – are able to outperform the use of any single constituent model on a range of tasks. The semantic space ensembles are used both directly for k-nearest neighbors retrieval and for semi-supervised machine learning. Applying semantic space ensembles to important medical problems facilitates the secondary use of healthcare data, which, despite its abundance and transformative potential, is grossly underutilized. / At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Papers 4 and 5: unpublished conference papers. / High-Performance Data Mining for Drug Effect Detection
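A minimal sketch of the ensemble idea, assuming simple count-based co-occurrence spaces built with two different context-window sizes and combined by averaging their cosine similarities. The corpus is a toy; the dissertation's actual models and combination strategies vary:

```python
from collections import defaultdict
from math import sqrt

def semantic_space(tokens, window):
    """Co-occurrence vectors: one vector per word, keyed by context word."""
    space = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                space[w][tokens[j]] += 1.0
    return space

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def ensemble_similarity(spaces, a, b):
    """Average cosine similarity over an ensemble of semantic spaces."""
    return sum(cosine(s[a], s[b]) for s in spaces) / len(spaces)

corpus = ("the patient received aspirin for pain and "
          "the patient received ibuprofen for pain").split()
spaces = [semantic_space(corpus, w) for w in (1, 3)]  # two window sizes
sim_drugs = ensemble_similarity(spaces, "aspirin", "ibuprofen")
sim_other = ensemble_similarity(spaces, "aspirin", "and")
print(sim_drugs > sim_other)  # True
```

The two drugs occur in near-identical contexts, so every constituent space rates them similar, while the function word is only weakly supported; averaging over spaces with different hyperparameters is the simplest version of the combination strategies studied.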
|
777 |
Automatic lemmatisation for Afrikaans / by Hendrik J. Groenewald. Groenewald, Hendrik Johannes January 2006 (has links)
A lemmatiser is an important component of various human language technology applications for any language. At present, a rule-based lemmatiser for Afrikaans already exists, but this lemmatiser produces disappointingly low accuracy figures. The performance of the current lemmatiser serves as motivation for developing another lemmatiser based on an approach other than language-specific rules. The alternative method of lemmatiser construction investigated in this study is memory-based learning.
Thus, in this research project we develop an automatic lemmatiser for Afrikaans called Lia, "Lemma-identifiseerder vir Afrikaans" ('Lemmatiser for Afrikaans'). In order to construct Lia, the following research objectives are set: i) to define the classes for Afrikaans lemmatisation, ii) to determine the influence of data size and various feature options on the performance of Lia, iii) to automatically determine the algorithm and parameter settings that deliver the best performance in terms of linguistic accuracy, execution time and memory usage.
In order to achieve the first objective, we investigate the processes of inflection and derivation in Afrikaans, since automatic lemmatisation requires a clear distinction between inflection and derivation. We proceed to define the inflectional categories for Afrikaans, which represent a number of affixes that should be removed from word-forms during lemmatisation. The classes for automatic lemmatisation in Afrikaans are derived from these affixes. It is subsequently shown that accuracy, as well as memory usage and execution time, increase as the amount of training data is increased, and that the various feature options have a significant effect on the performance of Lia. The algorithmic parameters and data representation that deliver the best results are determined by the use of PSearch, a programme that implements Wrapped Progressive Sampling in order to determine a set of possibly optimal algorithmic parameters for each of the TiMBL classification algorithms.
Evaluation indicates that an accuracy figure of 92,8% is obtained when training Lia with the best-performing parameters for the IB1 algorithm on feature-aligned data with 20 features. This result indicates that memory-based learning is indeed more suitable than rule-based methods for Afrikaans lemmatiser construction. / Thesis (M.Ing. (Computer and Electronical Engineering))--North-West University, Potchefstroom Campus, 2007.
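A minimal illustration of memory-based lemmatisation in the spirit of TiMBL's IB1 (1-nearest neighbour with an overlap metric): word-final characters are the features, and the class is a suffix-transformation rule learned from stored examples. The feature scheme, training instances, and rules below are invented for the example:

```python
def features(word, n=6):
    """Right-aligned final characters, padded: a common feature scheme
    for memory-based morphology."""
    return list(word.rjust(n, "_")[-n:])

def apply_rule(word, rule):
    """A class is a (strip, add) suffix-rewrite rule."""
    strip, add = rule
    if strip and word.endswith(strip):
        word = word[: len(word) - len(strip)]
    return word + add

def nn_lemmatise(train, word):
    """IB1-style 1-nearest neighbour: pick the stored instance with the
    most matching feature positions and reuse its transformation rule."""
    f = features(word)
    best = max(train, key=lambda inst: sum(a == b for a, b in zip(f, inst[0])))
    return apply_rule(word, best[1])

# Toy training instances: (features of word-form, (suffix to strip, suffix to add)).
train = [
    (features("stoele"), ("e", "")),  # stoele -> stoel (plural -e)
    (features("appels"), ("s", "")),  # appels -> appel (plural -s)
    (features("loop"),   ("", "")),   # loop -> loop (base form unchanged)
]
print(nn_lemmatise(train, "tafels"))  # tafel
```

The unseen form "tafels" is closest to the stored "-s" plural, so its rule is reused; this nearest-neighbour reuse of stored instances, rather than hand-written rules, is the core of the memory-based approach.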
|
778 |
Outomatiese Afrikaanse tekseenheididentifisering [Automatic Afrikaans text-unit identification] / deur Martin J. Puttkammer. Puttkammer, Martin Johannes January 2006 (has links)
An important core technology in the development of human language technology applications is an automatic morphological analyser. Such a morphological analyser consists of various modules, one of which is a tokeniser. At present no tokeniser exists for Afrikaans, and it has therefore been impossible to develop a morphological analyser for Afrikaans. Thus, in this research project such a tokeniser is developed, and the project therefore has two objectives: i) to postulate a tag set for integrated tokenisation, and ii) to develop an algorithm for integrated tokenisation.
In order to achieve the first objective, a tag set for the tagging of sentences, named entities, words, abbreviations and punctuation is proposed specifically for the annotation of Afrikaans texts. It consists of 51 tags, which can be expanded in future in order to establish a larger, more specific tag set. The postulated tag set can also be simplified according to the level of specificity required by the user.
It is subsequently shown that an effective tokeniser cannot be developed using only linguistic, or only statistical, methods. This is due to the complexity of the task: rule-based modules should be used for certain processes (for example sentence recognition), while other processes (for example named-entity recognition) can only be executed successfully by means of a machine-learning module. It is argued that a hybrid system (a system in which rule-based and statistical components are integrated) would achieve the best results on Afrikaans tokenisation.
Various rule-based and statistical techniques, including a TiMBL-based classifier, are then employed to develop such a hybrid tokeniser for Afrikaans. The final tokeniser achieves an f-score of 97.25% when the complete set of tags is used. For sentence recognition an f-score of 100% is achieved. The tokeniser also recognises 81.39% of named entities. When a simplified tag set (consisting of only 12 tags) is used to annotate named entities, the f-score rises to 94.74%.
The conclusion of the study is that a hybrid approach is indeed suitable for Afrikaans sentencisation, named-entity recognition and tokenisation. The tokeniser will improve if it is trained with more data, while the expansion of gazetteers as well as the tag set will also lead to a more accurate system. / Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2006.
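A toy sketch of the hybrid idea: rules handle token splitting, abbreviation periods, and sentence ends, while a gazetteer lookup stands in for the machine-learned named-entity module. The abbreviation list, gazetteer, and tags below are invented (the thesis's actual tag set has 51 tags):

```python
import re

ABBREVIATIONS = {"mnr.", "dr.", "prof."}  # assumed Afrikaans abbreviations

def tokenise(text):
    """Hybrid sketch: rule-based splitting plus lookup-based entity tagging."""
    gazetteer = {"Pretoria": "PLACE", "Martin": "PERSON"}
    raw = re.findall(r"\w+\.?|[.,!?]", text)
    tokens = []
    for tok in raw:
        # Rule: detach a final period unless the token is a known abbreviation.
        if tok.endswith(".") and len(tok) > 1 and tok.lower() not in ABBREVIATIONS:
            tokens.extend([tok[:-1], "."])
        else:
            tokens.append(tok)
    tagged = []
    for tok in tokens:
        if tok.lower() in ABBREVIATIONS:
            tagged.append((tok, "ABBR"))
        elif tok in gazetteer:                 # stand-in for the ML NE module
            tagged.append((tok, gazetteer[tok]))
        elif tok == ".":
            tagged.append((tok, "SENT_END"))
        elif tok in ",!?":
            tagged.append((tok, "PUNC"))
        else:
            tagged.append((tok, "WORD"))
    return tagged

print(tokenise("Mnr. Martin woon in Pretoria."))
# "Mnr." tagged ABBR, "Pretoria" tagged PLACE, final period SENT_END
```

The abbreviation rule is what keeps "Mnr." from triggering a spurious sentence boundary; in the thesis this rule-based layer is combined with a trained TiMBL classifier rather than a fixed gazetteer.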
|
779 |
Answering complex questions : supervised approaches. Sadid-Al-Hasan, Sheikh, University of Lethbridge. Faculty of Arts and Science January 2009 (has links)
The term "Google" has become a verb for most of us. Search engines, however, have certain limitations. Ask one, for example, about the impact of the current global financial crisis in different parts of the world, and you can expect to sift through thousands of results for the answer. This motivates research in complex question answering, where the purpose is to create summaries of large volumes of information as answers to complex questions, rather than simply offering a listing of sources. Unlike simple questions, complex questions cannot be answered easily, as they often require inferencing and synthesizing information from multiple documents. Hence, this task is accomplished by query-focused multi-document summarization systems. In this thesis we apply different supervised learning techniques to the complex question answering problem. To run our experiments, we consider the DUC-2007 main task.
A huge amount of labeled data is a prerequisite for supervised training. Labeling is expensive and time-consuming when performed manually by humans, and automatic labeling is a good remedy. We employ five different automatic annotation techniques to build extracts from human abstracts, using ROUGE, Basic Element (BE) overlap, a syntactic similarity measure, a semantic similarity measure and the Extended String Subsequence Kernel (ESSK). The representative supervised methods we use are Support Vector Machines (SVM), Conditional Random Fields (CRF), Hidden Markov Models (HMM) and Maximum Entropy (MaxEnt). We annotate DUC-2006 data and use it to train our systems, whereas 25 topics of the DUC-2007 data set are used as test data. The evaluation results reveal the impact of the automatic labeling methods on the performance of the supervised approaches to complex question answering. We also experiment with two ensemble-based approaches that show promising results for this problem domain. / x, 108 leaves : ill. ; 29 cm
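A minimal sketch of one of the annotation ideas: score each document sentence against the human abstract with a simplified ROUGE-1 recall (no clipping or stemming, unlike the real ROUGE package) and label high-scoring sentences as extract members for supervised training. The abstract, sentences, and threshold are invented:

```python
import re

def unigrams(text):
    return re.findall(r"\w+", text.lower())

def rouge1_recall(sentence, abstract):
    """Fraction of abstract unigrams covered by the sentence (simplified)."""
    ref = unigrams(abstract)
    if not ref:
        return 0.0
    cand = set(unigrams(sentence))
    return sum(1 for t in ref if t in cand) / len(ref)

def auto_label(sentences, abstract, threshold=0.2):
    """Label each sentence 1 (include in training extract) or 0."""
    return [1 if rouge1_recall(s, abstract) >= threshold else 0
            for s in sentences]

abstract = "the financial crisis reduced global trade"
sentences = [
    "global trade fell sharply during the financial crisis",
    "the committee met on tuesday",
]
print(auto_label(sentences, abstract))  # [1, 0]
```

The resulting binary labels are exactly the kind of training signal the supervised learners (SVM, CRF, HMM, MaxEnt) consume; the other four annotation techniques replace the similarity measure while keeping this thresholding scheme.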
|
780 |
Leveraging supplementary transcriptions and transliterations via re-ranking. Bhargava, Aditya Unknown Date
No description available.
|