About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD).
Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
981

Automatic Identification of Topic Tags from Texts Based on Expansion-Extraction Approach

Yang, Seungwon 22 January 2014 (has links)
Identifying the topics of a textual document is useful for many purposes. We can organize documents by topic in digital libraries, then browse and search for documents on specific topics. By examining the topics of a document, we can quickly understand what it is about. To augment the traditional manual approach to topic tagging, which is labor-intensive, computational solutions have been developed. This dissertation describes the design and development of a topic identification approach, applied here to disaster events. In a sense, this study represents the marriage of research analysis with an engineering effort, in that it combines inspiration from Cognitive Informatics with a practical model from Information Retrieval. One of the design constraints was that the Web be used as a universal knowledge source, which was essential for accessing the information required to infer topics from texts. Retrieving specific information of interest from such a vast source was achieved by querying a search engine's application programming interface. The information gathered was then processed mainly by applying the Vector Space Model from the Information Retrieval field. As a proof of concept, we developed and evaluated a prototype tool, Xpantrac, which can run in batch mode to process text documents automatically. A user interface for Xpantrac was also constructed to support interactive, semi-automatic topic tagging, and was assessed via a usability study. Throughout the design, development, and evaluation of these study components, we detail how the hypotheses and research questions of this dissertation have been supported and answered. We also show that our overarching goal, identifying topics in a human-comparable way without depending on a large training set or corpus, has been achieved. / Ph. D.
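
The expansion-extraction idea lends itself to a compact illustration: expand the input text with related documents (in Xpantrac, snippets fetched through a search engine API), then extract tags by ranking the terms those documents share. The sketch below is a minimal approximation, not the actual Xpantrac algorithm; the tokenizer, stopword list, scoring rule, and hard-coded "expansion" documents are all invented for illustration.

```python
import re
from collections import Counter

STOPWORDS = frozenset({"the", "and", "after", "along", "with"})

def tokenize(text):
    # Lowercase word tokens of length >= 3; a stand-in for real preprocessing.
    return [t for t in re.findall(r"[a-z]{3,}", text.lower()) if t not in STOPWORDS]

def extract_topic_tags(expansion_docs, k=3):
    # Terms shared by many independently retrieved expansion documents are
    # likely topic tags: rank by document frequency, break ties by total count.
    docs = [tokenize(d) for d in expansion_docs]
    df = Counter(t for d in docs for t in set(d))  # number of docs containing t
    tf = Counter(t for d in docs for t in d)       # total occurrences of t
    return sorted(df, key=lambda t: (df[t], tf[t]), reverse=True)[:k]

# The expansion documents would normally be snippets returned by a search
# engine API queried with the input text; these are hard-coded stand-ins.
expansion = [
    "Hurricane winds caused severe flooding in the coastal town.",
    "Emergency crews responded to flooding after the hurricane.",
    "The hurricane forced thousands of coastal residents to evacuate.",
]
print(extract_topic_tags(expansion))  # e.g. ['hurricane', 'flooding', 'coastal']
```
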
982

Natural language processing (NLP) in Artificial Intelligence (AI): a functional linguistic perspective

Panesar, Kulvinder 07 October 2020 (has links)
This chapter encapsulates the multi-disciplinary nature of NLP in AI and reports on a linguistically orientated conversational software agent (CSA) framework (Panesar 2017) sensitive to natural language processing (NLP) and to language in the agent environment. We present a novel computational approach that uses the functional linguistic theory of Role and Reference Grammar (RRG) as the linguistic engine. Viewing language as action, utterances change the state of the world, and hence the mental states of speakers and hearers change as a result of these utterances. A plan-based method of discourse management (DM) using the BDI model architecture is deployed to support greater conversational complexity. This CSA investigates the integration, intersection, and interface of language, knowledge, speech act constructions (SACs) as grammatical objects, and the sub-models of BDI and DM for NLP. We present an investigation into the intersection and interface between our linguistic and knowledge (belief base) models for both dialogue management and planning. The architecture comprises three models: (1) a linguistic model based on RRG; (2) an Agent Cognitive Model (ACM) with (a) a knowledge representation model employing conceptual graphs (CGs) serialised to the Resource Description Framework (RDF) and (b) a planning model underpinned by BDI concepts, intentionality, and rational interaction; and (3) a dialogue model employing common ground. Use of RRG as the linguistic engine for the CSA was successful. We identify the complexity of the semantic gap between internal representations and give details of a conceptual bridging solution.
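
The BDI (belief-desire-intention) deliberation cycle at the core of such a planning model can be shown in miniature. The toy loop below is a generic sketch, not Panesar's architecture: the rule format, the set-based belief store, and the greeting example are all invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class BDIAgent:
    # Toy belief-desire-intention loop; structure is illustrative only.
    beliefs: set = field(default_factory=set)
    desires: list = field(default_factory=list)
    intentions: list = field(default_factory=list)

    def perceive(self, utterance_facts):
        # Utterances change the world: fold new facts into the belief base.
        self.beliefs |= set(utterance_facts)

    def deliberate(self):
        # Adopt as intentions those desires whose preconditions now hold.
        self.intentions = [d for d in self.desires
                           if d["precondition"] <= self.beliefs]

    def act(self):
        return [d["response"] for d in self.intentions]

agent = BDIAgent(desires=[
    {"precondition": {"user_greeted"}, "response": "Hello! How can I help?"},
])
agent.perceive({"user_greeted"})  # e.g. a fact parsed from an RRG logical structure
agent.deliberate()
print(agent.act())
```
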
983

Retrieving Definitions from Scientific Text in the Salmon Fish Domain by Lexical Pattern Matching

Gabbay, Igal 01 1900 (has links)
While an information retrieval system takes as input a user query and returns a list of relevant documents chosen from a large collection, a question answering system attempts to produce an exact answer. Recent research, motivated by the question answering track of the Text REtrieval Conference (TREC), has focused mainly on answering 'factoid' questions concerned with names, places, dates, etc. in the news domain. However, questions seeking definitions of terms are common in the logs of search engines. The objective of this project was therefore to investigate methods of retrieving definitions from scientific documents. The subject domain was salmon, and an appropriate test collection of articles was created, pre-processed, and indexed. Relevant terms were obtained from salmon researchers and a fish database. A system was built which accepted a term as input, retrieved relevant documents from the collection using a search engine, identified definition phrases within them using a vocabulary of syntactic patterns and associated heuristics, and produced as output phrases explaining the term. Four experiments were carried out which progressively extended and refined the patterns. The performance of the system, measured using an appropriate form of precision, improved over the experiments from 8.6% to 63.6%. The main findings of the research were: (1) definitions were diverse despite the documents' homogeneity, and were found not only in the Introduction and Abstract sections but also in the Methods and References; (2) nevertheless, syntactic patterns were a useful starting point in extracting them; (3) three patterns accounted for 90% of candidate phrases; (4) statistically, the ordinal number of the instance of the term in a document was a better indicator of the presence of a definition than either sentence position and length, or the number of sentences in the document. Next steps include classifying terms, using information extraction-like templates, resolving basic anaphors, ranking answers, exploiting the structure of scientific papers, and refining the evaluation process.
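
The pattern-matching core of such a system is straightforward to illustrate. The sketch below is a hedged approximation: the three templates are examples of the kinds of lexical patterns described (copular, parenthetical, and appositive definitions), not the thesis's actual pattern vocabulary, and the salmon-domain sentences are invented.

```python
import re

# Example lexical definition patterns; {term} is filled in per query.
PATTERN_TEMPLATES = [
    r"\b{term}\s+(?:is|are)\s+(?:a|an|the)\s+(?P<def>[^.;]+)",   # "X is a ..."
    r"\b{term}\s*\(\s*(?P<def>[^)]+)\)",                         # "X (...)"
    r"\b{term},\s+(?:i\.e\.,?|also known as|defined as)\s+(?P<def>[^.;]+)",
]

def find_definitions(term, sentences):
    # Return candidate definition phrases for `term` found in `sentences`.
    hits = []
    for template in PATTERN_TEMPLATES:
        pattern = re.compile(template.format(term=re.escape(term)), re.IGNORECASE)
        for sentence in sentences:
            match = pattern.search(sentence)
            if match:
                hits.append(match.group("def").strip())
    return hits

sentences = [
    "Smoltification is a series of physiological changes that prepare "
    "juvenile salmon for seawater.",
    "The parr (a young salmon before smoltification) stays in fresh water.",
]
print(find_definitions("smoltification", sentences))
print(find_definitions("parr", sentences))
```
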
984

Identifying Sensitive Data using Named Entity Recognition with Large Language Models : A comparison of transformer models fine-tuned for Named Entity Recognition

Ström Boman, Alfred January 2024 (has links)
The development of artificial intelligence and large language models has increased rapidly in recent years, bringing both opportunities and risks. With the broader use of AI-related products such as human-like chatbots, there has been increasing interest in controlling the data that is shared with them. In some scenarios there is data, such as personal or proprietary information, which should not be shared. This project has therefore revolved around utilizing and comparing different Named Entity Recognition systems to prevent such data from being shared. Three different approaches to implementing Named Entity Recognition systems were compared before selecting the most appropriate one for the actual implementation. Three pre-trained transformer models, GPT-SW3, TinyLlama, and Mistral, were then fine-tuned on two different datasets. The implementation phase included applying data augmentation techniques, data processing, and model quantization before fine-tuning the models for Named Entity Recognition. A set of metrics including precision, recall, and F1-score was used to measure the performance of the trained models. The three models were compared and evaluated against each other based on the results obtained from the measurements and the training. The models showed varying results and performance, with both overfitting and underfitting occurring. Finally, TinyLlama was concluded to be the best-performing model based on the obtained results and other considered aspects.
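
As an illustration of transformer-based NER used to flag sensitive data, the sketch below uses the Hugging Face transformers library with a small, publicly available BERT-style NER checkpoint as a stand-in; the thesis itself fine-tuned GPT-SW3, TinyLlama, and Mistral, so the model choice, label set, and redaction logic here are assumptions for demonstration only.

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",        # small public stand-in checkpoint
    aggregation_strategy="simple",      # merge word pieces into whole entities
)

SENSITIVE_LABELS = {"PER", "LOC", "ORG"}  # entity types treated as sensitive

def redact(text):
    # Replace recognised sensitive entities with their label, editing from
    # the end of the string so earlier character offsets stay valid.
    for ent in sorted(ner(text), key=lambda e: e["start"], reverse=True):
        if ent["entity_group"] in SENSITIVE_LABELS:
            text = text[:ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

print(redact("Alice Johnson from Stockholm works at Ericsson."))
# e.g. -> "[PER] from [LOC] works at [ORG]."
```
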
985

NLP-baserad kravhantering: möjligheter och utmaningar : En kvalitativ undersökning / NLP-based requirements management: opportunities and challenges : A qualitative study

Blystedt, Theo, Sandberg, Albin January 2024 (has links)
This thesis explores the evolving field of Natural Language Processing (NLP) and its application in requirements management, a critical area in software development that ensures systems meet set standards and user expectations. The complexity of modern IT projects has heightened the demand for effective requirements management. Despite extensive studies on NLP, there is a lack of focused research on its specific opportunities and challenges from a company and business perspective regarding requirements management processes. This study adopts a qualitative approach through semi-structured interviews with respondents in the requirements management and AI fields, to gain deep insights into the practical implications of NLP in requirements management. The study uses a thematic analysis to analyze the data gathered from the interviews and produce themes relevant to the research questions. The study also conducts a literature search to gain scientific insight, which is used for comparison with the interview results. The findings reveal that NLP has promising potential to streamline information handling and requirement interpretation, but it also introduces significant risks and complexities. The technology's ability to process large data volumes and automate requirement extraction and interpretation can significantly speed up early project stages. Early implementation allows organizations to swiftly adjust and pinpoint requirements based on changing circumstances and insights. There is also considerable potential in generative models, such as BERT, in the requirements management field, owing to their efficiency compared with traditional NLP models. However, major challenges include security and confidentiality risks, since NLP systems often process large amounts of text data that may contain sensitive or confidential information. Reliability also remains a challenge, as these systems must handle linguistic ambiguities and context-dependent interpretations without losing accuracy. The quality and quantity of training data are a further challenge, given their direct impact on a model's performance and efficiency. The challenges and opportunities presented in this study can help organizations and businesses adopt NLP technologies in their requirements management processes.
986

A Comparative Analysis of Seven Translations of Dante’s Inferno into Japanese

Hast, Anders January 2024 (has links)
The Inferno, from Dante Alighieri's La Commedia, has been translated into Japanese about a dozen times in the past 110 years. In this comparative analysis, seven of those translations are analysed and compared with regard to how cultural terms were translated and to what degree the different translators attempted a word-by-word translation of certain selected passages. Nine such passages, totalling 93 lines, were chosen. About 60 words were analysed per translator, and the main goal was to determine whether they tend towards foreignisation or domestication. A Natural Language Processing analysis was also conducted to assess the similarity of the translations in terms of word usage. Setting aside the oldest translation, which is quite different in both word usage and grammar and is therefore considered an outlier, two main groups emerge: one that tends to translate more word-by-word, and another that is freer. All translators in the first group imitate the Italian pronunciation of cultural terms using katakana, while most in the latter prefer the current Japanese term. Between these groups sits one translator who more consistently follows Dante's own variation in wording, for instance when referring to Virgilio, while the others tend to use the same word several times.
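
The word-usage similarity analysis can be sketched in a few lines. Japanese text has no word boundaries, so this toy version compares character-bigram distributions with cosine similarity; the actual study presumably used proper tokenisation, and the two sample renderings (differing in how they transliterate Virgilio) are invented.

```python
import math
from collections import Counter

def bigrams(text):
    # Character bigrams as a tokenisation-free proxy for word usage.
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm

# Two hypothetical renderings of the same line that transliterate the name
# Virgilio differently (Latin-style vs Italian-style katakana).
t1 = "ウェルギリウスは詩人を導いた"
t2 = "ヴィルジリオは詩人を導いた"
print(f"{cosine(bigrams(t1), bigrams(t2)):.3f}")
```
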
987

On improving natural language processing through phrase-based and one-to-one syntactic algorithms

Meyer, Christopher Henry January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / William H. Hsu / Machine Translation (MT) is the practice of using computational methods to convert words from one natural language to another. Several approaches have been created since MT’s inception in the 1950s and, with the vast increase in computational resources since then, have continued to evolve and improve. In this thesis I summarize several branches of MT theory and introduce several newly developed software applications, several parsing techniques to improve Japanese-to-English text translation, and a new key algorithm to correct translation errors when converting from Japanese kanji to English. The overall translation improvement is measured using the BLEU metric (an objective, numerical standard in Machine Translation quality analysis). The baseline translation system was built by combining Giza++, the Thot Phrase-Based SMT toolkit, the SRILM toolkit, and the Pharaoh decoder. The input and output parsing applications were created as intermediary to improve the baseline MT system as to eliminate artificially high improvement metrics. This baseline was measured with and without the additional parsing provided by the thesis software applications, and also with and without the thesis kanji correction utility. The new algorithm corrected for many contextual definition mistakes that are common when converting from Japanese to English text. By training the new kanji correction utility on an existing dictionary, identifying source text in Japanese with a high number of possible translations, and checking the baseline translation against other translation possibilities; I was able to increase the translation performance of the baseline system from minimum normalized BKEU scores of .0273 to maximum normalized scores of .081. The preliminary phase of making improvements to Japanese-to-English translation focused on correcting segmentation mistakes that occur when attempting to parse Japanese text into meaningful tokens. The initial increase is not indicative of future potential and is artificially high as the baseline score was so low to begin with, but was needed to create a reasonable baseline score. The final results of the tests confirmed that a significant, measurable improvement had been achieved through improving the initial segmentation of the Japanese text through parsing the input corpora and through correcting kanji translations after the Pharaoh decoding process had completed.
988

Time Dynamic Topic Models

Jähnichen, Patrick 30 March 2016 (has links) (PDF)
Information extraction from large corpora can be a useful tool for many applications in industry and academia. For instance, political communication science has only recently begun to use the opportunities that come with the availability of massive amounts of information on the Internet and the computational tools that natural language processing can provide. We give a linguistically motivated interpretation of topic modeling, a state-of-the-art algorithm for extracting latent semantic sets of words from large text corpora, and extend this interpretation to cover issues and issue-cycles as theoretical constructs from political communication science. We build on a dynamic topic model, a model whose semantic sets of words are allowed to evolve over time, governed by a Brownian motion stochastic process, and apply a new form of analysis to its result. This analysis is based on the notion of volatility, known from econometrics as the rate of change of stocks or derivatives. We claim that the rate of change of sets of semantically related words can be interpreted as issue-cycles, with the word sets describing the underlying issue. Generalizing over the existing work, we introduce dynamic topic models that are driven by general Gaussian processes (Brownian motion is a special case of our model), a family of stochastic processes defined by the function that determines their covariance structure. We apply a certain class of covariance functions to allow for an appropriate rate of change in word sets while preserving the semantic relatedness among words. Applying our findings to a large newspaper data set, the New York Times Annotated Corpus (all articles between 1987 and 2007), we are able to identify sub-topics in time, time-localized topics, and find patterns in their behavior over time. However, we have to drop the assumption of semantic relatedness over all available time for any one topic. Time-localized topics are consistent in themselves but do not necessarily share semantic meaning with each other. They can, however, be interpreted as capturing the notion of issues, and their behavior that of issue-cycles.
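
The role of the covariance function can be made concrete by drawing one word's latent weight trajectory over the corpus's time span from a Gaussian process with a squared-exponential kernel, whose length scale controls the rate of change, i.e. the volatility. This is a generic illustration, not the thesis's model; the kernel choice and hyperparameter values are assumptions.

```python
import numpy as np

def squared_exponential(t1, t2, variance=1.0, length_scale=3.0):
    # k(t, t') = variance * exp(-(t - t')^2 / (2 * length_scale^2))
    return variance * np.exp(-0.5 * ((t1[:, None] - t2[None, :]) / length_scale) ** 2)

rng = np.random.default_rng(0)
years = np.arange(1987, 2008, dtype=float)  # the NYT corpus time span
cov = squared_exponential(years, years) + 1e-8 * np.eye(len(years))  # jitter
trajectory = rng.multivariate_normal(np.zeros(len(years)), cov)
# `trajectory` plays the role of one word's weight in one topic over time;
# shrinking length_scale makes the curve (the "issue") change faster.
for year, weight in zip(years.astype(int), trajectory.round(2)):
    print(year, weight)
```
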
989

Developing an enriched natural language grammar for prosodically-improved concept-to-speech synthesis

Marais, Laurette 04 1900 (has links)
The need for interacting with machines using spoken natural language is growing, along with the expectation that synthetic speech in this context sound natural. Such interaction includes answering questions, where prosody plays an important role in producing natural English synthetic speech by communicating the information structure of utterances. Combinatory Categorial Grammar (CCG) is a theoretical framework that exploits the notion that, in English, information structure, prosodic structure, and syntactic structure are isomorphic. This provides a way to convert a semantic representation of an utterance into a prosodically natural spoken utterance. Grammatical Framework (GF) is a framework for writing grammars, in which abstract tree structures capture the semantic structure and concrete grammars render these structures as linearised strings. This research combines these frameworks to develop a system that converts semantic representations of utterances into linearised strings of natural language that are marked up to inform the prosody-generating component of a speech synthesis system. / Computing / M. Sc. (Computing)
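
The abstract-versus-concrete split that GF provides can be mimicked in a few lines: an abstract tree records the semantics, including which constituent carries the information focus, and a linearisation function renders it as a string with prosodic markup. The sketch below is a toy analogue in Python, not GF code, and the <H*> pitch-accent tag is an invented stand-in for whatever markup the synthesiser expects.

```python
from dataclasses import dataclass

@dataclass
class Answer:
    # Abstract syntax: the semantic pieces plus the information structure.
    subject: str
    predicate: str
    focus: str  # which constituent is new information

def linearise(tree):
    # Concrete "grammar": wrap the focused constituent in a pitch-accent tag.
    def mark(word, role):
        return f"<H*>{word}</H*>" if role == tree.focus else word
    return f"{mark(tree.subject, 'subject')} {mark(tree.predicate, 'predicate')}."

# For "Who wrote it?", the subject is the focus and receives the accent.
print(linearise(Answer("Mary", "wrote it", focus="subject")))
# -> "<H*>Mary</H*> wrote it."
```
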
990

An embodied conversational agent with autistic behaviour

Venter, Wessel Johannes 03 1900 (has links)
Thesis (MSc)--Stellenbosch University, 2012. / ENGLISH ABSTRACT: In this thesis we describe the creation of an embodied conversational agent which exhibits the behavioural traits of a child who has Asperger Syndrome. The agent is rule-based, rather than artificially intelligent, a choice for which we give justification. We then describe the design and implementation of the agent, paying particular attention to the interaction between emotion, personality, and social context. A 3D demonstration program shows that typical output conforms to Asperger-like answers, with corresponding emotional responses.
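
A rule-based agent of this kind can be caricatured in a few lines: canned responses triggered by keywords, plus an emotion variable that the rules update and that could in turn drive the embodiment. Everything below (the rules, the mood scale, the fixation on trains) is invented for illustration and is not the thesis's implementation.

```python
RULES = [
    # (trigger keyword, literal response, effect on mood)
    ("hello", "Hello. Do you want to talk about trains?", +0.1),
    ("trains", "The Class 44 locomotive entered service in 1959.", +0.3),
    ("stop", "I do not want to talk anymore.", -0.4),
]

def respond(utterance, mood):
    # Fire the first rule whose keyword occurs in the utterance; the
    # default reply models a restricted, preferred topic of conversation.
    for keyword, response, effect in RULES:
        if keyword in utterance.lower():
            return response, max(-1.0, min(1.0, mood + effect))
    return "I would rather talk about trains.", mood

mood = 0.0
for line in ["Hello there!", "Tell me about trains.", "Please stop now."]:
    reply, mood = respond(line, mood)
    print(f"user:  {line}\nagent: {reply}  (mood={mood:+.1f})")
```
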
