About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

Multilingual Transformer Models for Maltese Named Entity Recognition

Farrugia, Kris January 2022 (has links)
The recently developed state-of-the-art models for Named Entity Recognition are heavily dependent on huge amounts of annotated data. Consequently, it is extremely challenging for data-scarce languages to obtain significant results. Several approaches have been proposed to circumvent this issue, including cross-lingual transfer learning, which leverages knowledge obtained from available resources in a source language and transfers it to a low-resource target language. Maltese is one of many severely under-resourced languages. The main purpose of this project is to investigate how recently developed multilingual transformer models (Multilingual BERT and XLM-RoBERTa) perform, and ultimately to establish an evaluation benchmark for zero-shot cross-lingual transfer learning for Maltese Named Entity Recognition. The models are fine-tuned on Arabic, English, Italian, Spanish, and Dutch. The experiments evaluated the efficacy of the source languages and the use of multilingual data in both the training and validation stages. The experiments demonstrated that feeding multilingual data to both the training and validation phases was mostly beneficial to performance, whereas adding it to the validation phase alone was generally detrimental. Furthermore, XLM-R achieved better scores overall; however, employing mBERT with English as the source language yielded the best performance.
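A minimal sketch of the zero-shot transfer setup this abstract describes, assuming the Hugging Face transformers and datasets libraries: fine-tune XLM-R for token classification on a source-language corpus (English CoNLL-2003 here, as a placeholder) and evaluate directly on Maltese. The dataset choice and hyperparameters are illustrative assumptions, not the thesis's configuration.

```python
# Zero-shot cross-lingual NER sketch: fine-tune XLM-R for token
# classification on an English source dataset, then evaluate it directly
# on Maltese data it never saw during training. Dataset choice and
# hyperparameters are illustrative placeholders, not the thesis's setup.
from datasets import load_dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification,
                          Trainer, TrainingArguments)

source = load_dataset("conll2003")           # English source-language NER
labels = source["train"].features["ner_tags"].feature.names

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "xlm-roberta-base", num_labels=len(labels))

def tokenize_and_align(batch):
    # Align word-level tags with subwords: label the first subword of each
    # word, mask remaining subwords with -100 so the loss ignores them.
    enc = tokenizer(batch["tokens"], truncation=True,
                    is_split_into_words=True)
    enc["labels"] = []
    for i, tags in enumerate(batch["ner_tags"]):
        prev, row = None, []
        for wid in enc.word_ids(batch_index=i):
            row.append(-100 if wid is None or wid == prev else tags[wid])
            prev = wid
        enc["labels"].append(row)
    return enc

train = source["train"].map(tokenize_and_align, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xlmr-ner", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train,
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
# Zero-shot step: run trainer.predict() on a Maltese NER test set.
```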
62

A Master's thesis consisting of I. Acting book for the role of Blanche Dubois in A Streetcar Named Desire; II. Production log book for the role of Madeleine in Victims of Duty

Modyman, Linda January 1962 (has links)
Thesis (M.F.A.)--Boston University
63

Information Extraction of Technical Details From Scholarly Articles

Kaushal, Kulendra Kumar 16 June 2021 (has links)
Researchers have made significant progress in information extraction from short documents in the last few years, including social media interactions, news articles, and email excerpts. This research aims to extract technical entities like hardware resources, computing platforms, compute time, programming languages, and libraries from scholarly research articles. Research articles are generally long documents containing both salient and non-salient entities. Analyzing cross-sectional relations, filtering the relevant information, measuring the saliency of mentioned entities, and extracting novel entities are some of the technical challenges involved in this research. This work presents a detailed study of the performance, effectiveness, and scalability of rule-based weakly supervised algorithms. We also develop an automated end-to-end Research Entity and Relationship Extractor (E2R Extractor). Additionally, we perform a comprehensive study of the effectiveness of existing deep learning-based information extraction tools like DyGIE, DyGIE++, and SciREX. The research also contributes a dataset containing novel entities annotated in BILUO format and reports baseline results using the E2R Extractor on the proposed dataset. The results indicate that the E2R Extractor successfully extracts salient entities from research articles. / Master of Science / Information extraction is the process of automatically extracting meaningful information from unstructured text, such as articles and news feeds, and presenting it in a structured format. Researchers have made significant progress in this domain over the past few years. However, their work primarily focuses on short documents such as social media interactions, news articles, and email excerpts, not on long documents such as scholarly articles and research papers. Long documents contain a lot of redundant data, so filtering and extracting meaningful information is quite challenging. This work focuses on extracting entities such as hardware resources, compute platforms, and programming languages used in scholarly articles. We present a deep learning-based model to extract such entities from research articles and research papers. We evaluate the performance of our deep learning model against simple rule-based algorithms and other state-of-the-art models for extracting the desired entities. Our work also contributes a labeled dataset containing the entities mentioned above and results obtained on this dataset using our deep learning model.
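For reference, a minimal illustration of the BILUO scheme the abstract mentions (Begin, Inside, Last, Unit, Out); the sentence and the entity labels are invented examples, not the thesis's actual label inventory.

```python
# BILUO tagging of a token sequence: multi-token entities start with B-,
# continue with I-, and end with L-; single-token entities get U-;
# everything else is O. The sentence and labels are illustrative only.
tokens = ["Experiments", "ran", "on", "an", "NVIDIA", "V100",
          "GPU", "using", "PyTorch", "."]
tags   = ["O", "O", "O", "O", "B-HARDWARE", "I-HARDWARE",
          "L-HARDWARE", "O", "U-LIBRARY", "O"]

def biluo_to_spans(tokens, tags):
    """Collect (entity_text, label) spans from a BILUO-tagged sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("U-"):
            spans.append((tokens[i], tag[2:]))
        elif tag.startswith("B-"):
            start = i
        elif tag.startswith("L-") and start is not None:
            spans.append((" ".join(tokens[start:i + 1]), tag[2:]))
            start = None
    return spans

print(biluo_to_spans(tokens, tags))
# [('NVIDIA V100 GPU', 'HARDWARE'), ('PyTorch', 'LIBRARY')]
```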
64

Detection of Named Branch Origin for Git Commits

Michaud, Heather M. 15 September 2015 (has links)
No description available.
65

Identifying Single and Stacked News Triangles in Online News Articles - an Analysis of 31 Danish Online News Articles Annotated by 68 Journalists

Njor, Miklas January 2015 (has links)
While news articles for print use a single News Triangle, with the most important information at the top of the article, online news articles are supposed to use a series of Stacked News Triangles, owing to online readers' text-skimming habits [1]. To identify the presence of Stacked News Triangles, we analyse how 68 Danish journalists annotate 31 articles, using keyword frequency as the measure of importance. To explore whether Named Entities influence News Triangle presence, we analyse the Named Entities found in the articles and keywords. We find an overall News Triangle in 30 of the 31 articles, while 14 of the 31 articles have Stacked News Triangles. We cannot determine what influence Named Entities have on News Triangles; nonetheless, we find differences in Named Entity types across categories (Culture, Domestic, Economy, Sports).
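A rough sketch of the keyword-frequency idea: score each paragraph by how often the annotators' keywords appear in it, and treat a profile that peaks at the top and decays as evidence of a News Triangle. The keyword list and the decision rule below are illustrative assumptions, not the paper's exact method.

```python
# Score paragraphs by keyword frequency; a single News Triangle should
# show the highest score in the opening paragraph with a decaying tail.
# Keywords and the first-paragraph-peak test are illustrative assumptions.
from collections import Counter

def paragraph_scores(paragraphs, keywords):
    kw = {k.lower() for k in keywords}
    scores = []
    for p in paragraphs:
        words = Counter(w.strip(".,!?").lower() for w in p.split())
        scores.append(sum(words[k] for k in kw))
    return scores

def looks_like_triangle(scores):
    # Crude test: the first paragraph scores at least as high as any later one.
    return len(scores) > 0 and scores[0] == max(scores)

article = ["The minister resigned after the budget scandal.",
           "Parliament will debate the budget next week.",
           "The weather in Copenhagen was mild."]
print(looks_like_triangle(paragraph_scores(article, ["minister", "budget"])))
# True: scores [2, 1, 0] peak at the top and decay.
```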
66

Hierarchical Joint Entity Recognition and Relation Extraction of Contextual Entities in Family History Records

Segrera, Daniel 08 March 2023 (has links) (PDF)
Entity extraction is an important step in document understanding. Higher-accuracy extraction of fine-grained entities can be achieved by combining the utility of Named Entity Recognition (NER) and Relation Extraction (RE) models. In this paper, a cascading model is proposed that implements NER and RE. This model utilizes relations between entities to infer context-dependent fine-grained named entities in text corpora. The RE module runs independently of the NER module, which reduces error accumulation from sequential steps. This process improves the fine-grained NER F1-score over the existing state of the art from 0.4753 to 0.8563 on our data, albeit on a strictly limited domain. This provides the potential for further applications in historical document processing. These applications will enable automated searching of historical documents, such as those used in economics research and family history.
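A schematic sketch of the cascade described above: a coarse NER pass proposes entities, an RE pass runs independently, and the extracted relations are then used to specialise coarse labels into context-dependent fine-grained types. The models are stand-ins and all entity/relation type names are invented for illustration.

```python
# Cascaded NER + RE sketch: relations between entities specialise coarse
# entity types into fine-grained, context-dependent ones. The two
# functions are stand-ins for trained models; types are invented examples.

def coarse_ner(text):
    # Stand-in for a trained NER model: (span, coarse_type) pairs.
    return [("John Smith", "PERSON"), ("Mary Smith", "PERSON")]

def relation_extraction(text):
    # Stand-in for an RE model that runs independently of NER output.
    return [("John Smith", "child_of", "Mary Smith")]

# Relation-driven refinement rules: a coarse type plus a relation role
# determines a fine-grained type (e.g., in family history records).
REFINE = {("PERSON", "child_of", "head"): "CHILD",
          ("PERSON", "child_of", "tail"): "PARENT"}

def refine(entities, relations):
    fine = dict(entities)
    for head, rel, tail in relations:
        fine[head] = REFINE.get((fine.get(head), rel, "head"), fine.get(head))
        fine[tail] = REFINE.get((fine.get(tail), rel, "tail"), fine.get(tail))
    return fine

ents = coarse_ner("record text")
rels = relation_extraction("record text")
print(refine(ents, rels))  # {'John Smith': 'CHILD', 'Mary Smith': 'PARENT'}
```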
67

Direct Speech Translation Toward High-Quality, Inclusive, and Augmented Systems

Gaido, Marco 28 April 2023 (has links)
When this PhD started, the translation of speech into text in a different language was mainly tackled with a cascade of automatic speech recognition (ASR) and machine translation (MT) models, as the emerging direct speech translation (ST) models were not yet competitive. To close this gap, part of the PhD was devoted to improving the quality of direct models, both in the simplified condition of test sets where the audio is split into well-formed sentences, and in the realistic condition in which the audio is automatically segmented. First, we investigated how to transfer knowledge from MT models trained on large corpora. Then, we defined encoder architectures that give different weights to the vectors in the input sequence, reflecting the variability of the amount of information over time in speech. Finally, we reduced the adverse effects caused by suboptimal automatic audio segmentation in two ways: on one side, we created models robust to this condition; on the other, we enhanced the audio segmentation itself. The good results achieved in terms of overall translation quality allowed us to investigate specific behaviors of direct ST systems, which are crucial to satisfying real users' needs. On one side, driven by the ethical goal of inclusive systems, we showed that established technical choices geared toward high overall performance (statistical word segmentation of the target text, knowledge distillation from MT) exacerbate the gender representational disparities present in the training data. Along this line of work, we proposed mitigation techniques that reduce the gender bias of ST models, and showed how gender-specific systems can be used to control the translation of gendered words referring to the speakers, regardless of their vocal traits. On the other side, motivated by the practical needs of interpreters and translators, we evaluated the potential of direct ST systems in the “augmented translation” scenario, focusing on the translation and recognition of named entities (NEs). Along this line of work, we proposed solutions to cope with a major weakness of ST models (handling person names), and introduced direct models that jointly perform ST and NE recognition, showing their superiority over a pipeline of dedicated tools for the two tasks. Overall, we believe this thesis takes a step toward the adoption of direct ST systems in real applications, increasing awareness of their strengths and weaknesses compared to the traditional cascade paradigm.
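One of the techniques named above, knowledge distillation from an MT teacher into a direct ST student, can be sketched as a generic word-level KD loss in PyTorch; this is a textbook formulation, not the thesis's exact loss or architecture.

```python
# Word-level knowledge distillation: the ST student is trained to match
# the token-level output distribution of an MT teacher on the same target
# sentence. Generic sketch; not the thesis's exact formulation.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between teacher and student token distributions.

    Both tensors have shape (batch, tgt_len, vocab_size).
    """
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(student_logp, teacher_probs,
                    reduction="batchmean") * (t * t)

# Typical use mixes KD with the usual cross-entropy on gold tokens:
# loss = alpha * kd_loss(s_logits, t_logits) + (1 - alpha) * ce_loss
```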
68

Improving Automatic Transcription Using Natural Language Processing

Kiefer, Anna 01 March 2024 (has links) (PDF)
Digital Democracy is a CalMatters and California Polytechnic State University initiative to promote transparency in state government by increasing access to the California legislature. While Digital Democracy is made up of many resources, one foundational step of the project is obtaining accurate, timely transcripts of California Senate and Assembly hearings. The information extracted from these transcripts provides crucial data for subsequent steps in the pipeline. In the context of Digital Democracy, upleveling is when humans verify, correct, and annotate the transcript results after the legislative hearings have been automatically transcribed. The upleveling process is done with the assistance of a software application called the Transcription Tool. The human upleveling process is the most costly and time-consuming step of the Digital Democracy pipeline. In this thesis, we hypothesize that we can make significant reductions to the time needed for upleveling by using Natural Language Processing (NLP) systems and techniques. The main contribution of this thesis is engineering a new automatic transcription pipeline. Specifically, this thesis integrates a new automatic speech recognition service, a new speaker diarization model, additional text post-processing changes, and a new process for speaker identification. To evaluate the system's improvements, we measure the accuracy and speed of the newly integrated features and record editor upleveling time both before and after the additions.
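A minimal sketch of one pipeline step implied above: merging ASR output (words with timestamps) with speaker-diarization segments to produce a speaker-attributed transcript. The data structures are assumptions; the thesis's actual ASR and diarization services are not named here.

```python
# Attribute each ASR word to a speaker by overlapping its timestamps with
# diarization segments. Input formats are illustrative assumptions.

# ASR output: (word, start_sec, end_sec)
words = [("Good", 0.0, 0.3), ("morning", 0.3, 0.8),
         ("thank", 2.1, 2.4), ("you", 2.4, 2.6)]

# Diarization output: (speaker_label, start_sec, end_sec)
segments = [("SPEAKER_00", 0.0, 1.5), ("SPEAKER_01", 1.8, 3.0)]

def attribute_speakers(words, segments):
    """Assign each word to the diarization segment with the most overlap."""
    transcript = []
    for word, w_start, w_end in words:
        best, best_overlap = None, 0.0
        for spk, s_start, s_end in segments:
            overlap = min(w_end, s_end) - max(w_start, s_start)
            if overlap > best_overlap:
                best, best_overlap = spk, overlap
        transcript.append((best, word))
    return transcript

print(attribute_speakers(words, segments))
# [('SPEAKER_00', 'Good'), ('SPEAKER_00', 'morning'),
#  ('SPEAKER_01', 'thank'), ('SPEAKER_01', 'you')]
```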
69

Using Concept Maps as a Tool for Cross-Language Relevance Determination

Richardson, W. Ryan 02 August 2007 (has links)
Concept maps, introduced by Novak, aid learners' understanding. I hypothesize that concept maps can also function as summaries of large documents, e.g., electronic theses and dissertations (ETDs). I have built a system that automatically generates concept maps from English-language ETDs in the computing field. The system will also provide Spanish translations of these concept maps for native Spanish speakers. Using machine translation techniques, my approach leads to concept maps that could allow researchers to discover pertinent dissertations in languages they cannot read, helping them decide whether they want a potentially relevant dissertation translated. I am using a state-of-the-art natural language processing system, called Relex, to extract noun phrases and noun-verb-noun relations from ETDs, and then produce concept maps automatically. I have also incorporated information from the tables of contents of ETDs to create novel styles of concept maps. I have conducted five user studies to evaluate user perceptions of these different map styles. I am using several methods to translate node and link text in concept maps from English to Spanish. Nodes labeled with single words from a given technical area can be translated using wordlists, but phrases in specific technical fields can be difficult to translate. Thus I have amassed a collection of about 580 Spanish-language ETDs from Scirus and two Mexican universities, and I am using this corpus to mine phrase translations that I could not find otherwise. The usefulness of the automatically generated and translated concept maps was assessed in an experiment at Universidad de las Americas (UDLA) in Puebla, Mexico. This experiment demonstrated that concept maps can augment abstracts (translated using a standard machine translation package) in helping Spanish-speaking users find ETDs of interest. / Ph. D.
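The noun-verb-noun relations that feed the concept maps can be illustrated with a dependency-parse sketch; spaCy is used here purely as a stand-in for Relex, the system the thesis actually employs, and the example sentence is invented.

```python
# Extract rough noun-verb-noun triples from a dependency parse, the kind
# of relation that becomes a node-link-node unit in a concept map.
# spaCy stands in for Relex; requires the en_core_web_sm model.
import spacy

nlp = spacy.load("en_core_web_sm")

def noun_verb_noun(text):
    triples = []
    for token in nlp(text):
        if token.pos_ == "VERB":
            subjects = [c for c in token.lefts
                        if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.rights if c.dep_ == "dobj"]
            for s in subjects:
                for o in objects:
                    triples.append((s.text, token.lemma_, o.text))
    return triples

print(noun_verb_noun("The system generates concept maps from dissertations."))
# [('system', 'generate', 'maps')]
```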
70

Protocole de routage pour l’architecture NDN / Routing protocol for NDN architecture

Aubry, Elian 19 December 2017 (has links)
Among content-oriented architectures, the NDN (Named-Data Networking) architecture has gathered the largest research community and is the most mature candidate for a Future Internet. Within the NDN architecture, this PhD focused on routing mechanisms suited to this new vision of the network. Indeed, the ability to forward a request to its destination is fundamental for a network architecture to be functional, and this problem had received very little attention until now. In this manuscript, we propose the SRSC routing protocol (SDN-based Routing Scheme for CCN/NDN), which relies on the Software-Defined Networking (SDN) paradigm. SRSC uses a controller that manages the control plane of the NDN network. By centralising information such as the network topology, the location of contents, and the state of the nodes' caches, the controller can compute the best route to forward requests toward content. SRSC also supports anycast routing: requests are forwarded to the closest node holding the data, optimising the distribution of requests in the network and spreading the load among all nodes. Moreover, SRSC uses only the Interest and Data messages of the NDN architecture, and its originality lies in being completely independent of the existing TCP/IP infrastructure. SRSC was first evaluated through simulation with NS-3, where we compared it to the request-flooding scheme initially proposed by NDN. SRSC was then implemented in NDNx, the open-source implementation of the NDN architecture, and deployed on our Docker-based testbed. This testbed virtualises NDN nodes and allows observing a real large-scale deployment of the architecture. We evaluated the performance of SRSC on this virtualised testbed and compared it to NLSR (Named-Data Link State Routing Protocol), the routing protocol of the NDN project. / The Internet is a global content network whose use has been growing for years. Content delivery applications such as P2P and video streaming generate most Internet traffic, and Named-Data Networking (NDN) appears to be an appropriate architecture to satisfy these needs. NDN is a novel clean-slate architecture for the Future Internet. It has been designed to deliver content at large scale and integrates several features such as in-network caching, security, and multi-path forwarding. However, the lack of a scalable routing scheme is one of the main obstacles slowing down a large-scale deployment of NDN. As NDN relies on content names instead of host addresses, it cannot reuse the Internet's traditional routing schemes. In this thesis, we propose to use the Software-Defined Networking (SDN) paradigm to decouple the data plane from the control plane, and present SRSC, a new routing scheme for NDN based on the SDN paradigm. Our solution is a clean-slate approach, using only NDN messages and the SDN paradigm. We implemented our solution in the NS-3 simulator and performed extensive simulations of our proposal; SRSC shows better performance than the flooding scheme used by default in NDN.
We also present a new NDN testbed and the implementation of our protocol SRSC, a Controller-based Routing Scheme for NDN. We implemented SRSC in NDNx, the open-source NDN implementation, and deployed it in a virtual environment through Docker. Our experiments demonstrate the ability of our proposal to forward Interest messages while keeping a low computation time for the Controller and a low delay to access content. Moreover, we propose a solution to easily deploy and evaluate NDN networks, and we compare SRSC with NLSR, the current routing protocol used in NDNx.
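An illustrative sketch of the controller's anycast route computation described above: given the topology and the set of nodes caching a content, pick the shortest path to the nearest replica. This is a plain Dijkstra search with an invented topology, not SRSC's actual implementation.

```python
# Anycast route selection sketch for an SDN-style NDN controller:
# compute the shortest path from the requesting node to the nearest
# node holding the content. Illustrative only; not SRSC's actual code.
import heapq

def shortest_path(topology, src, targets):
    """Dijkstra from src; stop at the first (nearest) node in targets."""
    dist, heap, prev = {src: 0}, [(0, src)], {}
    while heap:
        d, node = heapq.heappop(heap)
        if node in targets:                  # nearest content replica found
            path = [node]
            while node != src:
                node = prev[node]
                path.append(node)
            return list(reversed(path))
        if d > dist.get(node, float("inf")):
            continue
        for nxt, cost in topology.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))
    return None

# Link costs between NDN nodes, and the replicas caching "/videos/foo".
topology = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 1, "D": 5},
            "C": {"A": 4, "B": 1, "D": 1}, "D": {"B": 5, "C": 1}}
replicas = {"D"}
print(shortest_path(topology, "A", replicas))   # ['A', 'B', 'C', 'D']
```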
