161 |
Knowledge-Enabled Entity Extraction
Al-Olimat, Hussein S. January 2019 (has links)
No description available.
|
162 |
[en] EXTRACTING RELIABLE INFORMATION FROM LARGE COLLECTIONS OF LEGAL DECISIONS / [pt] EXTRAINDO INFORMAÇÕES CONFIÁVEIS DE GRANDES COLEÇÕES DE DECISÕES JUDICIAIS
FERNANDO ALBERTO CORREIA DOS SANTOS JUNIOR 09 June 2022 (has links)
[pt] Como uma consequência natural da digitalização do sistema judiciário
brasileiro, um grande e crescente número de documentos jurídicos tornou-se
disponível na internet, especialmente decisões judiciais. Como ilustração, em
2020, o Judiciário brasileiro produziu 25 milhões de decisões. Neste mesmo
ano, o Supremo Tribunal Federal (STF), a mais alta corte do judiciário brasileiro, produziu 99,5 mil decisões. Alinhados a esses valores, observamos
uma demanda crescente por estudos voltados para a extração e exploração
do conhecimento jurídico de grandes acervos de documentos legais. Porém,
ao contrário do conteúdo de textos comuns (como, por exemplo, livros, notícias e postagens de blog), o texto jurídico constitui um caso particular
de uso de uma linguagem altamente convencionalizada. Infelizmente, pouca
atenção é dada à extração de informações em domínios especializados, como
textos legais. Do ponto de vista temporal, o Judiciário é uma instituição em
constante evolução, que se molda para atender às demandas da sociedade.
Com isso, o nosso objetivo é propor um processo confiável de extração de
informações jurídicas de grandes acervos de documentos jurídicos, tomando
como base o STF e as decisões monocráticas publicadas por este tribunal nos
anos entre 2000 e 2018. Para tanto, pretendemos explorar a combinação de
diferentes técnicas de Processamento de Linguagem Natural (PLN) e Extração de Informação (EI) no contexto jurídico. Da PLN, pretendemos explorar
as estratégias automatizadas de reconhecimento de entidades nomeadas no
domínio legal. Do ponto da EI, pretendemos explorar a modelagem dinâmica de tópicos utilizando a decomposição tensorial como ferramenta para
investigar mudanças no raciocínio jurídico presente nas decisões ao longo do
tempo, a partir da evolução dos textos e da presença de entidades nomeadas legais. Para avaliar a confiabilidade, exploramos a interpretabilidade
do método empregado, e recursos visuais para facilitar a interpretação por
parte de um especialista de domínio. Como resultado final, a proposta de
um processo confiável e de baixo custo para subsidiar novos estudos no domínio jurídico e, também, propostas de novas estratégias de extração de
informações em grandes acervos de documentos. / [en] As a natural consequence of the Brazilian Judicial System’s digitization, a large and increasing number of legal documents have become available on the Internet, especially judicial decisions. As an illustration, in 2020,
25 million decisions were produced by the Brazilian Judiciary. In the same
year, the Brazilian Supreme Court (STF), the highest judicial body in Brazil,
alone produced 99.5 thousand decisions. In line with those numbers, we
face a growing demand for studies focused on extracting and exploring the
legal knowledge hidden in those large collections of legal documents. However, unlike typical textual content (e.g., books, news articles, and blog posts), the
legal text constitutes a particular case of highly conventionalized language.
Little attention is paid to information extraction in specialized domains such
as legal texts. From a temporal perspective, the Judiciary itself is a constantly evolving institution, which molds itself to cope with the demands of
society. Therefore, our goal is to propose a reliable process for legal information extraction from large collections of legal documents, based on the STF
scenario and the monocratic decisions published by it between 2000 and
2018. To do so, we intend to explore the combination of different Natural
Language Processing (NLP) and Information Extraction (IE) techniques in the
legal domain. From NLP, we explore automated named entity recognition
strategies in the legal domain. From IE, we explore dynamic topic modeling with tensor decomposition as a tool to investigate the legal reasoning
changes embedded in those decisions over time through textual evolution
and the presence of legal named entities. For reliability, we explore the
interpretability of the methods employed. Also, we add visual resources to
facilitate interpretation by a domain specialist. As a final result, we expect
to propose a reliable and cost-effective process to support further studies
in the legal domain and, also, to propose new strategies for information
extraction on a large collection of documents.
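The dynamic topic modeling with tensor decomposition mentioned in this abstract can be illustrated with a minimal CP (PARAFAC) decomposition computed by alternating least squares. The thesis's actual models and data are not reproduced here, so the dimensions, rank, and the NumPy-only solver below are an illustrative sketch, not the author's implementation; in this setting the three factor matrices would loosely correspond to time, document, and term loadings of each latent topic.

```python
import numpy as np

def khatri_rao(U, V):
    # Column-wise Kronecker product: rows indexed by (u, v) pairs, v fastest.
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def cp_als(X, rank, n_iter=200, seed=0):
    """Rank-`rank` CP (PARAFAC) decomposition of a 3-way tensor via
    alternating least squares: X[i,j,k] ~ sum_r A[i,r]*B[j,r]*C[k,r]."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    X1 = X.reshape(I, J * K)                     # mode-1 unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)  # mode-2 unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)  # mode-3 unfolding
    for _ in range(n_iter):
        # Each factor solved in closed form with the others held fixed.
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

def reconstruct(A, B, C):
    return np.einsum("ir,jr,kr->ijk", A, B, C)
```

On an exactly low-rank (time × document × term) tensor, the reconstruction error drops close to zero, which is the interpretability hook the abstract alludes to: each rank-one component can be read off directly as a topic with an explicit temporal profile.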
|
163 |
A Step Toward GDPR Compliance : Processing of Personal Data in Email
Olby, Linnea, Thomander, Isabel January 2018 (has links)
The General Data Protection Regulation, enforced on the 25th of May 2018, is a response to the growing importance of IT in today’s society, accompanied by public demand for control over personal data. In contrast to the previous directive, the new regulation applies to personal data stored in an unstructured format, such as email, rather than solely structured data. Companies are now forced to accommodate this change, among others, in order to be compliant. This study aims to provide a code of conduct for the processing of personal data in email as a measure for reaching compliance. Furthermore, this study investigates whether Named Entity Recognition (NER) can aid this process as a means of finding personal data in the form of names. A literature review of current research and recommendations was conducted for the code of conduct proposal. A NER system was constructed using a hybrid approach with Binary Logistic Regression, hand-crafted rules and gazetteers. The model was applied to a selection of emails, including attachments, obtained from a small consultancy company in the automotive industry. The proposed code of conduct consists of six items, applied to the consultancy firm. The NER model demonstrated low ability to identify names and was therefore deemed insufficient for this task. / Dataskyddsförordningen började gälla den 25:e maj 2018, och uppstod som ett svar på den ökande betydelsen av IT i dagens samhälle samt allmänhetens krav på ökad kontroll över personuppgifter för den enskilde individen. Till skillnad från det tidigare direktivet, omfattar den nya förordningen även personuppgifter som är lagrade i ostrukturerad form, som till exempel e-post, snarare än endast i strukturerad form. Många företag tvingas därmed att anpassa sig efter detta, tillsammans med ett flertal andra nya krav, i syfte att efterfölja förordningen. 
Den här studien syftar till att lägga fram ett förslag på en uppförandekod för behandling av personuppgifter i e-post som ett verktyg för att nå medgörlighet. Utöver detta undersöks det om Named Entity Recognition (NER) kan användas som ett hjälpmedel vid identifiering av personuppgifter, mer specifikt namn. En litteraturstudie kring tidigare forskning och aktuella rekommendationer utfördes inför utformningen av uppförandekoden. Ett NER-system konstruerades med hjälp av Binär Logistisk Regression, handgjorda regler och ordlistor. Modellen applicerades på ett urval av e-postmeddelanden, med eventuella bilagor, som tillhandahölls från ett litet konsultbolag aktivt inom bilindustrin. Den rekommenderade uppförandekoden består av sex punkter, applicerade på konsultbolaget. NER-modellen påvisade en låg förmåga att identifiera namn och ansågs därför inte vara lämplig för den utsatta uppgiften.
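The hybrid approach described above, logistic regression combined with hand-crafted rules and gazetteers, can be sketched as follows. The gazetteer entries, feature set, and training examples below are invented for illustration; the thesis's actual features and data are not shown in the abstract.

```python
import math

GAZETTEER = {"linnea", "isabel", "anna"}  # illustrative first-name list
TITLES = {"dr", "mr", "ms", "mrs"}        # illustrative honorifics

def features(tokens, i):
    tok = tokens[i]
    return [
        1.0,                                       # bias
        1.0 if tok[:1].isupper() else 0.0,         # capitalized
        1.0 if tok.lower() in GAZETTEER else 0.0,  # gazetteer hit
        1.0 if i > 0 and tokens[i - 1].lower().rstrip(".") in TITLES else 0.0,
        1.0 if i == 0 else 0.0,                    # sentence-initial (weak cue)
    ]

def train(sentences, labels, lr=0.5, epochs=200):
    """Binary logistic regression over per-token features (name vs. other),
    fit with plain stochastic gradient ascent on the log-likelihood."""
    w = [0.0] * 5
    for _ in range(epochs):
        for toks, labs in zip(sentences, labels):
            for i, y in enumerate(labs):
                x = features(toks, i)
                p = 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
                w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

def predict(w, toks):
    out = []
    for i in range(len(toks)):
        x = features(toks, i)
        p = 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
        # Hand-crafted override: a capitalized gazetteer hit is always a name.
        out.append(1 if (x[1] and x[2]) or p > 0.5 else 0)
    return out
```

The override in `predict` is where the rule component trumps the statistical score, which is the usual division of labor in such hybrids: rules supply precision on known names, the classifier supplies recall on unseen ones.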
|
164 |
Utilizing Transformers with Domain-Specific Pretraining and Active Learning to Enable Mining of Product Labels
Norén, Erik January 2023 (has links)
Structured Product Labels (SPLs), the package inserts that accompany drugs regulated by the Food and Drug Administration (FDA), hold information about Adverse Drug Reactions (ADRs) associated with drugs post-market. This information is valuable for actors working in the field of pharmacovigilance who aim to improve the safety of drugs. One such actor is Uppsala Monitoring Centre (UMC), a non-profit conducting pharmacovigilance research. To access the valuable information in the package inserts, UMC has constructed a pipeline to mine SPLs for ADRs. This project investigates new approaches to the Scan problem, the part of the pipeline responsible for extracting mentions of ADRs, by framing it as a Named Entity Recognition task, a subtask of Natural Language Processing. Using the transformer-based deep learning model BERT with domain-specific pre-training, an F1-score of 0.8220 was achieved. Furthermore, the chosen model was used in an iteration of Active Learning to efficiently extend the available data pool with the most informative examples, which improved the F1-score to 0.8337. However, a baseline data set extended with random examples showed similar improvements, so this application of Active Learning could not be shown to be effective in this project.
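The Active Learning step, extending the data pool with the most informative examples, is commonly implemented as pool-based uncertainty sampling. The abstract does not say which acquisition function the thesis used, so the least-confidence variant below is one plausible sketch: rank unlabeled examples by the model's maximum class probability and send the least confident ones for annotation.

```python
import numpy as np

def least_confidence_sample(probs, k):
    """Pick the k pool examples the model is least confident about.
    `probs` is an (n_examples, n_classes) array of predicted class
    probabilities; confidence is the maximum class probability."""
    confidence = probs.max(axis=1)
    return np.argsort(confidence)[:k]

# Toy pool: the model is sure about rows 0 and 2, unsure about rows 1 and 3.
probs = np.array([
    [0.95, 0.05],
    [0.55, 0.45],
    [0.90, 0.10],
    [0.52, 0.48],
])
picked = least_confidence_sample(probs, 2)  # indices of rows 1 and 3
```

The random-examples baseline the abstract mentions is the standard control for exactly this procedure: replace `least_confidence_sample` with a uniform draw from the pool and compare the resulting F1-scores.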
|
165 |
"Why do hurt people hurt people?" A SERIES OF CASE STUDIES EXPLORING ABUSIVE RELATIONSHIPS IN DRAMATIC TEXTS AND ONSTAGE WITH TONI KOCHENSPARGER'S MILKWHITE
Lane, Michelle I. 27 April 2017 (has links)
No description available.
|
166 |
Playing the Big Easy: A History of New Orleans in Film and Television
Joseph, Robert Gordon 18 April 2018 (has links)
No description available.
|
167 |
[pt] EXTRAÇÃO DE INFORMAÇÕES DE SENTENÇAS JUDICIAIS EM PORTUGUÊS / [en] INFORMATION EXTRACTION FROM LEGAL OPINIONS IN BRAZILIAN PORTUGUESE
GUSTAVO MARTINS CAMPOS COELHO 03 October 2022 (has links)
[pt] A Extração de Informação é uma tarefa importante no domínio jurídico.
Embora a presença de dados estruturados seja escassa, dados não estruturados na forma de documentos jurídicos, como sentenças, estão amplamente
disponíveis. Se processados adequadamente, tais documentos podem fornecer
informações valiosas sobre processos judiciais anteriores, permitindo uma melhor avaliação por profissionais do direito e apoiando aplicativos baseados em
dados. Este estudo aborda a Extração de Informação no domínio jurídico, extraindo valor de sentenças relacionadas a reclamações de consumidores. Mais
especificamente, a extração de cláusulas categóricas é abordada através de
classificação, onde seis modelos baseados em diferentes estruturas são analisados. Complementarmente, a extração de valores monetários relacionados a
indenizações por danos morais é abordada por um modelo de Reconhecimento
de Entidade Nomeada. Para avaliação, um conjunto de dados foi criado, contendo 964 sentenças anotadas manualmente (escritas em português) emitidas
por juízes de primeira instância. Os resultados mostram uma média de aproximadamente 97 por cento de acurácia na extração de cláusulas categóricas, e 98,9 por cento
na aplicação de NER para a extração de indenizações por danos morais. / [en] Information Extraction is an important task in the legal domain. While
the presence of structured and machine-processable data is scarce, unstructured data in the form of legal documents, such as legal opinions, is largely
available. If properly processed, such documents can provide valuable information with regard to past lawsuits, allowing better assessment by legal professionals and supporting data-driven applications. This study addresses Information Extraction in the legal domain by extracting value from legal opinions
related to consumer complaints. More specifically, the extraction of categorical
provisions is addressed by classification, where six models based on different
frameworks are analyzed. Moreover, the extraction of monetary values related
to moral damage compensations is addressed by a Named Entity Recognition
(NER) model. For evaluation, a dataset was constructed, containing 964 manually annotated legal opinions (written in Brazilian Portuguese) issued by
lower court judges. The results show an average accuracy of approximately 97 percent when extracting categorical provisions, and 98.9 percent when applying NER
for the extraction of moral damage compensations.
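The extraction of monetary values for moral damage compensations, which the abstract addresses with a NER model, can also be approximated with a pattern-based baseline. The regex below targets Brazilian currency notation (thousands dot, decimal comma) and is an illustrative sketch, not the thesis's model; the sample sentence is invented.

```python
import re

# Matches Brazilian-format monetary amounts such as "R$ 5.000,00".
MONEY_RE = re.compile(r"R\$\s?\d{1,3}(?:\.\d{3})*(?:,\d{2})?")

def extract_amounts(text):
    """Return every monetary amount mentioned in the text."""
    return MONEY_RE.findall(text)

def to_float(amount):
    """Convert 'R$ 5.000,00' to 5000.0 (strip thousands dots, swap the
    decimal comma for a dot)."""
    digits = amount.replace("R$", "").strip().replace(".", "").replace(",", ".")
    return float(digits)
```

A baseline like this is useful for judging what a learned NER model adds: the pattern already nails well-formed amounts, so the model's contribution shows up on spelled-out or irregular values.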
|
168 |
Geo-Locating Tweets with Latent Location Information
Lee, Sunshin 13 February 2017 (links)
As part of our work on the NSF funded Integrated Digital Event Archiving and Library (IDEAL) project and the Global Event and Trend Archive Research (GETAR) project, we collected over 1.4 billion tweets using over 1,000 keywords, key phrases, mentions, or hashtags, starting from 2009. Since many tweets talk about events (with useful location information), such as natural disasters, emergencies, and accidents, it is important to geo-locate those tweets whenever possible.
Due to possible location ambiguity, finding a tweet's location is often challenging. Many distinct places share the same geoname; e.g., "Greenville" matches 50 different locations in the U.S.A. Frequently, the explicit location information in tweets, such as mentioned geonames, is insufficient, because tweets are brief and incomplete: owing to the 140-character limit, they carry only a small fraction of the full location information of an event. Location indicative words (LIWs) may carry latent location information; for example, "Water main break near White House" contains no geonames, but the key phrase 'White House' ties it to the location "1600 Pennsylvania Ave NW, Washington, DC 20500 USA".
To disambiguate tweet locations, we first extracted geospatial named entities (geonames) and predicted implicit state (e.g., Virginia or California) information from entities using machine learning algorithms including Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest (RF). Implicit state information helps reduce ambiguity. We also studied how location information of events is expressed in tweets and how latent location indicative information can help to geo-locate tweets. We then used a machine learning (ML) approach to predict the implicit state using geonames and LIWs.
We conducted experiments with tweets (e.g., about potholes), and found significant improvement in disambiguating tweet locations using a ML algorithm along with the Stanford NER. Adding state information predicted by our classifiers increased the possibility to find the state-level geo-location unambiguously by up to 80%. We also studied over 6 million tweets (3 mid-size and 2 big-size collections about water main breaks, sinkholes, potholes, car crashes, and car accidents), covering 17 months. We found that up to 91.1% of tweets have at least one type of location information (geo-coordinates or geonames), or LIWs. We also demonstrated that in most cases adding LIWs helps geo-locate tweets with less ambiguity using a geo-coding API. Finally, we conducted additional experiments with the five different tweet collections, and found significant improvement in disambiguating tweet locations using a ML approach with geonames and all LIWs that are present in tweet texts as features. / Ph. D.
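The state-prediction step above used SVM, Naive Bayes, and Random Forest classifiers over geonames and LIWs. As a self-contained sketch of the Naive Bayes variant, the toy training set below (geoname and LIW tokens mapped to state labels) is invented for illustration and is far smaller than the dissertation's tweet collections.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Multinomial Naive Bayes over token counts, stored as raw counts;
    add-one smoothing is applied at prediction time."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, y in zip(docs, labels):
        word_counts[y].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict_nb(model, words):
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for y, cy in class_counts.items():
        lp = math.log(cy / total)                       # class prior
        denom = sum(word_counts[y].values()) + len(vocab)
        for w in words:
            lp += math.log((word_counts[y][w] + 1) / denom)  # smoothed likelihood
        if lp > best_lp:
            best, best_lp = y, lp
    return best

# Toy training set: tokens are geonames plus location indicative words.
docs = [
    ["greenville", "white", "house", "dc"],
    ["pothole", "blacksburg", "tech"],
    ["greenville", "main", "street", "sc"],
]
labels = ["DC", "VA", "SC"]
model = train_nb(docs, labels)
```

Even this toy model shows the disambiguation mechanism: "greenville" alone is split between DC and SC, but co-occurring LIWs like "white house" or "main street" tip the state prediction one way or the other.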
|
169 |
Le repérage automatique des entités nommées dans la langue arabe : vers la création d'un système à base de règles
Zaghouani, Wajdi January 2009 (has links)
Thesis digitized by the Division de la gestion de documents et des archives of the Université de Montréal.
|
170 |
威廉斯三部劇本裡的家庭失序與社會批判 / Spiritual anomie of the family and social criticism in Tennessee Williams's three plays
溫鳳祺, Uen, Fong-Chyi Unknown Date (has links)
田納西‧威廉斯的劇本中經常描述家庭隱涵的不安與緊張關係，以及新興文明對美國南方傳統文化的衝擊，憑藉威廉斯的許多劇本和訪談錄可以看出作者對傳統與現代文化態度的改變。本論文旨在探討作者的早期寫作生涯（約在 1960 年以前，評家稱此時期為田納西‧威廉斯的劇本創作黃金時期）中三部重要劇本裡面對家庭和社會的看法，此論文希望能找出作者人生態度改變的原因和方式。《玻璃動物園》、《慾望街車》、《朱門巧婦》這三部劇本本身不但具備不可磨滅的藝術價值，主題也前後鉤連，劇本內在關係環環相扣，前後緊密一致。本論文將分成五個部分，除了導論和結論其中的三章討論三個劇本的情節。各章皆針對風景、對話風格、角色的個性、象徵意涵、社會地位與扮演的角色細緻探索檢視，藉此暴露社會的現象和文化的激盪；除了文本的詮釋剖析，論文將佐以部分的威廉斯生平資料，藉此探討作者在劇本中如何揭露他對社會的看法和藝術創作的蛻變過程。 / Praised as one of the greatest American dramatists, Tennessee Williams is obsessed with delineating conflicts among family members and cultural clashes in the American South. However, the artist's attitude towards modern society seems to change across his plays. The purpose of this thesis is to trace Tennessee Williams's three plays, that is, The Glass Menagerie, A Streetcar Named Desire, and Cat on a Hot Tin Roof, to find out why and how his attitude or view of life changes. These plays are the most popular and frequently discussed ones that stress impossible relationships among family members. Rich in aesthetic value, these three plays are also thematically related.
This thesis will be divided into five parts: Introduction, three chapters dealing with the three plays respectively, and Conclusion. Each chapter discusses the major characters, probing their symbolic meanings, social status and roles in different circumstances, and linguistic styles; the setting of the play and the interactions between environment and characters; male-female relationships; and shades of difference in the author's ideological concepts and his attitude toward wider contextual values. By searching for autobiographical elements and the social background, I hope this thesis can restore historical as well as textual meanings as represented in these three plays, thereby reexamining the playwright's views toward the external world and the evolution of man's mental processes.
|