51 |
英文介系詞片語定位與英文介系詞推薦 / Attachment of English prepositional phrases and suggestions of English prepositions 蔡家琦, Tsai, Chia Chi Unknown Date (has links)
英文介系詞在句子裡所扮演的角色通常是用來使介系詞片語更精確地補述上下文,英文的母語使用者可以很直覺地使用。然而電腦不瞭解語義,因此不容易判斷介系詞修飾對象;非英文母語使用者則不容易直覺地使用正確的介系詞。所以本研究將專注於介系詞片語定位與介系詞推薦的議題。
本研究將這二個介系詞議題抽象化為一個決策問題，並提出一個一般化的解決方法。這二個問題共通的部分在於動詞片語，一個簡單的動詞片語含有最重要的四個中心詞(headword)：動詞、名詞一、介系詞和名詞二。由這四個中心詞做為出發點，透過WordNet做階層式的選擇，在大量的案例中尋找語義上共通的部分，再利用機器學習的方法建構一般化的模型。此外，針對介系詞片語定位的問題，我們挑選較具挑戰性的介系詞做實驗。
藉由使用真實生活語料,我們的方法處理介系詞片語定位的問題,比同樣考慮四個中心詞的最大熵值法(Max Entropy)好;但與考慮上下文的Stanford剖析器差不多。而在介系詞推薦的問題裡,較難有全面比較的對象,但我們的方法精準度可達到53.14%。
本研究發現,高層次的語義可以使分類器有不錯的分類效果,而透過階層式的選擇語義能使分類效果更佳。這顯示我們確實可以透過語義歸納一套準則,用於這二個介系詞的議題。相信成果在未來會對機器翻譯與文本校對的相關研究有所價值。 / This thesis focuses on problems of attachment of prepositional phrases (PPs) and problems of prepositional suggestions. Determining the correct PP attachment is not easy for computers. Using correct prepositions is not easy for learners of English as a second language.
I transform the problems of PP attachment and prepositional suggestion into one abstract model, and apply the same computational procedures to solve both. The common model features four headwords, i.e., the verb, the first noun, the preposition, and the second noun of the prepositional phrase. My methods use the semantic features of these headwords in WordNet to train classification models, and apply the learned models to the attachment and suggestion problems. This exploration of PP attachment is distinctive in that only PPs that are almost equally likely to attach to the verb or to the first noun were used in the study.
The proposed models consider only the four headwords yet achieve satisfactory performance. In the PP-attachment experiments, my methods outperformed a Maximum Entropy classifier that also considered the four headwords, and performed comparably to the Stanford parsers even though the parsers had access to the complete sentences when judging attachments. In the preposition-suggestion experiments, my methods found the correct preposition 53.14% of the time, which is not as good as the best-performing system today.
This study reconfirms that semantic information is instrumental for both PP attachment and prepositional suggestion. High-level semantic information supported good performance, and hierarchically selected semantic synsets further improved the observed results. I believe the reported results are valuable for future studies of PP attachment and prepositional suggestion, which are key components of machine translation and text proofreading.
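A minimal sketch of the kind of classifier just described, assuming NLTK's WordNet interface and scikit-learn; the headword quadruples, the hypernym-path features, and the logistic-regression model below are illustrative assumptions, not the configuration actually used in the thesis.

```python
from nltk.corpus import wordnet as wn              # needs nltk plus the WordNet corpus data
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def semantic_features(verb, noun1, prep, noun2, depth=3):
    """Represent a (verb, noun1, prep, noun2) quadruple by WordNet hypernyms of its headwords."""
    feats = {"prep=" + prep: 1}
    for role, word in (("v", verb), ("n1", noun1), ("n2", noun2)):
        synsets = wn.synsets(word)
        if not synsets:
            continue
        chain = synsets[0].hypernym_paths()[0]     # hypernym chain of the most frequent sense
        for s in chain[-depth:]:                   # keep only the most specific ancestors
            feats[f"{role}:{s.name()}"] = 1
    return feats

# Toy quadruples labelled with their attachment site: "V" (verb) or "N" (first noun).
examples = [
    (("eat", "pizza", "with", "fork"), "V"),
    (("eat", "pizza", "with", "anchovies"), "N"),
    (("see", "man", "with", "telescope"), "V"),
    (("buy", "shirt", "with", "pockets"), "N"),
]
vec = DictVectorizer()
X = vec.fit_transform([semantic_features(*quad) for quad, _ in examples])
y = [label for _, label in examples]
clf = LogisticRegression(max_iter=1000).fit(X, y)

print(clf.predict(vec.transform([semantic_features("see", "dog", "with", "binoculars")])))
```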
|
52 |
Méthodes et modèles de construction automatisée d'ontologies pour des domaines spécialisés / Methods and models for learning the domain ontology Goncharova, Olena 23 February 2017 (has links)
La thèse est préparée dans le cadre d'une convention de cotutelle sous la direction des Professeurs Jean-Hugues Chauchat (ERIC-Lyon2) et N.V. Charonova (Université Nationale Polytechnique de Kharkov en Ukraine). Les résultats obtenus peuvent se résumer ainsi : 1. Rétrospective des fondations théoriques sur la formalisation des connaissances et la langue naturelle en tant que précurseurs de l'ingénierie des ontologies ; actualisation de l'état de l'art sur les approches générales dans le domaine de l'apprentissage d'ontologie et sur les méthodes d'extraction des termes et des relations sémantiques ; panorama des plateformes et outils de construction et d'apprentissage des ontologies ; répertoire des ressources lexicales disponibles en ligne et susceptibles d'appuyer l'apprentissage d'ontologie (apprentissage des concepts et des relations). 2. Propositions méthodologiques : une méthode d'apprentissage des patrons morphosyntaxiques et d'installation de taxonomies partielles de termes ; une méthode de formation de classes sémantiques représentant les concepts et les relations pour le domaine de la sécurité radiologique ; un cadre (framework) d'organisation des étapes de travaux menant à la construction de l'ontologie du domaine de la sécurité radiologique. 3. Implémentation et expérimentations : installation de deux corpus spécialisés dans le domaine de la protection radiologique, en français et en russe, comprenant respectivement 1 500 000 et 600 000 unités lexicales ; implémentation des trois méthodes proposées et analyse des résultats obtenus. Les résultats ont été présentés dans 13 publications, revues et actes de conférences nationales et internationales, entre 2010 et 2016, notamment IMS-2012, TIA-2013, TOTH-2014, Eastern-European Journal of Enterprise Technologies, Bionica Intellecta (Бионика интеллекта), Herald of the NTU «KhPI» (Вестник НТУ «ХПИ»). / The thesis has been prepared under a co-supervision agreement with Professors Jean-Hugues Chauchat (ERIC-Lyon2) and N.V. Charonova (National Polytechnic University of Kharkov, Ukraine). The results obtained can be summarized as follows. 1. State of the art: retrospective of the theoretical foundations concerning the formalization of knowledge and natural language as precursors of ontology engineering; update of the state of the art on general approaches to ontology learning and on methods for extracting terms and semantic relations; overview of platforms and tools for ontology construction and learning; list of lexical resources available online that can support ontology learning (learning of concepts and relations). 2. Methodological proposals: learning morphosyntactic patterns and building partial taxonomies of terms; finding semantic classes representing concepts and relationships for the field of radiological safety; building a framework for the stages of work leading to the construction of the ontology of radiological safety. 3. Implementation and experiments: construction of two corpora specialized in radiological protection, in French and Russian, with 1,500,000 and 600,000 lexical units respectively; implementation of the three proposed methods and analysis of the results obtained. The results were published in 13 national and international journals and proceedings between 2010 and 2016, including IMS-2012, TIA-2013, TOTH-2014, Bionica Intellecta (Бионика интеллекта), and Herald of the NTU «KhPI» (Вестник НТУ «ХПИ»).
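A minimal sketch of the pattern-based taxonomy-extraction step, using English Hearst-style surface patterns for readability (the thesis itself learned morphosyntactic patterns over French and Russian corpora); the patterns, example sentence, and helper below are illustrative assumptions, not the author's implementation.

```python
import re

# Hypothetical Hearst-style surface patterns; the thesis learned richer morphosyntactic patterns.
PATTERNS = [
    re.compile(r"(?P<hyper>[\w-]+) such as (?P<hypos>[\w, -]+)"),
    re.compile(r"(?P<hypos>[\w, -]+) and other (?P<hyper>[\w-]+)"),
]

def extract_taxonomy_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched by the surface patterns."""
    pairs = []
    for pattern in PATTERNS:
        for m in pattern.finditer(sentence):
            hyper = m.group("hyper").strip()
            for hypo in re.split(r",| and ", m.group("hypos")):
                if hypo.strip():
                    pairs.append((hypo.strip(), hyper))
    return pairs

print(extract_taxonomy_pairs(
    "Monitoring covers radionuclides such as caesium-137, strontium-90 and iodine-131."))
# Expected: each isotope paired with the hypernym 'radionuclides'
```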
|
53 |
De la sublimación del amor Ballón Aguirre, Enrique 25 September 2017 (has links)
El tema de la “sublimación del amor”, ampliamente debatido en las estéticas de Kant y Hegel, ha sido ilustrado en la poesía hispanoamericana colonial con sendos poemas de Diego Dávalos y Figueroa, Sor Juana Inés de la Cruz, y Juan del Valle y Caviedes. A partir de su examen semántico-textual, en este artículo se determinan los alcances de las respectivas poéticas en el marco de dicho enfrentamiento teórico. / The subject of the “sublimation of love” was widely debated in the aesthetics of Kant and Hegel. The controversy is also illustrated in colonial Hispanic-American poetry, in poems by Diego Dávalos y Figueroa, Sor Juana Inés de la Cruz, and Juan del Valle y Caviedes. Through a semantic-textual analysis of three such poems, this article determines the scope of each poetics within the frame of the Kant-Hegel aesthetic debate.
|
54 |
Analysis of similarity and differences between articles using semantics Bihi, Ahmed January 2017 (has links)
Adding semantic analysis to the process of comparing news articles enables a deeper level of analysis than traditional keyword matching. In this bachelor’s thesis, we have implemented, compared, and evaluated three commonly used approaches to document-level similarity: keyword matching, TF-IDF vector distance, and Latent Semantic Indexing. Each method was evaluated on a coherent set of news articles in which the majority of the articles concerned Donald Trump and the American election of 9 November 2016; the set also contained several control articles about unrelated topics. TF-IDF vector distance combined with cosine similarity, and Latent Semantic Indexing, gave the best results on this set by separating the control articles from the Trump articles. Keyword matching and TF-IDF distance using Euclidean distance did not separate the Trump articles from the control articles. We also implemented sentiment analysis, classifying the news articles as positive, negative, or neutral, and validated the results against classifications made by human readers. The sentiment analysis showed a high correlation (100%) with the human readers' classifications.
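A minimal sketch of the three similarity measures under comparison, assuming scikit-learn and a few placeholder article texts; the thesis's own corpus and preprocessing are not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

# Placeholder articles: two about the 2016 election, one unrelated control article.
articles = [
    "Donald Trump wins the American presidential election in November 2016",
    "Trump declared victory after the November 2016 vote stunned pollsters",
    "A new recipe collection celebrates traditional Swedish cinnamon buns",
]

def keyword_overlap(a, b):
    """Naive keyword matching: Jaccard overlap of the two articles' word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

tfidf = TfidfVectorizer(stop_words="english").fit_transform(articles)
cos = cosine_similarity(tfidf)            # higher = more similar
euc = euclidean_distances(tfidf)          # lower = more similar

# Latent Semantic Indexing: compare documents in a low-rank concept space.
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
cos_lsi = cosine_similarity(lsi)

print(keyword_overlap(articles[0], articles[1]))
print(cos.round(2), euc.round(2), cos_lsi.round(2), sep="\n\n")
```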
|
55 |
Semantic Analysis of Natural Language and Definite Clause Grammar using Statistical Parsing and Thesauri Dagerman, Björn January 2013 (has links)
Services that rely on semantic processing of users’ natural-language input are becoming more common. Computing semantic relatedness between texts is problematic due to the inherent ambiguity of natural language. The purpose of this thesis was to show how a sentence can be compared to a predefined semantic Definite Clause Grammar (DCG), and how a DCG-based system can benefit from such a capability. Our approach combines openly available, specialized NLP frameworks for statistical parsing, part-of-speech tagging, and word-sense disambiguation. We compute semantic relatedness using a large lexical and conceptual-semantic thesaurus. We also extend an existing programming language for multimodal interfaces that uses static, predefined DCGs: COactive Language Definition (COLD); that is, every word that COLD should accept must be explicitly defined. By applying our solution, we show how our approach can remove dependencies on explicit word definitions and improve grammar definitions in DCG-based systems.
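As a rough illustration of thesaurus-based relatedness of the kind mentioned above, the sketch below scores word pairs with NLTK's WordNet interface; the choice of WordNet, Wu-Palmer similarity, and noun senses is an assumption for the example, not necessarily the thesis's setup.

```python
from nltk.corpus import wordnet as wn  # requires nltk and the WordNet corpus data

def relatedness(word_a, word_b):
    """Best Wu-Palmer similarity over the noun senses of two words (0.0 if no senses found)."""
    scores = [
        s1.wup_similarity(s2) or 0.0
        for s1 in wn.synsets(word_a, pos=wn.NOUN)
        for s2 in wn.synsets(word_b, pos=wn.NOUN)
    ]
    return max(scores, default=0.0)

print(relatedness("sentence", "phrase"))  # related linguistic concepts -> comparatively high score
print(relatedness("sentence", "banana"))  # unrelated concepts -> lower score
```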
|
56 |
Using Topic Models to Study Journalist-Audience Convergence and Divergence: The Case of Human Trafficking Coverage on British Online Newspapers Papadouka, Maria Eirini 08 1900 (has links)
Despite the accessibility of online news and the availability of sophisticated methods for analyzing news content, no previous study has focused on the simultaneous examination of news coverage of human trafficking and audiences' interpretations of that coverage. In my research, I examined both journalists' and commenters' topic choices in the coverage and discussion of human trafficking on the online platforms of three British newspapers over the period 2009–2015. I used latent semantic analysis (LSA) to identify emergent topics in my corpus of newspaper articles and readers' comments, and then quantitatively investigated topic preferences to identify convergence and divergence between the topics discussed by journalists and those discussed by their readers. I addressed my research questions in two distinct studies. The first, a case study, applied topic modelling techniques and further quantitative analyses to article and comment paragraphs from The Guardian. The second, more extensive study included article and comment paragraphs from the online platforms of three British newspapers: The Guardian, The Times and the Daily Mail. The findings indicate that the theories of "agenda setting" and of the "active audience" are not mutually exclusive, and that the explanatory scope of each depends partly on the specific topic or subtopic analyzed. Taking further theoretical concepts related to agenda setting into account, four additional research questions were addressed. Topic convergence and divergence were further identified when taking into account the newspapers' political orientation and the articles' and comments' year of publication.
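A minimal sketch of the LSA comparison described above, assuming scikit-learn, a handful of placeholder paragraphs, and an arbitrarily small number of latent topics; the study's actual corpus, preprocessing, and statistics are not reproduced here.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Placeholder paragraphs standing in for article and comment paragraphs on human trafficking.
article_pars = [
    "Police rescued dozens of trafficking victims in a London raid",
    "Charities warn that forced labour remains hidden in supply chains",
]
comment_pars = [
    "Readers argue tougher border policy would stop the traffickers",
    "Many comments blame weak sentencing for repeat trafficking offences",
]

corpus = article_pars + comment_pars
X = TfidfVectorizer(stop_words="english").fit_transform(corpus)

svd = TruncatedSVD(n_components=2, random_state=0)  # a real study would use many more topics
topic_weights = svd.fit_transform(X)

# Mean absolute topic loading per group; large gaps suggest divergence, small gaps convergence.
journalist_profile = np.abs(topic_weights[: len(article_pars)]).mean(axis=0)
commenter_profile = np.abs(topic_weights[len(article_pars):]).mean(axis=0)
print(journalist_profile - commenter_profile)
```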
|
57 |
Semantic Role Agency in Perceptions of the Lexical Items Sick and Evil Simmons, Nathan G. 18 November 2008 (has links)
Inspired by an ongoing debate in the clinical sciences concerning the value of evil as a label for human behavior (Mowrer 1960; Staub 1999; Wellman 2000; Williams 2004; etc.), this thesis examines the semantic role of AGENT in the lexical items sick and evil. Williams argues that the label evil removes responsibility from the doctor, whereas the label sick empowers the doctor to bring about a cure. While this view is not universally accepted in the field, it raises an interesting question in applied linguistic semantics about the assignment of agency with respect to sick and evil. Based on the close association between the meanings of sick and evil that stems from historical, psychological, and legal perspectives, this thesis assumes that the semantic feature [+/-RESPONSIBILITY] is assigned to either sick or evil at some point along a continuum: EVIL sits at one pole and receives [+RESPONSIBILITY], while SICK sits at the opposite pole and receives [-RESPONSIBILITY]. A survey of 106 respondents, using a variety of prompts, shows the continuum model to be only partially accurate. There is a correlation between NON-RESPONSIBILITY and SICK, and a continuum exists that allows the assignment of PARTIAL RESPONSIBILITY to both terms; however, there is no definitive significant correlation between RESPONSIBILITY and EVIL. Further conclusions include an indication that general conceptions of SICK and EVIL adhere to a legal model of guilt, innocence, and insanity, and that demographic variation has little predictive potential for how people perceive SICK and EVIL. The thesis concludes by proposing an alternative model, using a Greimas square, to represent the conceptions of SICK and EVIL in a way that better fits the trends found in the survey data.
|
58 |
Nové metody zpracování textu pro klasifikaci emocí / New methods for emotion recognition from text Onderka, Jakub January 2015 (has links)
This master’s thesis deals with methods for sentiment analysis, in particular unsupervised machine learning methods. The semantic-modeling methods LSA, pLSA, and LDA are described in detail. An LDA implementation was created in the Java language and used for the emotional classification of 860 Czech documents into six emotion categories. The maximum accuracy achieved was 24% when optimized parameters were used.
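The thesis implemented LDA in Java; the sketch below is a rough Python analogue using scikit-learn, with made-up documents and the dominant topic taken as the predicted class, shown only to illustrate the unsupervised setup (not the author's code or its evaluation against labeled emotion categories).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder documents; the thesis used 860 Czech documents labeled with six emotions.
docs = [
    "I am thrilled and happy about the wonderful news today",
    "This is a sad and painful loss that leaves me in tears",
    "The constant delays make me furious and angry",
    "I am terrified and anxious about what might happen next",
]

counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=6, random_state=0)  # six topics, one hoped-for per emotion
doc_topic = lda.fit_transform(counts)

# Unsupervised prediction: take each document's dominant topic as its emotion cluster.
print(doc_topic.argmax(axis=1))
```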
|
59 |
Attention to COVID-19: A content analysis study of Swedish interim reports Ström, Andreas January 2021 (has links)
The purpose of this study was to examine the attention to the COVID-19 pandemic displayed by top-level management in companies listed on Nasdaq OMX Stockholm Large Cap, and how this attention was affected by each company's board size and board gender diversity. To accomplish this, a word-by-word content analysis was conducted of all interim reports produced by each company in 2020, and data on board size and board gender diversity were gathered for each company. The frequency with which each company mentioned selected keywords concerning the COVID-19 pandemic was measured and used as a comparative measure of attention to the pandemic. In order to determine the magnitude of the impact of the independent variables, a control variable, firm size, was introduced, and linear fits were constructed for various combinations of variables. The resulting fits all clearly displayed an absence of correlation between either board size or board gender diversity and the attention paid to the COVID-19 pandemic by top-level management in large Swedish companies. Hence, this study suggests that there is no increase in board activity regarding daily operations in large Swedish companies during crises.
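A minimal sketch of the measurement and regression steps just described, assuming a hypothetical keyword list and made-up company figures; the study's actual keywords, variables, and data are not reproduced here.

```python
import re
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical COVID-19 keyword list; the study selected its own set of keywords.
KEYWORDS = {"covid-19", "coronavirus", "pandemic", "pandemin", "corona"}

def keyword_share(report_text):
    """Share of tokens in an interim report that match the COVID-19 keywords."""
    tokens = re.findall(r"[a-zåäö0-9-]+", report_text.lower())
    return sum(t in KEYWORDS for t in tokens) / max(len(tokens), 1)

# One row per company: board size, share of women on the board, log revenue (firm-size control).
X = np.array([
    [8, 0.375, 10.2],
    [11, 0.455, 11.0],
    [7, 0.286, 9.1],
    [9, 0.333, 10.7],
    [12, 0.500, 11.4],
    [10, 0.400, 10.9],
])
# Attention per company, e.g. keyword_share() averaged over its 2020 interim reports (made-up values).
y = np.array([0.004, 0.006, 0.003, 0.005, 0.004, 0.005])

fit = LinearRegression().fit(X, y)
print(fit.coef_, fit.intercept_, fit.score(X, y))
```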
|
60 |
An Analysis of Educational Technology Publications: Who, What and Where in the Last 20 Years Natividad Beltrán del Río, Gloria Ofelia 05 1900 (has links)
This exploratory and descriptive study examines research articles published during the last 20 years in ten of the top journals in the broad area of educational technology: 1) Educational Technology Research and Development (ETR&D); 2) Instructional Science; 3) Journal of the Learning Sciences; 4) TechTrends; 5) Educational Technology: The Magazine for Managers of Change in Education; 6) Journal of Educational Technology & Society; 7) Computers and Education; 8) British Journal of Educational Technology (BJET); 9) Journal of Educational Computing Research; and 10) Journal of Research on Technology in Education. To discover research trends in the articles published from 1995 to 2014, the abstracts of all contributing articles in these ten journals were analyzed to extract a latent semantic space of broad research areas, top authors, and top-cited publications. Concepts that emerged, grew, or diminished in the field were noted in order to identify the most dominant ones of the last two decades, and the most frequent contributors to each journal, as well as those who contributed to more than one of the journals studied, were identified.
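A minimal sketch of extracting a latent semantic space from abstracts and reading off broad areas and their trends, assuming scikit-learn, a toy set of year-stamped abstracts, and far fewer latent dimensions than a real study would use:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Hypothetical miniature corpus: (year, abstract) pairs standing in for the 1995-2014 abstracts.
records = [
    (1996, "computer assisted instruction improves classroom learning outcomes"),
    (2003, "online distance learning and web based course design for universities"),
    (2010, "mobile learning devices support collaborative student projects in schools"),
    (2014, "massive open online courses and learning analytics dashboards for instructors"),
]
years = np.array([year for year, _ in records])
abstracts = [text for _, text in records]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(abstracts)
svd = TruncatedSVD(n_components=2, random_state=0)
loadings = svd.fit_transform(X)

# Top terms per latent dimension approximate the "broad research areas".
terms = np.array(vec.get_feature_names_out())
for i, component in enumerate(svd.components_):
    print(f"area {i}:", terms[component.argsort()[::-1][:5]])

# A crude trend signal: correlation of each area's loading with publication year.
for i in range(loadings.shape[1]):
    print(f"area {i} year trend:", round(np.corrcoef(years, loadings[:, i])[0, 1], 2))
```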
|