About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
191

Learning for semantic parsing with kernels under various forms of supervision

Kate, Rohit Jaivant, 2007 (has links)
Thesis (Ph. D.)--University of Texas at Austin, 2007. / Vita. Includes bibliographical references.
192

Learning for semantic parsing and natural language generation using statistical machine translation techniques

Wong, Yuk Wah, 2007 (has links)
Thesis (Ph. D.)--University of Texas at Austin, 2007. / Vita. Includes bibliographical references.
193

Designing intelligent language tutoring systems for integration into foreign language instruction

Amaral, Luiz Alexandre Mattos do, January 2007 (has links)
Thesis (Ph. D.)--Ohio State University, 2007. / Full text release at OhioLINK's ETD Center delayed at author's request
194

Lexical Chains and Sliding Locality Windows in Content-based Text Similarity Detection

Nahnsen, Thade, Uzuner, Ozlem, Katz, Boris 19 May 2005 (has links)
We present a system to determine content similarity of documents. More specifically, our goal is to identify book chapters that are translations of the same original chapter; this task requires identification of not only the different topics in the documents but also the particular flow of these topics. We experiment with different representations employing n-grams of lexical chains and test these representations on a corpus of approximately 1000 chapters gathered from books with multiple parallel translations. Our representations include the cosine similarity of attribute vectors of n-grams of lexical chains, the cosine similarity of tf*idf-weighted keywords, and the cosine similarity of unweighted lexical chains (unigrams of lexical chains) as well as multiplicative combinations of the similarity measures produced by these approaches. Our results identify fourgrams of unordered lexical chains as a particularly useful representation for text similarity evaluation.
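
As a hedged illustration of one baseline named in this abstract, the sketch below computes the cosine similarity of tf*idf-weighted keyword vectors for two short texts using scikit-learn; the example chapters are invented placeholders, and the paper's lexical-chain n-gram features are not reproduced here.

```python
# Cosine similarity of tf*idf-weighted keyword vectors -- the baseline
# representation named in the abstract. The two "chapters" below are
# invented placeholders; real input would be parallel translations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chapter_a = "The prince wandered the gardens at dusk, lost in thought."
chapter_b = "At dusk the prince strolled through the gardens, deep in thought."

tfidf = TfidfVectorizer(stop_words="english").fit_transform([chapter_a, chapter_b])
score = cosine_similarity(tfidf[0], tfidf[1])[0, 0]  # near 1.0 = very similar
print(f"tf*idf cosine similarity: {score:.3f}")
```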
195

Identifying Expression Fingerprints using Linguistic Information

Uzuner, Ozlem 18 November 2005 (has links)
This thesis presents a technology to complement taxation-based policy proposals aimed at addressing the digital copyright problem. The approach presented facilitates identification of intellectual property using expression fingerprints.

Copyright law protects expression of content. Recognizing literary works for copyright protection requires identification of the expression of their content. The expression fingerprints described in this thesis use a novel set of linguistic features that capture both the content presented in documents and the manner of expression used in conveying this content. These fingerprints consist of both syntactic and semantic elements of language. Examples of the syntactic elements of expression include structures of embedding and embedded verb phrases. The semantic elements of expression consist of high-level, broad semantic categories.

Syntactic and semantic elements of expression enable generation of models that correctly identify books and their paraphrases 82% of the time, providing a significant (approximately 18%) improvement over models that use tfidf-weighted keywords. The performance of models built with these features is also better than models created with standard features used in stylometry (e.g., function words), which yield an accuracy of 62%.

In the non-digital world, copyright holders collect revenues by controlling distribution of their works. Current approaches to the digital copyright problem attempt to provide copyright holders with the same kind of control over distribution by employing Digital Rights Management (DRM) systems. However, DRM systems also enable copyright holders to control and limit fair use, to inhibit others' speech, and to collect private information about individual users of digital works.

Digital tracking technologies enable alternate solutions to the digital copyright problem; some of these solutions can protect creative incentives of copyright holders in the absence of control over distribution of works. Expression fingerprints facilitate digital tracking even when literary works are DRM- and watermark-free, and even when they are paraphrased. As such, they enable metering popularity of works and make practicable solutions that encourage large-scale dissemination and unrestricted use of digital works and that protect the revenues of copyright holders, for example through taxation-based revenue collection and distribution systems, without imposing limits on distribution.
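
For context on the stylometric baseline mentioned above (function words, 62% accuracy), here is a minimal sketch of a function-word frequency profile; the word list is a small illustrative subset, not the thesis's feature set.

```python
# Function-word relative frequencies: the kind of standard stylometric
# feature vector the abstract uses as a baseline. The word list below is
# a small illustrative subset, not the feature set from the thesis.
import re
from collections import Counter

FUNCTION_WORDS = ["the", "of", "and", "a", "to", "in", "that", "it",
                  "with", "as", "for", "was", "on", "but", "by", "not"]

def function_word_profile(text: str) -> list[float]:
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    # Relative frequencies track authorial style more than topic.
    return [counts[w] / total for w in FUNCTION_WORDS]

profile = function_word_profile("It was the best of times, it was the worst of times.")
print([round(p, 2) for p in profile])
```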
196

Linguistic Refactoring of Business Process Models

Pittke, Fabian 11 1900 (has links) (PDF)
In the past decades, organizations have faced numerous challenges due to intensifying globalization and internationalization, shorter innovation cycles, and growing IT support for business. Business process management is seen as a comprehensive approach to align business strategy, organization, controlling, and business activities so as to react flexibly to market changes. For this purpose, business process models are increasingly utilized to document and redesign relevant parts of an organization's business operations. Since companies tend to have a growing number of business process models stored in a process model repository, analysis techniques are required that assess the quality of these process models in an automatic fashion. While available techniques can easily check the formal content of a process model, only a few techniques analyze its natural language content. Therefore, techniques are required that address linguistic issues caused by the actual use of natural language. In order to close this gap, this doctoral thesis explicitly targets inconsistencies caused by natural language and investigates the potential of automatically detecting and resolving them from a linguistic perspective. In particular, this doctoral thesis provides the following contributions. First, it defines a classification framework that structures existing work on process model analysis and refactoring. Second, it introduces the notion of atomicity, which implements a strict consistency condition between the formal content and the textual content of a process model. Based on an explorative investigation, we reveal several recurring violation patterns that are not compliant with the notion of atomicity. Third, this thesis proposes an automatic refactoring technique that formalizes the identified patterns to transform a non-atomic process model into an atomic one. Fourth, it defines an automatic technique for detecting and refactoring synonyms and homonyms in process models, which is useful for unifying the terminology used in an organization. Fifth and finally, it proposes a recommendation-based refactoring approach that addresses process models that suffer from incompleteness and are therefore open to several possible interpretations. The efficiency and usefulness of the proposed techniques are evaluated on real-world process model repositories from various industries. (author's abstract)
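
As a rough sketch of one ingredient of the synonym detection described above, the snippet below checks whether two activity-label verbs share a WordNet sense via NLTK; the thesis's actual technique is considerably more elaborate, so treat this only as an illustration.

```python
# Checking whether two activity-label verbs share a WordNet synset --
# one plausible building block for synonym detection in process model
# labels. This is an illustration, not the thesis's actual method.
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def verbs_share_sense(verb_a: str, verb_b: str) -> bool:
    senses_a = set(wn.synsets(verb_a, pos=wn.VERB))
    senses_b = set(wn.synsets(verb_b, pos=wn.VERB))
    return bool(senses_a & senses_b)  # a shared sense suggests synonymy

# "examine document" vs. "analyze document": same action, different verbs.
print(verbs_share_sense("examine", "analyze"))  # True (share analyze.v.01)
print(verbs_share_sense("examine", "archive"))  # False
```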
197

Predicting the programming language of questions and snippets of stack overflow using natural language processing

Alrashedy, Kamel 11 September 2018 (has links)
Stack Overflow is the most popular Q&A website among software developers. As a platform for knowledge sharing and acquisition, the questions posted in Stack Overflow usually contain a code snippet. Stack Overflow relies on users to properly tag the programming language of a question and assumes that the programming language of the snippets inside a question is the same as the tag of the question itself. In this thesis, a classifier is proposed to predict the programming language of questions posted in Stack Overflow using Natural Language Processing (NLP) and Machine Learning (ML). The classifier achieves an accuracy of 91.1% in predicting the 24 most popular programming languages by combining features from the title, body and code snippets of the question. We also propose a classifier that only uses the title and body of the question and has an accuracy of 81.1%. Finally, we propose a classifier of code snippets only that achieves an accuracy of 77.7%. Thus, deploying ML techniques on the combination of text and code snippets of a question provides the best performance. These results demonstrate that it is possible to identify the programming language of a snippet of only a few lines of source code. We visualize the feature space of two programming languages, Java and SQL, in order to identify some properties of the information inside the questions corresponding to these languages. / Graduate
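
To make the task concrete, here is a small hedged sketch of a text-based programming-language classifier with scikit-learn; the model choice (tf-idf plus logistic regression) and the toy questions are assumptions for illustration, not the exact setup evaluated in the thesis.

```python
# Toy version of the task: predict a question's programming language from
# its text. Model choice and training data here are illustrative
# assumptions; the thesis covers 24 languages and richer feature splits.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

questions = [
    "How do I read a CSV file with pandas?",
    "NullPointerException when calling a method on an autowired bean",
    "How to LEFT JOIN two tables and filter on a date column?",
    "Segmentation fault when freeing a pointer twice",
]
labels = ["python", "java", "sql", "c"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(questions, labels)
print(model.predict(["How do I merge two dataframes on a key column?"]))
```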
198

Advancing Biomedical Named Entity Recognition with Multivariate Feature Selection and Semantically Motivated Features

January 2013 (has links)
abstract: Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets. / Dissertation/Thesis / Ph.D. Computer Science 2013
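
A minimal sketch of the linear-chain CRF setup named in this abstract appears below, using the sklearn-crfsuite package; the handful of token features shown is a tiny stand-in for BANNER's rich feature set, and the training sentence is an invented example.

```python
# Linear-chain CRF for biomedical NER with a tiny feature function.
# The features are a small stand-in for BANNER's rich feature set.
import sklearn_crfsuite

def token_features(sent, i):
    word = sent[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "has_digit": any(c.isdigit() for c in word),
        "suffix3": word[-3:],  # e.g. characteristic gene/protein endings
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
    }

sents = [["BRCA1", "regulates", "DNA", "repair", "."]]
tags = [["B-GENE", "O", "O", "O", "O"]]  # BIO labels for a gene mention

X = [[token_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, tags)
print(crf.predict(X))
```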
199

Concept Maps Mining for Text Summarization

AGUIAR, C. Z. 31 March 2017 (has links)
Concept maps are graphical tools for representing and constructing knowledge. Concepts and relations form the basis of learning, and concept maps have therefore been widely used in different situations and for different purposes in education, one of them being the representation of written text. Even a grammatically complex text can be represented by a concept map containing only concepts and relations that represent what was expressed in a more complicated way. However, manually constructing a concept map demands considerable time and effort in identifying and structuring knowledge, especially when the map is meant to represent not the concepts of the author's cognitive structure but the concepts expressed in a text. Several technological approaches have thus been proposed to facilitate the construction of concept maps from texts. This dissertation proposes a new approach for the automatic construction of concept maps as a summarization of scientific texts. The summarization is intended to produce a concept map as a condensed representation of the text that retains its diverse and most important characteristics. Summarization can ease text comprehension at a time when students struggle with the cognitive overload caused by the ever-growing amount of textual information available, a growth that can itself hinder knowledge construction. We therefore consider the hypothesis that summarizing a text as a concept map can contribute important characteristics for assimilating the knowledge in the text, while reducing its complexity and the time needed to process it.

In this context, we reviewed the literature from 1994 to 2016 on approaches aimed at the automatic construction of concept maps from texts. From this review we built a categorization to better identify and analyze the features and characteristics of these technological approaches, sought to identify their limitations, and gathered the best characteristics of the related work to inform our own approach. We further present a Concept Map Mining process organized along four dimensions: Data Source Description, Domain Definition, Element Identification, and Map Visualization. In developing a computational architecture for automatically building concept maps as summaries of academic texts, this research produced the public tool CMBuilder, an online tool for the automatic construction of concept maps from texts, as well as a Java API called ExtroutNLP, which contains libraries for information extraction and public services. To reach the proposed objective, we directed our efforts toward natural language processing and information retrieval. We emphasize that the main task is to extract from the text propositions of the form (concept, relation, concept).

Under this premise, the research introduces a pipeline comprising: grammatical rules and depth-first search for extracting concepts and relations from text; preposition mapping, anaphora resolution, and exploitation of named entities for concept labeling; concept ranking based on element frequency analysis and map topology; and proposition summarization based on graph topology. The approach also proposes the use of supervised clustering and classification techniques, together with a thesaurus, to define the domain of the text and build a conceptual vocabulary of domains. Finally, an objective analysis validating the accuracy of the ExtroutNLP library yields 0.65 precision on the corpus. A subjective analysis validating the quality of the concept maps built by CMBuilder shows 0.75/0.45 precision/recall for concepts and 0.57/0.23 for relations in English, and 0.68/0.38 precision/recall for concepts and 0.41/0.19 for relations in Portuguese. An experiment verifying whether the concept maps summarized by CMBuilder aid comprehension of the subject of a text reaches 60% correct answers for maps extracted from short texts with multiple-choice questions and 77% for maps extracted from long texts with open-ended questions.
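
Since the central step is extracting (concept, relation, concept) propositions, the following hedged sketch shows a bare-bones subject-verb-object extractor over a spaCy dependency parse; the dissertation's pipeline (grammatical rules, anaphora resolution, ranking) is far richer than this.

```python
# Bare-bones (concept, relation, concept) extraction via subject-verb-
# object patterns in a spaCy dependency parse. Illustration only of the
# proposition-extraction step; the dissertation's pipeline is richer.
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: spacy download en_core_web_sm

def extract_propositions(text: str):
    triples = []
    for token in nlp(text):
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ == "nsubj"]
            objects = [c for c in token.children if c.dep_ == "dobj"]
            triples += [(s.text, token.lemma_, o.text)
                        for s in subjects for o in objects]
    return triples

print(extract_propositions("Concept maps represent knowledge. Students build maps."))
# expected form: [('maps', 'represent', 'knowledge'), ('Students', 'build', 'maps')]
```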
200

Text data analysis for a smart city project in a developing nation

Currin, Aubrey Jason January 2015 (has links)
Increased urbanisation against the backdrop of limited resources is complicating city planning and management of functions including public safety. The smart city concept can help, but most previous smart city systems have focused on utilising automated sensors and analysing quantitative data. In developing nations, using the ubiquitous mobile phone as an enabler for crowdsourcing of qualitative public safety reports, from the public, is a more viable option due to limited resources and infrastructure limitations. However, there is no specific best method for the analysis of qualitative text reports for a smart city in a developing nation. The aim of this study, therefore, is the development of a model for enabling the analysis of unstructured natural language text for use in a public safety smart city project. Following the guidelines of the design science paradigm, the resulting model was developed through the inductive review of related literature, assessed and refined by observations of a crowdsourcing prototype and conversational analysis with industry experts and academics. The content analysis technique was applied to the public safety reports obtained from the prototype via computer assisted qualitative data analysis software. This has resulted in the development of a hierarchical ontology which forms an additional output of this research project. Thus, this study has shown how municipalities or local government can use CAQDAS and content analysis techniques to prepare large quantities of text data for use in a smart city.
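
As a loose illustration of how such qualitative reports might be routed against a hierarchical ontology, the sketch below does simple keyword matching; the categories and cue words are invented stand-ins, since the study's actual ontology was derived through content analysis with CAQDAS tools.

```python
# Keyword routing of crowdsourced public-safety reports. The categories
# and cue words are invented stand-ins for the study's hierarchical
# ontology, which was derived via content analysis.
ONTOLOGY = {
    "crime": ["robbery", "theft", "assault", "break-in"],
    "infrastructure": ["pothole", "streetlight", "water leak", "outage"],
    "traffic": ["accident", "collision", "congestion"],
}

def categorize(report: str) -> list[str]:
    text = report.lower()
    matches = [cat for cat, cues in ONTOLOGY.items()
               if any(cue in text for cue in cues)]
    return matches or ["uncategorized"]

print(categorize("Streetlight out on Main Road, possible break-in nearby"))
# ['crime', 'infrastructure']
```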
