481 |
Context-Aware Adaptive Hybrid Semantic Relatedness in Biomedical Science. January 2016
abstract: Text mining of biomedical literature and clinical notes is a very active field of research in biomedical science. Semantic analysis is one of the core modules of many Natural Language Processing (NLP) solutions. Methods for calculating the semantic relatedness of two concepts can be very useful for problems such as relationship extraction, ontology creation, and question answering [1–6]. Several techniques exist for calculating the semantic relatedness of two concepts, and they draw on different knowledge sources and corpora. So far, researchers have tried to find the best hybrid method for each domain by manually combining semantic relatedness techniques and data sources. This work aims to eliminate the need to combine semantic relatedness methods manually for each new context or resource by proposing an automated method that searches for the combination of semantic relatedness techniques and resources yielding the best semantic relatedness score in each context. This may help the research community find the best hybrid method for each context given the available algorithms and resources. / Dissertation/Thesis / Doctoral Dissertation Biomedical Informatics 2016
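The abstract does not say how the automated search works. As a minimal sketch, assuming the task is to weight several relatedness measures so that their combined scores best match human judgments, a grid search over convex weights scored by Spearman correlation could look like this. The measure names, the min-max scaling, and the grid step are illustrative assumptions, not the dissertation's method.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple


def min_max(scores: List[float]) -> List[float]:
    """Rescale one measure's scores to [0, 1] so different measures are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def best_combination(
    measures: Dict[str, List[float]], gold: List[float], step: float = 0.1
) -> Tuple[float, Optional[Dict[str, float]]]:
    """Grid-search convex weights over the measures to maximize rho against gold scores."""
    names = list(measures)
    scaled = {m: min_max(measures[m]) for m in names}
    units = round(1 / step)
    best: Tuple[float, Optional[Dict[str, float]]] = (float("-inf"), None)
    for combo in product(range(units + 1), repeat=len(names)):
        if sum(combo) != units:  # keep only weight vectors that sum to 1
            continue
        weights = {m: c / units for m, c in zip(names, combo)}
        combined = [
            sum(weights[m] * scaled[m][i] for m in names) for i in range(len(gold))
        ]
        rho = spearman(combined, gold)
        if rho > best[0]:
            best = (rho, weights)
    return best
```

For example, with two hypothetical measures `{"path": [1, 4, 2, 9], "corpus": [9, 1, 4, 2]}` and gold scores `[0.1, 0.4, 0.2, 0.9]`, the search reaches a correlation of 1.0 with most of the weight on `path`, whose ranking matches the gold ranking exactly. An exhaustive grid grows quickly with the number of measures, so a real system would likely need a smarter optimizer.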
|
482 |
Sentiment Analysis for Long-Term Stock Prediction. January 2016
abstract: There has been extensive research on how news and Twitter feeds can affect the performance of a given stock. However, most of this research has studied the short-term effects of sentiment on stock prices. In this research, I studied the long-term effects on a given stock's price using fundamental analysis techniques. I collected both sentiment data and fundamental data for Apple Inc., Microsoft Corp., and Peabody Energy Corp. Using a neural network algorithm, I found that sentiment does affect the annual growth of these companies, but the fundamentals are more relevant when determining overall growth. For stocks with more consistent growth, the previous year's stock price carries more weight, whereas companies with less consistent growth rely more on revenue growth and on sentiment toward the company and its CEO. I discuss how I collected the research data and used a multi-layer perceptron to predict whether a stock reaches a threshold growth rate, which was set at 10% for this research. I then present the perceptron's predictions for this threshold and afterwards perform an ANOVA F-test on the chosen features. The results showed the fundamentals to be the better predictor of stock performance, but sentiment came in a close second in several cases, showing that sentiment does have an effect on long-term growth. / Dissertation/Thesis / Masters Thesis Computer Science 2016
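To make the labeling and feature-scoring steps concrete, here is a minimal sketch in plain Python: a stock-year is labeled positive when annual growth reaches the 10% threshold, and each candidate feature is scored with a one-way ANOVA F-statistic across the label groups. The function names are hypothetical and the multi-layer perceptron itself is omitted; scikit-learn's `sklearn.feature_selection.f_classif` computes the same per-feature F-value if that library is available.

```python
from typing import Dict, List, Sequence


def growth_label(prev_price: float, curr_price: float, threshold: float = 0.10) -> int:
    """1 if annual growth reaches the threshold (10% in the thesis), else 0."""
    return int((curr_price - prev_price) / prev_price >= threshold)


def anova_f(feature: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F-statistic of a single feature across the label groups."""
    groups: Dict[int, List[float]] = {}
    for x, y in zip(feature, labels):
        groups.setdefault(y, []).append(x)
    n, k = len(feature), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand_mean = sum(feature) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    # Within-group variation: spread of samples around their own group mean.
    ss_within = sum((x - means[y]) ** 2 for y, g in groups.items() for x in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For example, `anova_f([1, 2, 3, 7, 8, 9], [0, 0, 0, 1, 1, 1])` returns 54.0: the two group means (2 and 8) are far apart relative to the spread within each group, so that feature separates the classes well.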
|
483 |
Programmable Insight: A Computational Methodology to Explore Online News Use of Frames. January 2017
abstract: The Internet is a major source of news content. Online news is a form of large-scale narrative text with rich, complex content that embeds deep meanings (facts, strategic communication frames, and biases) that shape and shift the standards, values, attitudes, and beliefs of the masses. Currently, this body of narrative text remains largely untapped because of human limitations: humans are far better than known computational algorithms at comprehending rich text and extracting hidden meanings, but human reading does not scale. In this research, online news framing is treated computationally to expose a deeper level of expressivity, coined “double subjectivity”, which is characterized by its cumulative amplification effects. A visual language is offered for extracting the spatial and temporal dynamics of double subjectivity, which may give insight into social influence on critical issues such as environmental, economic, or political discourse. This research offers the benefits of 1) scalability in processing hidden meanings in big data and 2) visibility of the entire network's dynamics over time and space, giving users insight into the current status and future trends of mass communication. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2017
|
484 |
Word and Relation Embedding for Sentence Representation. January 2017
abstract: In recent years, several methods have been proposed to encode sentences into fixed-length continuous vectors called sentence representations or sentence embeddings. With recent advances in deep learning methods applied to Natural Language Processing (NLP), these representations play a crucial role in tasks such as named entity recognition, question answering, and sentence classification.
Traditionally, sentence vector representations are learned from the representations of their constituent words, also known as word embeddings. Various methods for learning distributed representations (embeddings) of words have been proposed based on the notion of Distributional Semantics, i.e., “the meaning of a word is characterized by the company it keeps”. However, the principle of compositionality states that the meaning of a sentence is a function of the meanings of its words and of the way they are syntactically combined. Many recent methods for sentence representation have largely ignored syntactic information such as the dependency relations between words.
In this work, I explore the effectiveness of sentence representations composed of the representations of both a sentence's constituent words and the relations between those words. The word and relation embeddings are learned from their context. These general-purpose embeddings can also be used as off-the-shelf semantic and syntactic features for various NLP tasks. Similarity evaluation tasks were performed on two datasets, showing the usefulness of the learned word embeddings. Experiments on three different sentence classification tasks show that our sentence representations outperform the original word-based sentence representations when used with state-of-the-art neural network architectures. / Dissertation/Thesis / Masters Thesis Computer Science 2017
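One simple way to picture a sentence representation built from both words and relations is to average the word vectors, average the vectors of the dependency relations in the sentence, and concatenate the two. The sketch below assumes one embedding per dependency label and uses mean pooling; both are illustrative assumptions, since the abstract does not specify how the thesis composes its vectors.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Vector], dim: int) -> Vector:
    """Element-wise mean; a zero vector if no embeddings were found."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sentence_vector(
    tokens: Sequence[str],
    triples: Sequence[Tuple[str, str, str]],  # (head, relation, dependent)
    word_emb: Dict[str, Vector],
    rel_emb: Dict[str, Vector],
    word_dim: int,
    rel_dim: int,
) -> Vector:
    """Concatenate the mean word vector with the mean relation vector."""
    words = [word_emb[t] for t in tokens if t in word_emb]
    relations = [rel_emb[rel] for _, rel, _ in triples if rel in rel_emb]
    return mean_vector(words, word_dim) + mean_vector(relations, rel_dim)
```

With toy embeddings `cat = [1, 0]`, `eats = [0, 1]`, and `nsubj = [1, 1, 0]`, the sentence "cat eats" with the single relation `("eats", "nsubj", "cat")` maps to `[0.5, 0.5, 1.0, 1.0, 0.0]`. Concatenation keeps word and relation information in separate dimensions, so a downstream classifier can weight them independently.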
|
485 |
Detecting Frames and Causal Relationships in Climate Change Related Text Databases Based on Semantic Features. January 2018
abstract: The subliminal impact of the framing of social, political, and environmental issues such as climate change has been studied for decades in political science and communications research. Media framing offers an “interpretative package” for average citizens on how to make sense of climate change and its consequences for their livelihoods, how to deal with its negative impacts, and which mitigation or adaptation policies to support. A line of related work has used bag-of-words and word-level features to detect frames in text automatically. Such work faces limitations, since standard keyword-based features may not generalize well to surface variations in text when different keywords are used for similar concepts.
This thesis develops a new type of textual feature that generalizes <subject, verb, object> triplets extracted from text by clustering them into high-level concepts. These concepts are used as features to detect frames in text. Compared to unigram- and bigram-based models, classification and clustering using generalized concepts yield more discriminative features and higher accuracy for Frame/Non-Frame detection: a 12% relative improvement in F-measure (from 74% to 83%) and a clustering purity of 0.91.
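A minimal sketch of the generalization step follows. It assumes each <subject, verb, object> triplet has already been mapped to a vector (for example, by averaging pretrained word embeddings, which the abstract does not state). k-means groups the triplet vectors into concepts, and each document is then represented by how many of its triplets fall into each concept. The deterministic seeding and the function names are illustrative choices.

```python
import math
from collections import Counter
from typing import List, Sequence

Vector = List[float]


def nearest(point: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the closest centroid by Euclidean distance."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Plain k-means, seeded with the first k points so results are deterministic."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def concept_features(doc_triplet_vecs: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Bag-of-concepts: count of a document's triplets assigned to each concept."""
    counts = Counter(nearest(v, centroids) for v in doc_triplet_vecs)
    return [counts.get(c, 0) for c in range(len(centroids))]
```

Because two differently worded triplets with similar vectors land in the same concept, the resulting feature counts are less sensitive to surface keyword variation than unigram or bigram counts.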
The automatic discovery of complex causal chains among interlinked events and their participating actors has not yet been thoroughly studied. Previous studies on extracting causal relationships from text relied on laborious and incomplete hand-built lists of explicit causal verbs, such as “causes” and “results in”. Such approaches have limited recall because standard causal verbs may not generalize well to surface variations in texts when different keywords and phrases are used to express similar causal effects. Therefore, I present a system that uses generalized concepts to extract causal relationships. The proposed algorithms overcome surface variations in how causal relationships are expressed and discover the domino effects between climate events and human security. This semi-supervised approach alleviates the need for labor-intensive keyword list development and annotated datasets. In evaluations by domain experts, the system achieves an average precision of 82%. Qualitative assessments of the causal chains show that the results are consistent with the 2014 IPCC report, illuminating causal mechanisms underlying the linkages between climatic stresses and social instability. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2018
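Once causal pairs between generalized concepts have been extracted, chaining them into domino effects amounts to following cause-to-effect links in a directed graph. The sketch below enumerates maximal acyclic chains from a starting concept. The concept names in the usage note are invented for illustration and are not results from the thesis.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def causal_chains(pairs: Iterable[Tuple[str, str]], start: str, max_len: int = 5) -> List[List[str]]:
    """Enumerate maximal cause->effect paths from `start`, never revisiting a concept."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for cause, effect in pairs:
        graph[cause].append(effect)
    chains: List[List[str]] = []

    def extend(path: List[str]) -> None:
        grew = False
        if len(path) < max_len:
            for nxt in graph[path[-1]]:
                if nxt not in path:  # skip effects already on the path (cycles)
                    grew = True
                    extend(path + [nxt])
        if not grew and len(path) > 1:  # record only chains that cannot be extended
            chains.append(path)

    extend([start])
    return chains
```

With the pairs (drought, crop failure), (crop failure, food insecurity), (food insecurity, migration), and (drought, water scarcity), `causal_chains(pairs, "drought")` returns two chains: drought → crop failure → food insecurity → migration, and drought → water scarcity.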
|
486 |
Inferência de emoções em fragmentos de textos obtidos do Facebook / Inference of emotions in text fragments obtained from Facebook. Medeiros, Richerland Pinto [UNESP], 27 April 2017
This research analyzes the use of the maximum entropy statistical machine learning technique for natural language processing tasks, specifically the inference of emotions in short texts from the social network Facebook. The study covers the core concepts of natural language processing tasks and of information theory, and examines in depth the use of an entropy-based model as a text classifier. The data used in this research consisted of short texts, that is, texts of at most 500 characters. The technique was applied within supervised machine learning: part of the collected data was labeled with a set of predefined emotion classes and used as training examples, so that the learning mechanism selects the most probable emotion class for a given sample. The proposed method achieved a mean accuracy of 90% under cross-validation.
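A maximum entropy classifier over indicator features is equivalent to multinomial logistic regression. The following is a minimal pure-Python sketch, assuming whitespace tokenization, bag-of-words features, and batch gradient ascent with a small L2 penalty; the dissertation does not specify these details, and the toy training texts in the usage note are invented.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def features(text: str) -> List[str]:
    """Bag-of-words features plus a bias feature."""
    return text.lower().split() + ["__bias__"]


class MaxEntClassifier:
    """Maximum entropy (multinomial logistic regression) text classifier."""

    def __init__(self, lr: float = 0.5, epochs: int = 200, l2: float = 0.01):
        self.lr, self.epochs, self.l2 = lr, epochs, l2
        self.weights: Dict[Tuple[str, str], float] = defaultdict(float)  # (feature, label) -> weight
        self.labels: List[str] = []

    def _probs(self, feats: Sequence[str]) -> Dict[str, float]:
        scores = {y: sum(self.weights.get((f, y), 0.0) for f in feats) for y in self.labels}
        top = max(scores.values())
        exps = {y: math.exp(s - top) for y, s in scores.items()}  # numerically stable softmax
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "MaxEntClassifier":
        self.labels = sorted(set(labels))
        data = [(features(t), y) for t, y in zip(texts, labels)]
        for _ in range(self.epochs):
            grad: Dict[Tuple[str, str], float] = defaultdict(float)
            for feats, gold in data:
                probs = self._probs(feats)
                for f in feats:
                    grad[(f, gold)] += 1.0  # observed feature count
                    for y, p in probs.items():
                        grad[(f, y)] -= p  # expected count under the current model
            for key, g in grad.items():
                self.weights[key] += self.lr * (g / len(data) - self.l2 * self.weights[key])
        return self

    def predict_proba(self, text: str) -> Dict[str, float]:
        return self._probs(features(text))

    def predict(self, text: str) -> str:
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)
```

Training on four toy posts such as `"i love this"` (joy) and `"i hate this"` (anger) is enough for `predict("so angry today")` to return `"anger"`: words shared by both classes end up with symmetric weights, so only the class-specific words decide. The gradient is the difference between observed and expected feature counts, which is exactly the maximum entropy constraint being satisfied at convergence.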
|
487 |
Identificação e tratamento de expressões multipalavras aplicado à recuperação de informação / Identification and treatment of multiword expressions applied to information retrieval. Acosta, Otavio Costa, January 2011
The widespread use of Multiword Expressions (MWEs) in natural language texts calls for detailed study so that these expressions can later be handled and processed robustly. An MWE typically conveys a concept or idea that cannot be expressed by a single word, and the number of MWEs in a native speaker's lexicon is estimated to be similar to the number of single words.
Most real applications simply ignore MWEs or keep a list of possible compounds, yet still identify and process their lexical items individually rather than as a single conceptual unit. A Natural Language Processing (NLP) application that involves semantic processing needs appropriate treatment of these expressions to succeed. This work investigates the hypothesis that properly identifying and treating MWEs improves the results of an application such as Information Retrieval (IR). Its objectives are to study and compare techniques for the automatic extraction of MWEs in order to build MWE dictionaries used for indexing in an IR system. Experimental results show improvements in the retrieval of relevant documents when MWEs are identified and treated as single indexing units.
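The abstract does not name the extraction techniques that were compared. As one common example, the sketch below scores bigrams by pointwise mutual information (PMI), keeps frequent, strongly associated pairs as candidate MWEs, and merges them into single tokens before indexing. The thresholds and the underscore-joining convention are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

Bigram = Tuple[str, str]


def pmi_mwes(docs: Iterable[str], min_count: int = 2, threshold: float = 1.0) -> Set[Bigram]:
    """Candidate two-word MWEs: frequent bigrams with high pointwise mutual information."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    mwes = set()
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_joint = count / n_bi
        p_indep = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)
        if math.log2(p_joint / p_indep) >= threshold:
            mwes.add((w1, w2))
    return mwes


def index_terms(doc: str, mwes: Set[Bigram]) -> List[str]:
    """Tokenize for indexing, merging known MWEs into single units like 'machine_learning'."""
    tokens, out, i = doc.lower().split(), [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in mwes:
            out.append(f"{tokens[i]}_{tokens[i + 1]}")
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

On the three toy documents "machine learning is fun", "machine learning models work", and "the models are fun", only ("machine", "learning") occurs twice, with a PMI of about 3 bits, so `index_terms("Machine learning models", mwes)` yields `["machine_learning", "models"]`. The index then matches the concept as a whole rather than the two words independently.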
|
488 |
Extração multilíngue de termos multipalavra em corpora comparáveis / Multilingual extraction of multiword terms from comparable corpora. Prestes, Kassius Vargas, January 2015
This work investigates techniques for extracting multiword terms from comparable corpora, that is, sets of texts in two (or more) languages about the same domain. Term extraction, especially of multiword terms, is very important for building terminologies and ontologies and for improving machine translation. In this work we use a Portuguese/English comparable corpus and aim to find terms and their equivalents in both languages. We start by extracting terms separately in each language, using morphosyntactic patterns to identify the n-grams (sequences of n words) most likely to be important terms in the domain.
From the terms of each language, we use their context, i.e., the words that occur around each term, to compare terms across languages and find bilingual equivalents. Our main goals were to identify monolingual terms, to apply alignment techniques to Portuguese, and to evaluate different window parameters for context extraction: window size and window type (the parts of speech used). This is the first work to apply this methodology to Portuguese, and despite the lack of some lexical and computational resources for this language (such as bilingual dictionaries and parsers), we achieved results comparable to the state of the art for French/English.
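The context-comparison step can be sketched in the style of the standard comparable-corpora approach: build a context vector from a fixed window around each term, map the source-language context words through a seed bilingual dictionary, and rank target-language candidates by cosine similarity. The seed dictionary, the window size, and the function names are assumptions for illustration; the abstract notes that bilingual dictionaries for Portuguese are scarce, so the thesis may handle this step differently.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def context_vector(tokens: Sequence[str], term: Sequence[str], window: int = 2) -> Counter:
    """Count words within `window` positions left and right of each occurrence of `term`."""
    n, ctx = len(term), Counter()
    for i in range(len(tokens) - n + 1):
        if list(tokens[i:i + n]) == list(term):
            ctx.update(tokens[max(0, i - window):i])
            ctx.update(tokens[i + n:i + n + window])
    return ctx


def translate(vec: Counter, seed_dict: Dict[str, str]) -> Counter:
    """Map source-language context words into the target language; unknown words are dropped."""
    out: Counter = Counter()
    for word, count in vec.items():
        if word in seed_dict:
            out[seed_dict[word]] += count
    return out


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def rank_equivalents(src_vec: Counter, tgt_vecs: Dict[str, Counter], seed_dict: Dict[str, str]) -> List[str]:
    """Rank target-language candidate terms by similarity to the translated source context."""
    translated = translate(src_vec, seed_dict)
    return sorted(tgt_vecs, key=lambda t: cosine(translated, tgt_vecs[t]), reverse=True)
```

The `window` parameter corresponds to the window size studied in the thesis; restricting the counted words to certain parts of speech (the window type) would add a filter inside `context_vector`.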
|
489 |
All Purpose Textual Data Information Extraction, Visualization and Querying. January 2018
abstract: Since the advent of the Internet, and even more so since the rise of social media platforms, the explosive growth and availability of textual data have made its analysis a tedious task. Information extraction systems exist, but they are generally too specific and often extract only the kinds of information they deem necessary and extraction-worthy. With data visualization theory and fast, interactive querying methods, leaving out information may not be necessary. This thesis explores textual data visualization techniques, intuitive querying, and a novel approach to all-purpose textual information extraction that encodes a large text corpus to improve human understanding of the information it contains.
This thesis presents a modified traversal algorithm over the dependency parse of text that extracts all subject-predicate-object triples while ensuring that no information is missed. To support full-scale, all-purpose information extraction from large text corpora, a data preprocessing pipeline should be run before extraction. The output format is designed to fit a node-edge-node model, forming the building blocks of a network that makes understanding the text and querying information from the corpus quick and intuitive. The approach aims to reduce reading time and enhance understanding of the text through an interactive graph and timeline. / Dissertation/Thesis / Masters Thesis Software Engineering 2018
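A heavily simplified sketch of extracting subject-predicate-object triples from a dependency parse is shown below. It assumes a CoNLL-style token list with Universal Dependencies labels (`nsubj`, `obj`, and the older `dobj`) and only pairs each predicate's direct subject and object children; the thesis's modified traversal, which aims to miss no information, is not reproduced here. Each triple maps directly to a node-edge-node unit (subject node, predicate edge, object node).

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    word: str
    head: int    # index of the governing token; 0 marks the root
    deprel: str  # dependency label


SUBJECT_LABELS = {"nsubj", "nsubj:pass"}
OBJECT_LABELS = {"obj", "dobj", "iobj"}


def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (subject, predicate, object) triples, i.e. node-edge-node units."""
    children: Dict[int, List[Token]] = defaultdict(list)
    for tok in tokens:
        children[tok.head].append(tok)
    triples = []
    for pred in tokens:
        kids = children.get(pred.idx, [])
        subjects = [k.word for k in kids if k.deprel in SUBJECT_LABELS]
        objects = [k.word for k in kids if k.deprel in OBJECT_LABELS]
        for subj in subjects:
            for obj in objects:
                triples.append((subj, pred.word, obj))
    return triples
```

For a hand-built parse of "Alice wrote a thesis" (Alice → `nsubj` of wrote, thesis → `obj` of wrote), the function returns `[("Alice", "wrote", "thesis")]`, which can be loaded as two nodes joined by a "wrote" edge.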
|
490 |
A High Level Language for Human Robot Interaction. January 2012
abstract: While developing autonomous intelligent robots has been the goal of many research programs, a more practical application of intelligent robots is the formation of teams of both humans and robots. One example is search and rescue, where robots commanded by humans are sent into environments too dangerous for humans. For such human-robot interaction, natural language is considered a good communication medium because it allows humans with little training in the robot's internal language to command and interact with the robot. However, any natural language communication from the human must be translated into a formal language that the robot can understand. Similarly, before the robot can communicate with the human in natural language, it must formulate its message in some formal language, which is then translated into natural language. In this thesis, I develop a high-level language for communication between humans and robots and demonstrate various aspects of it through a robotics simulation. These language constructs borrow ideas from action execution languages and are grounded with respect to simulated human-robot interaction transcripts. / Dissertation/Thesis / M.S. Computer Science 2012
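As a toy illustration of translating commands into a formal representation and back, the sketch below maps a few fixed English command patterns to action terms and renders action terms as natural language. The command patterns, action names, and response templates are all hypothetical; the thesis's language borrows from action execution languages and is considerably richer than pattern matching.

```python
import re
from typing import Optional, Tuple

# Hypothetical command patterns mapped to formal action names.
PATTERNS = [
    (re.compile(r"^(?:go|move) to (?:the )?(?P<arg>[\w ]+)$"), "goto"),
    (re.compile(r"^pick up (?:the )?(?P<arg>[\w ]+)$"), "pickup"),
    (re.compile(r"^search (?:the )?(?P<arg>[\w ]+)$"), "search"),
]

# Hypothetical templates for reporting an action back in natural language.
TEMPLATES = {
    "goto": "I am going to the {}.",
    "pickup": "I am picking up the {}.",
    "search": "I am searching the {}.",
}


def to_formal(utterance: str) -> Optional[Tuple[str, str]]:
    """Translate a human command into an (action, argument) term, or None if unrecognized."""
    text = utterance.lower().strip().rstrip(".!")
    for pattern, action in PATTERNS:
        match = pattern.match(text)
        if match:
            return (action, match.group("arg").replace(" ", "_"))
    return None


def to_natural(action: Tuple[str, str]) -> str:
    """Render a formal action term as a natural language report."""
    name, arg = action
    return TEMPLATES[name].format(arg.replace("_", " "))
```

For example, `to_formal("Go to the collapsed building.")` gives `("goto", "collapsed_building")`, and `to_natural` turns that term back into "I am going to the collapsed building." Returning `None` for unrecognized input gives the robot a clear signal to ask the human for clarification.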
|