1 |
Efficient algorithms for infinite-state recursive stochastic models and Newton's method
Stewart, Alistair Mark January 2015
Some well-studied infinite-state stochastic models give rise to systems of nonlinear equations. These systems of equations have solutions that are probabilities, generally probabilities of termination in the model. We are interested in finding efficient, preferably polynomial-time, algorithms for calculating probabilities associated with these models. The chief tool we use to solve systems of polynomial equations is Newton’s method, as suggested by [EY09]. The main contribution of this thesis is to the analysis of this and related algorithms. We give polynomial-time algorithms for calculating probabilities for broad classes of models for which none were known before. Stochastic models that give rise to such systems of equations include such classic and heavily studied models as Multi-type Branching Processes, Stochastic Context-Free Grammars (SCFGs) and Quasi-Birth-Death Processes. We also consider models that give rise to infinite-state Markov Decision Processes (MDPs), giving algorithms for approximating optimal probabilities and for finding policies that give probabilities close to the optimal probability, in several classes of infinite-state MDPs. Our algorithms for analysing infinite-state MDPs rely on a non-trivial generalization of Newton’s method that works for the max/min polynomial systems that arise as Bellman optimality equations in these models. For SCFGs, which are used in statistical natural language processing, in addition to approximating termination probabilities, we analyse algorithms for approximating the probability that a grammar produces a given string, or produces a string in a given regular language. In most cases, we show that we can calculate an approximation to the relevant probability in time polynomial in the size of the model and the number of bits of desired precision. We also consider more general systems of monotone polynomial equations. For such systems we cannot give a polynomial-time algorithm, which pre-existing hardness results render unlikely, but we can still give an algorithm with a complexity upper bound that is exponential only in some parameters that are likely to be bounded for the monotone polynomial equations arising in many interesting stochastic models.
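As a rough illustration of the kind of computation the abstract describes, the sketch below applies Newton's method, started from 0, to a single monotone polynomial equation. The toy grammar (S -> S S with probability p, S -> a with probability 1 - p) and the function name are assumptions made for this example; they are not taken from the thesis, which treats general systems of equations. The termination probability of this grammar is the least non-negative solution of x = p*x^2 + (1 - p).

```python
def newton_termination_probability(p, iterations=50):
    """Approximate the least non-negative solution of x = p*x**2 + (1 - p).

    Hypothetical toy model: S -> S S with probability p, S -> a otherwise.
    Newton's method is applied to f(x) = P(x) - x, starting from x = 0.
    """
    x = 0.0
    for _ in range(iterations):
        value = p * x * x + (1.0 - p)   # P(x)
        slope = 2.0 * p * x             # P'(x)
        if abs(1.0 - slope) < 1e-15:
            break                       # derivative of f vanishes; stop
        # Newton step for f(x) = P(x) - x, written as an update on x
        x = x + (value - x) / (1.0 - slope)
    return x


if __name__ == "__main__":
    for p in (0.3, 0.6):
        print(p, newton_termination_probability(p))
```

For p = 0.6 the equation has the roots 2/3 and 1, and the iteration started from 0 approaches the smaller one, which is the termination probability; for p = 0.3 the least non-negative root is 1.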
|
2 |
Topical Opinion Retrieval
Skomorowski, Jason January 2006
With a growing amount of subjective content distributed across the Web, there is a need for a domain-independent information retrieval system that supports ad hoc retrieval of documents expressing opinions on the specific topic of a user’s query. While the research area of opinion detection and sentiment analysis has received much attention in recent years, little research has been done on identifying subjective content targeted at a specific topic, i.e. expressing topical opinion. This thesis presents a novel method for ad hoc retrieval of documents that contain subjective content on the topic of the query. Documents are ranked by the likelihood that each document expresses an opinion on a query term, approximated as the likelihood that any occurrence of the query term is modified by a subjective adjective. A domain-independent, user-based evaluation of the proposed method was conducted and shows statistically significant gains over Google ranking as the baseline.
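The ranking idea in this abstract can be sketched very loosely as follows. This is not the thesis's method: the adjective lexicon is a hypothetical toy list, "modified by" is crudely approximated as "immediately preceded by", and the score is simply the fraction of query-term occurrences that satisfy that test.

```python
import re

# Hypothetical toy lexicon; a real system would use a much larger resource.
SUBJECTIVE_ADJECTIVES = {"great", "terrible", "awful", "excellent", "poor"}


def opinion_score(text, query_term):
    """Fraction of query-term occurrences directly preceded by a subjective adjective."""
    tokens = re.findall(r"[a-z']+", text.lower())
    term = query_term.lower()
    positions = [i for i, tok in enumerate(tokens) if tok == term]
    if not positions:
        return 0.0
    modified = sum(
        1 for i in positions if i > 0 and tokens[i - 1] in SUBJECTIVE_ADJECTIVES
    )
    return modified / len(positions)


def rank_documents(documents, query_term):
    """Return documents sorted by descending opinion score for the query term."""
    return sorted(documents, key=lambda d: opinion_score(d, query_term), reverse=True)


if __name__ == "__main__":
    docs = [
        "The camera has a lens.",
        "A terrible camera, truly a poor camera.",
    ]
    for doc in rank_documents(docs, "camera"):
        print(round(opinion_score(doc, "camera"), 2), doc)
```

Adjacency is a deliberate simplification; detecting that an adjective actually modifies the query term would normally require part-of-speech tagging or parsing.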
|
4 |
Extração de conhecimento de laudos de radiologia torácica utilizando técnicas de processamento estatístico de linguagem natural. / Knowledge extraction from reports of radiology thoracic using techniques of statistical processing of natural language.
Zerbinatti, Leandro 15 April 2010
This work presents a health informatics study that analyses thoracic radiology reports using statistical natural language processing methods, with the aim of supporting interoperability between health systems. Two thousand chest radiology reports were used for knowledge extraction, identifying the words, n-grams and phrases that compose them. The Zipf index was calculated, and it was found that a few words make up most of the reports while most words have no statistical representativeness. The identified terms were translated and checked against a standardized medical vocabulary with international terminology, SNOMED CT. Terms with a complete and direct match to the translated terms were incorporated into the reference terms, together with the class to which each term belongs and its identifier. Another 200 chest radiology reports were selected for an experiment in labeling terms against the reference. The efficiency obtained at this stage, defined as the percentage of the reports that was labeled, was 45.55%. Articles, prepositions and pronouns were then added to the reference terms under a linking-concept class; it is important to note that these terms add no health knowledge to the text. The resulting efficiency was 73.23%, a significant increase over the earlier figure. The work concludes with some ways of applying the labeled reports to system interoperability, using ontologies, the HL7 CDA (Clinical Document Architecture) and the archetype model of the openEHR Foundation.
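The labeling-efficiency figure can be illustrated with a small sketch. It assumes, without confirmation from the abstract, that efficiency is the share of tokens found in the set of reference terms; the English toy reports and term sets below are invented for the example and are not data from the study.

```python
def labeling_coverage(reports, reference_terms):
    """Percentage of tokens across all reports that appear in reference_terms.

    Assumption: single-word terms and simple whitespace tokenization.
    """
    total = 0
    labeled = 0
    for report in reports:
        for token in report.lower().split():
            word = token.strip(".,;:()")
            if not word:
                continue
            total += 1
            if word in reference_terms:
                labeled += 1
    return 100.0 * labeled / total if total else 0.0


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    reports = ["opacity in the left lung", "effusion of the pleura"]
    clinical_terms = {"opacity", "lung", "effusion"}
    linking_terms = {"in", "the", "of"}  # articles and prepositions

    print(labeling_coverage(reports, clinical_terms))
    print(labeling_coverage(reports, clinical_terms | linking_terms))
```

Adding the linking terms raises coverage here in the same way the abstract reports for the study, even though those words carry no clinical meaning.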
|